Closed rivermont closed 6 years ago
Currently, a request is sent for a site's robots.txt every time a link is crawled. It would be much faster if the results of a robots.txt query were saved in some database. Only one request per site should need to be sent.
Perhaps the result could be stored in a dictionary with URLs as keys, if it only needs to be stored once per run?
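A minimal sketch of that idea, using the standard library's `urllib.robotparser` (the `can_crawl` / `_fetch_robots` names are illustrative, not spidy's actual API): one parsed robots.txt is cached per scheme+host, so each site triggers at most one request per run.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Cache of parsed robots.txt, keyed by (scheme, host).
_robots_cache = {}

def _fetch_robots(robots_url):
    parser = RobotFileParser(robots_url)
    parser.read()  # the single network request for this site
    return parser

def can_crawl(url, user_agent="*", fetch=_fetch_robots):
    """Return True if robots.txt permits crawling `url`.

    The `fetch` parameter is injectable so the cache logic can be
    tested without network access.
    """
    parts = urlsplit(url)
    site = (parts.scheme, parts.netloc)
    if site not in _robots_cache:
        _robots_cache[site] = fetch("{}://{}/robots.txt".format(*site))
    return _robots_cache[site].can_fetch(user_agent, url)
```

Keying by scheme+host rather than the full URL keeps the cache small and matches the fact that robots.txt is defined per origin, not per page.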