Redundant url searches - Githubissues

samwize / python-email-crawler

Search on Google, and crawls for emails related to the result

292 stars, 127 forks

Redundant url searches #10

Closed: mpbunch closed this issue 5 years ago

mpbunch commented 8 years ago

When scanning a large site, if the same URL appears on multiple pages, that URL is scanned again for every page it appears on.

If the crawler could remember which URLs have already been scanned, it could skip a URL whenever it is found a second or later time. This would save a lot of time.
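One way to get this behaviour is to keep a set of URLs that have already been seen and check it before queueing a link. The snippet below is only a minimal sketch of that idea, not the project's actual code: `crawl`, `get_links`, `max_pages`, and the toy `SITE` dictionary are all hypothetical names made up for illustration, and `get_links` stands in for whatever function fetches a page and extracts its links. It is written for Python 3 and strips `#fragment` suffixes so that `page` and `page#top` count as the same URL.

```python
from collections import deque
from urllib.parse import urldefrag


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl that scans each URL at most once.

    get_links is a hypothetical callable: url -> iterable of linked URLs.
    Returns the URLs in the order they were scanned.
    """
    visited = set()   # every URL that has been queued or scanned
    queue = deque()   # URLs waiting to be scanned
    scanned = []      # URLs actually scanned, in order

    start = urldefrag(start_url).url  # drop any "#fragment"
    visited.add(start)
    queue.append(start)

    while queue and len(scanned) < max_pages:
        url = queue.popleft()
        scanned.append(url)
        for link in get_links(url):
            link = urldefrag(link).url
            if link not in visited:   # skip URLs seen on an earlier page
                visited.add(link)
                queue.append(link)
    return scanned


# Toy in-memory "site" (assumption, for demonstration only):
# every page links back to pages that were already found.
SITE = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/#top"],
    "http://example.com/b": ["http://example.com/a", "http://example.com/"],
}

calls = []  # records every URL that is actually fetched


def fake_get_links(url):
    calls.append(url)
    return SITE.get(url, [])


result = crawl("http://example.com/", fake_get_links)
print(result)
```

Each of the three pages is fetched once even though they all link to each other. Adding a URL to `visited` at the moment it is queued, rather than when it is scanned, also keeps the same URL from being queued twice while it waits its turn.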