rivermont / spidy

The simple, easy to use command line web crawler.
GNU General Public License v3.0
340 stars · 69 forks

Robots file reader repaired #63

Closed stevelle closed 7 years ago

stevelle commented 7 years ago

The robots file reader now remembers the robots.txt file for each distinct hostname encountered during navigation and reuses it across all threads in the current run.

Instead of passing around a function handle, a thread-safe object is used to keep track of this state, making the code a little easier to read.
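The per-hostname cache described above could be sketched roughly as follows. This is a hypothetical illustration using the standard library's `urllib.robotparser`, not the actual spidy implementation; the class and method names are invented for the example.

```python
import threading
from urllib import robotparser
from urllib.parse import urlparse


class RobotsCache:
    """Thread-safe cache holding one RobotFileParser per hostname.

    Illustrative sketch only: each distinct host's robots.txt is
    fetched and parsed once, then reused by every crawler thread.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._parsers = {}  # hostname -> RobotFileParser

    def _parser_for(self, url):
        parts = urlparse(url)
        host = parts.netloc
        with self._lock:
            parser = self._parsers.get(host)
            if parser is None:
                # First time we see this host: fetch its robots.txt once.
                parser = robotparser.RobotFileParser(
                    '{0}://{1}/robots.txt'.format(parts.scheme, host))
                parser.read()
                self._parsers[host] = parser
        return parser

    def can_fetch(self, user_agent, url):
        """Return True if robots.txt for url's host permits fetching it."""
        return self._parser_for(url).can_fetch(user_agent, url)
```

Holding the lock around both the lookup and the fetch ensures two threads that hit a new host at the same time do not download its robots.txt twice.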

Fixes #62

Checklist

stevelle commented 7 years ago

Lines 20 and 21 might not be the same as when the template was created.