khuongduyit / crawler4j

Automatically exported from code.google.com/p/crawler4j

Exchangeable robots.txt stores #75


GoogleCodeExporter commented 9 years ago
When running several crawler4j processes in concert, it may be a good idea to 
store all fetched robots.txt information in a central database. This prevents 
the same file from being fetched multiple times. In the latest version of 
crawler4j this is not possible.

I added a patch that provides a "HostDirectivesStore" interface along with a 
LocalMapStore implementation. The LocalMapStore has the same semantics as the 
current implementation. By implementing another host directives store, one can, 
for example, keep all robots.txt information in a database.
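
For context, a minimal sketch of what such an interface might look like. The names HostDirectivesStore and LocalMapStore come from the issue report, and HostDirectives is crawler4j's existing per-host robots.txt class; the method signatures below are assumptions, not the patch's actual code.

```java
// Hypothetical sketch; the real patch (attached to this issue) may differ.
package edu.uci.ics.crawler4j.robotstxt;

import java.util.HashMap;
import java.util.Map;

/** Pluggable storage for per-host robots.txt directives (assumed API). */
interface HostDirectivesStore {
    /** Returns the cached directives for a host, or null if none are stored. */
    HostDirectives get(String host);

    /** Stores the directives fetched for a host. */
    void put(String host, HostDirectives directives);
}

/**
 * In-memory store with the same semantics as the current per-process
 * behavior: each crawler process keeps its own robots.txt cache.
 */
class LocalMapStore implements HostDirectivesStore {
    private final Map<String, HostDirectives> map = new HashMap<>();

    @Override
    public synchronized HostDirectives get(String host) {
        return map.get(host);
    }

    @Override
    public synchronized void put(String host, HostDirectives directives) {
        map.put(host, directives);
    }
}
```

A database-backed implementation of the same interface could then be shared by several crawler processes, so each robots.txt file is fetched only once across the whole fleet.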

Original issue reported on code.google.com by alexande...@unister-gmbh.de on 18 Aug 2011 at 12:37

Attachments:

GoogleCodeExporter commented 9 years ago
Thanks for providing the patches. I'll review them and include them in the 
next version.

-Yasser

Original comment by ganjisaffar@gmail.com on 19 Aug 2011 at 3:11

GoogleCodeExporter commented 9 years ago

Original comment by avrah...@gmail.com on 18 Aug 2014 at 3:10