GoogleCodeExporter opened 8 years ago
1) Create a config value MaxPagesToCrawlPerDomain in the CrawlConfiguration.cs file and have .NET fill it from the config section (like the other properties in that class).
2) Extend the CrawlDecisionMaker class (CrawlDecisionMaker.cs).
3) Add a ConcurrentDictionary<string, int> that keeps track of the domains that have been crawled and the current page count for each domain.
4) Override the ShouldCrawlPage method and have it add to/check the dictionary to make sure no more than MaxPagesToCrawlPerDomain pages are crawled for any one domain.
5) Pass in your implementation:
WebCrawler crawler = new WebCrawler(
null,
null,
null,
null,
null,
new YourCrawlDecisionMaker(),
null);
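Steps 3 and 4 hinge on a thread-safe per-domain counter. Below is a minimal sketch, assuming .NET 4.0+ (System.Collections.Concurrent). DomainPageLimiter, TryReservePage and GetCount are hypothetical names for illustration, not part of Abot; the idea is that your ShouldCrawlPage override would call TryReservePage with the page's Uri and disallow the page when it returns false, constructing the limiter with the MaxPagesToCrawlPerDomain value.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper (NOT part of Abot): tracks how many pages have been
// allowed per host and refuses to allow more than maxPagesPerDomain for any host.
public class DomainPageLimiter
{
    private readonly int _maxPagesPerDomain;

    // Key: host name (e.g. "example.com"), value: pages allowed so far for that host.
    private readonly ConcurrentDictionary<string, int> _pagesPerDomain =
        new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    public DomainPageLimiter(int maxPagesPerDomain)
    {
        if (maxPagesPerDomain < 0)
            throw new ArgumentOutOfRangeException(nameof(maxPagesPerDomain));
        _maxPagesPerDomain = maxPagesPerDomain;
    }

    // Returns true and increments the host's count if the host is still under the limit.
    // Returns false, leaving the count unchanged, once the limit has been reached.
    public bool TryReservePage(Uri uri)
    {
        if (uri == null) throw new ArgumentNullException(nameof(uri));
        string host = uri.Host;

        while (true)
        {
            int current = _pagesPerDomain.GetOrAdd(host, 0);
            if (current >= _maxPagesPerDomain)
                return false;

            // Compare-and-swap: succeeds only if no other thread changed the
            // count between the read above and this update; otherwise retry.
            if (_pagesPerDomain.TryUpdate(host, current + 1, current))
                return true;
        }
    }

    // Number of pages allowed so far for the uri's host (0 if never seen).
    public int GetCount(Uri uri)
    {
        if (uri == null) throw new ArgumentNullException(nameof(uri));
        int count;
        return _pagesPerDomain.TryGetValue(uri.Host, out count) ? count : 0;
    }
}
```

The read-then-TryUpdate loop is used instead of a plain AddOrUpdate increment so that two crawler threads racing on the same host cannot both push it past the limit. Note that a page counts against its domain as soon as it is allowed, so a page that later fails to download still uses up one of that domain's slots.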
Original comment by sjdir...@gmail.com
on 5 Dec 2012 at 8:56
Be sure to post an update to the forum thread at
https://groups.google.com/forum/#!topic/abot-web-crawler/HFu0DUGN9eU
Original comment by sjdir...@gmail.com
on 5 Dec 2012 at 9:06
Original comment by sjdir...@gmail.com
on 10 Dec 2012 at 8:15
Original issue reported on code.google.com by
sjdir...@gmail.com
on 5 Dec 2012 at 8:29