hillhousehold closed this 4 years ago
Can you add this line...
impls.WebCrawlerFactory = new WebCrawlerFactory(config); // This is new
before this one...
ParallelCrawlerEngine crawlEngine = new ParallelCrawlerEngine(config, impls);
Let me know if that solves your problem.
Thank you, but it is still not working properly. I added the line of code, recompiled, and ran a test with 8 domains and MaxPagesToCrawlPerDomain set to 1000. The crawl has now passed 15,000 pages. It should have stopped at no more than 8,000 pages (8 domains × 1,000 pages each).
My counter is incremented each time the crawler fires the PageCrawlCompletedAsync() event.
Also, my project is running version 1.3.81 of AbotX and version 1.6.0.5 of Abot.
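One way to narrow this down is to tally completed pages per domain instead of keeping a single global counter, so it becomes visible which domain runs past the limit. The following is a minimal sketch under stated assumptions: PerDomainPageCounter and its members are hypothetical names and not part of Abot or AbotX, and grouping by Uri.Host is an assumption, since the crawler may group pages into domains differently (for example, by treating subdomains together).

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical diagnostic helper (not part of Abot or AbotX).
// Keeps a thread-safe running total of completed pages for each host.
public sealed class PerDomainPageCounter
{
    // Host names are compared case-insensitively.
    private readonly ConcurrentDictionary<string, int> _counts =
        new ConcurrentDictionary<string, int>(StringComparer.OrdinalIgnoreCase);

    // Call once per completed page; returns the new total for that page's host.
    public int Record(Uri pageUri)
    {
        if (pageUri == null) throw new ArgumentNullException(nameof(pageUri));
        return _counts.AddOrUpdate(pageUri.Host, 1, (_, current) => current + 1);
    }

    // Returns the total recorded so far for a host, or 0 if none was recorded.
    public int CountFor(string host)
    {
        return _counts.TryGetValue(host, out var count) ? count : 0;
    }
}
```

Record could be called from the page-completed event handler with the crawled page's URI, and any host whose total exceeds MaxPagesToCrawlPerDomain could then be logged.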
I was able to get it working by using the SiteToCrawl.CrawlConfiguration property instead of a global crawl configuration for the crawl.
I've been unable to reproduce this issue. If it comes up again in version 2.0+, feel free to reopen.
I'm testing this, and the crawler is currently on page ~55,000 and several layers deep for one of the three domains in my test crawl. The code I used to load the configuration is below. I load from the app config XML and then override some of the settings in the method to customize the crawl based on user input for the specific test crawls I'm running. The two values in question, MaxPagesToCrawlPerDomain and MaxCrawlDepth, are hard-coded to 1000 and 1 respectively for this test. Am I doing something wrong?
var config = AbotXConfigurationSectionHandler.LoadFromXml().Convert();
config.CrawlTimeoutSeconds = timeoutMilliseconds / 1000;
config.HttpRequestTimeoutInSeconds = timeoutMilliseconds / 1000;
config.JavascriptRenderingWaitTimeInMilliseconds = timeoutMilliseconds;
config.MaxCrawlDepth = 1; //set for testing only
config.JavascriptRenderingWaitTimeInMilliseconds = javascriptTimeout;
config.MaxPagesToCrawlPerDomain = 1000; //set for testing only
ParallelImplementationOverride impls = new ParallelImplementationOverride(config);
impls.SiteToCrawlProvider.AddSitesToCrawl(sites);
ParallelCrawlerEngine crawlEngine = new ParallelCrawlerEngine(config, impls);