liudonghua123 opened this issue 9 years ago
Hi,
I am pretty sure the problem is the same as the one I described in issue #52: when a page is bigger than the configured maximum download size, the response is never closed, so (as far as I can tell) its connection is never returned to HttpClient's connection pool; once the pool is exhausted, all crawler threads block silently, which would match the behaviour you describe. The solution for me was to write my own PageFetcher class, override the fetchPage(WebURL) method, and change the maximum-size check as follows:
CloseableHttpResponse response = httpClient.execute(get);
...
// Checking maximum size
if (fetchResult.getEntity() != null) {
    long size = fetchResult.getEntity().getContentLength();
    if (size > config.getMaxDownloadSize()) {
        // fix for issue #52: close the response before throwing
        response.close();
        throw new PageBiggerThanMaxSizeException(size);
    }
}
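For context, here is a rough sketch of how such a custom fetcher could be wired into the crawl setup. The class name ClosingPageFetcher, the storage folder, the seed URL and the MyCrawler stand-in are all placeholders, and the overridden fetchPage(WebURL) is assumed to be a copy of the stock implementation plus the response.close() call shown above:

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;

public class ClosingFetcherSetup {

    /** Illustrative subclass; fetchPage(WebURL) would be the original
     *  implementation with the response.close() call added as shown above. */
    public static class ClosingPageFetcher extends PageFetcher {
        public ClosingPageFetcher(CrawlConfig config) {
            super(config);
        }
        // @Override
        // public PageFetchResult fetchPage(WebURL webUrl) { ... original code + response.close() ... }
    }

    /** Stand-in for your own WebCrawler subclass. */
    public static class MyCrawler extends WebCrawler {
        // override shouldVisit(...) and visit(...) as in your crawler
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j"); // illustrative storage folder
        config.setMaxDownloadSize(1048576);             // the limit checked in the snippet above

        // Use the custom fetcher instead of the stock PageFetcher
        PageFetcher pageFetcher = new ClosingPageFetcher(config);
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

        controller.addSeed("http://www.example.com/");
        controller.start(MyCrawler.class, 10);          // 10 = number of crawler threads
    }
}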
Sadly, nobody commented on my issue, but please give it a try and let me know whether it solves your problem.
Best regards, Albert
I use this excellent library to scrape some sites, but sometimes the crawl stops unexpectedly without any exceptions or errors. My custom WebCrawler is below.
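(The crawler class itself is not reproduced in this excerpt; as a rough sketch only, it is a fairly standard crawler4j WebCrawler subclass along these lines, where the class name MyCrawler and the URL filter are placeholders rather than the actual code:)

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;

public class MyCrawler extends WebCrawler {

    // Decide which discovered URLs get queued for crawling.
    // (In crawler4j 4.x the signature also takes the referring Page.)
    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        return url.getURL().startsWith("http://www.example.com/");
    }

    // Called for every successfully fetched and parsed page.
    @Override
    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
            String text = htmlParseData.getText();
            logger.info("Visited {} ({} characters of text)", page.getWebURL().getURL(), text.length());
        }
    }
}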
The following are some of the log messages from the point where it stopped:
I walked through the code but found no useful information about this strange problem.
PS: I tried different numbers of crawlers (5, 10, 100, 500, 1000) and ran on both Windows and Linux. The problem occurred after crawler4j had crawled about 10,000+ pages (with numberOfCrawlers set to 10), or about 60,000+ pages (with numberOfCrawlers set to 1000).
When I crawled some smaller websites, no such problem appeared.