Shutdown crawler takes long time

GoogleCodeExporter commented 9 years ago

1) How can I shut down the crawler immediately? It takes a long time to 
finish.

2) I would like to crawl just a single page (e.g. 
http://www.musite.com/mysinglepage.html). Are these config parameters correct?
config.setMaxDepthOfCrawling(0);
config.setMaxPagesToFetch(1);

Original issue reported on code.google.com by afterbit...@gmail.com on 3 Jan 2012 at 11:14

GoogleCodeExporter commented 9 years ago

crawler4j is designed to crawl many pages using a multi-threaded approach, so it 
cannot shut down immediately: it waits about one minute to make sure that all 
threads have finished their work. If you only want to crawl a limited number of 
pages and you know the exact URLs in advance, I suggest using the Downloader 
example instead: 
http://code.google.com/p/crawler4j/source/browse/src/test/java/edu/uci/ics/crawler4j/examples/localdata/Downloader.java

-Yasser
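
To illustrate why a multi-threaded crawler cannot simply stop on request, here is a minimal sketch of the general "wait for a grace period, then interrupt" shutdown pattern, written against the standard java.util.concurrent API. It is not crawler4j's code: the class name ShutdownSketch, the stop method, and the grace period values are assumptions made up for this example, and the sleeping task is only a stand-in for a crawler thread that is busy with a slow page.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of crawler4j.
class ShutdownSketch {

    /**
     * Asks the pool to stop, waits up to graceMillis for running tasks to
     * finish, then interrupts whatever is still running.
     * Returns true if every task finished within the grace period.
     */
    static boolean stop(ExecutorService pool, long graceMillis) {
        pool.shutdown(); // no new tasks are accepted; running tasks continue
        try {
            if (pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow(); // interrupt tasks that are still running
            // Give the interrupted tasks a moment to exit.
            pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Stand-in for a crawler thread that is busy with a slow page.
        pool.execute(() -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit promptly when interrupted
            }
        });
        boolean graceful = stop(pool, 200);
        System.out.println("finished within grace period: " + graceful);
    }
}
```

A shorter grace period makes shutdown faster but raises the chance that a worker is interrupted in the middle of a page, which is the trade-off behind the wait described above.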

Original comment by ganjisaffar@gmail.com on 3 Jan 2012 at 7:36

Changed state: Done
Added labels: Type-Other
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Thank you, Yasser!

Original comment by afterbit...@gmail.com on 5 Jan 2012 at 9:30

mohankreddy / crawler4j

Shutdown crawler takes long time #100