mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

Shutdown crawler takes long time #100

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
1) How can I shut down the crawler immediately? It takes a long time to 
finish.

2) I would like to crawl just a single page (e.g. 
http://www.musite.com/mysinglepage.html). Are these config parameters correct?
config.setMaxDepthOfCrawling(0);
config.setMaxPagesToFetch(1);

Original issue reported on code.google.com by afterbit...@gmail.com on 3 Jan 2012 at 11:14

GoogleCodeExporter commented 9 years ago
crawler4j is designed to crawl many pages using a multi-threaded approach, so 
it cannot shut down immediately: it has to wait for about one minute to make 
sure that all threads are done with their work. If you only want to crawl a 
limited number of pages and you know the exact URLs in advance, then I suggest 
using the Downloader example: 
http://code.google.com/p/crawler4j/source/browse/src/test/java/edu/uci/ics/crawler4j/examples/localdata/Downloader.java

-Yasser

Original comment by ganjisaffar@gmail.com on 3 Jan 2012 at 7:36
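
To illustrate why a multi-threaded crawler waits before it exits, here is a minimal sketch using only the JDK's ExecutorService. It is not crawler4j's code or API: the class GracefulShutdownSketch, the method runAndShutdown, and the sleep that stands in for fetching a page are all hypothetical. The pool stops accepting work, waits up to a grace period for the workers to finish, and only then interrupts them.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; none of these names come from crawler4j.
class GracefulShutdownSketch {

    /**
     * Runs taskCount simulated page fetches on a fixed pool, then shuts the
     * pool down, waiting at most graceMillis before forcing it to stop.
     * Returns how many simulated fetches ran to completion.
     */
    public static int runAndShutdown(int threads, int taskCount,
                                     long taskMillis, long graceMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> {
                try {
                    // Stand-in for fetching and parsing one page.
                    Thread.sleep(taskMillis);
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    // Forced stop: abandon this page and restore the flag.
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Graceful phase: no new tasks are accepted, queued ones still run.
        pool.shutdown();
        if (!pool.awaitTermination(graceMillis, TimeUnit.MILLISECONDS)) {
            // Forced phase: interrupt running workers and drop queued tasks.
            pool.shutdownNow();
            pool.awaitTermination(1, TimeUnit.SECONDS);
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Short tasks with a generous grace period: all work finishes.
        System.out.println("graceful: " + runAndShutdown(4, 8, 50, 5000));
        // Long tasks with a short grace period: workers get interrupted.
        System.out.println("forced:   " + runAndShutdown(4, 8, 2000, 100));
    }
}
```

The trade-off is the one described in the comment above: a longer grace period lets in-flight pages finish cleanly, while a shorter one exits sooner at the cost of abandoning whatever the workers were doing.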

GoogleCodeExporter commented 9 years ago
Thank you, Yasser!

Original comment by afterbit...@gmail.com on 5 Jan 2012 at 9:30