gajewsk2 closed this issue 7 years ago
Hi @gajewsk2, I've also tested icrawler
on web servers and it exits as expected. There may be other reasons it is not exiting. Usually there will be only one thread alive after all tasks are finished, and then the parent thread will exit.
Since it relies on the threading library, whose active-thread count is process-wide (effectively a global), it reports 3 threads running before I even start the crawler in my Django app. Because of those pre-existing threads, the count never drops low enough and my server never terminates. Is there a reason to keep this check once the work is done and the threads the crawler launched have been reaped?
It may be better to use the exit of the downloader as the condition for terminating all crawling threads. Checking the thread count is indeed unnecessary.
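For illustration, the difference between the two termination conditions looks roughly like the sketch below (names are illustrative, not icrawler's actual code): `threading.active_count()` counts every thread in the process, so a host app's threads keep the wait loop alive forever, while a `threading.Event` set by the downloader when it exits is unaffected by them.

```python
import threading
import time

# Fragile: waits for the process-global thread count to drop to 1.
# Under Django/gunicorn extra threads already exist, so this never returns.
def wait_by_thread_count():
    while threading.active_count() > 1:
        time.sleep(1)

# Robust: the downloader signals completion explicitly when it exits.
done = threading.Event()

def downloader():
    time.sleep(2)  # stand-in for draining the download task queue
    done.set()     # "downloader has exited"

t = threading.Thread(target=downloader, name='downloader-001')
t.start()
done.wait()  # unaffected by any other threads in the process
t.join()
print('crawler finished')
```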
@hellock, I have this issue when using GreedyImageCrawler. The loop never terminates and keeps logging lines like these (a minimal reproduction sketch follows the log):
2017-08-11 00:52:45,864 - INFO - downloader - downloader-001 is waiting for new download tasks
2017-08-11 00:52:46,624 - INFO - parser - parser-001 is waiting for new page urls
2017-08-11 00:52:48,625 - INFO - parser - parser-001 is waiting for new page urls
2017-08-11 00:52:50,625 - INFO - parser - parser-001 is waiting for new page urls
2017-08-11 00:52:50,865 - INFO - downloader - downloader-001 is waiting for new download tasks
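For anyone trying to reproduce this, a minimal script along the lines of the documented GreedyImageCrawler usage (the output directory and domain below are placeholders) is enough to hit the endless waiting loop when the host process has extra threads alive:

```python
from icrawler.builtin import GreedyImageCrawler

# Placeholder storage directory and start domain; any site reproduces the
# hang as long as other threads exist in the process (e.g. a web server).
crawler = GreedyImageCrawler(storage={'root_dir': 'images'})
crawler.crawl(domains='http://example.com', max_num=10)
```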
crawler.py
The crawler never stops if there was already more than one thread running, e.g. if you are running it on a web server, it will never end. I've simply disabled this line to get things working.
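If deleting the check entirely feels too blunt, a hedged alternative (a sketch, not icrawler's actual code) is to snapshot the thread count before the crawler starts and wait for it to return to that baseline instead of a hard-coded 1:

```python
import threading
import time

baseline = threading.active_count()  # threads owned by the host app (e.g. Django)

def worker():
    time.sleep(1)  # stand-in for a crawler feeder/parser/downloader loop

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

# Wait until the process is back at its pre-crawl baseline rather than 1,
# so pre-existing server threads no longer block termination.
while threading.active_count() > baseline:
    time.sleep(0.5)
print('all crawler threads reaped')
```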