kootenpv / sky

:sunrise: next generation web crawling using machine intelligence
BSD 3-Clause "New" or "Revised" License

Empty queue causing Task was destroyed because queue.get is wrapped by wait_for #1

Open kootenpv opened 9 years ago

kootenpv commented 9 years ago

It seems that when the queue is empty at the end (https://github.com/kootenpv/sky/blob/master/sky/crawler/crawling.py#L338), it tries to end the futures (which are wrapped in wait_for), and this causes the tasks not to end "normally".

The documentation says that this might be a mistake (though I think it is not in this case?). Each worker then produces an "ERROR" entry in the log, which is not nice when you want to report the real errors.

I'm trying to stay up-to-date with the 500lines library, but somehow it hangs when crawling a huge website (just as the queue gets empty).

That's why I added the wait_for.
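For reference, a minimal sketch of one way to shut workers down cleanly (the fetch step is stubbed out, and the names here are illustrative, not sky's actual ones): after queue.join() returns, cancel each worker task and then await them, so every task finishes via CancelledError instead of being garbage-collected while still pending, which is what produces the "Task was destroyed but it is pending!" log lines.

```python
import asyncio

processed = []  # stands in for real fetch results

async def worker(queue):
    # Pull URLs until cancelled. wait_for puts a timeout on queue.get
    # so a worker never blocks forever on an empty queue.
    while True:
        try:
            url = await asyncio.wait_for(queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            continue  # queue momentarily empty; poll again
        try:
            processed.append(url)  # a real crawler would fetch/parse here
        finally:
            queue.task_done()

async def crawl(urls, num_workers=3):
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    workers = [asyncio.ensure_future(worker(queue))
               for _ in range(num_workers)]
    await queue.join()  # returns once task_done() was called for every item
    # Cancel the workers and *await* them: each task then ends via
    # CancelledError instead of being destroyed while still pending.
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)

asyncio.run(crawl(["http://example.com/a", "http://example.com/b"]))
```

With this pattern the event loop sees every worker task complete, so no "Task was destroyed" errors are logged at shutdown.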

@ajdavis @asvetlov Do you have any idea how to prevent those tasks from spitting out "Task was destroyed!" at the end, or another idea for how to solve this issue?

I'd be very grateful!