stewartmckee / cobweb

Web crawler with very flexible crawling options. Can be used either standalone or with Resque to perform clustered crawls.
MIT License
226 stars, 45 forks

Two improvements for you to look at here - inprogress + updating of setting the queued state #13

Closed rojotek closed 11 years ago

rojotek commented 11 years ago

An inprogress state has been added to help ensure that the crawl finished event fires only when the crawl is really finished. Previously, when running without any page limits set, the crawl finished event could occur before all jobs were complete.
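To illustrate the idea, here is a minimal in-memory sketch of an inprogress state. It is not cobweb's actual code: `CrawlTracker` and its methods are hypothetical names, and a real clustered crawl would keep this state in a store shared by all workers.

```ruby
require "set"

# Hypothetical tracker, for illustration only; none of these names
# come from the cobweb codebase.
class CrawlTracker
  def initialize
    @queued = Set.new
    @inprogress = Set.new
  end

  # Record a link that is waiting to be crawled.
  def enqueue(url)
    @queued << url
  end

  # A worker picks up a job: move it from queued to inprogress.
  def start(url)
    @queued.delete(url)
    @inprogress << url
  end

  # A worker completes a job; returns true if this completion ends the crawl.
  def complete(url)
    @inprogress.delete(url)
    finished?
  end

  # The crawl is finished only when nothing is queued AND nothing is running.
  def finished?
    @queued.empty? && @inprogress.empty?
  end
end

tracker = CrawlTracker.new
tracker.enqueue("/a")
tracker.start("/a")
# The queue is empty here, but /a is still running and may yet add links.
puts tracker.finished?       # prints false
tracker.enqueue("/b")
puts tracker.complete("/a")  # prints false, /b is still queued
tracker.start("/b")
puts tracker.complete("/b")  # prints true
```

Checking only the queued set would have reported the crawl as finished right after `/a` was picked up, which is the premature finish described above.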

The queued state was only being set after the enqueue operation was complete, so it was possible for two different queues to enqueue the same job. It's now set before the enqueue is performed. This could still allow some double enqueueing of links, but the window for that edge case is smaller.
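A rough single-process sketch of the two orderings, using hypothetical helper names and a plain `Set` and `Array` as stand-ins for the shared queued state and the job queue (cobweb's real code will look different):

```ruby
require "set"

# Old order (hypothetical helper): enqueue first, mark as queued afterwards.
# Between the check and the mark, another worker can pass the same check
# and enqueue the link a second time.
def enqueue_then_mark(queued, jobs, url)
  return false if queued.include?(url)
  jobs << url
  queued << url
  true
end

# New order (hypothetical helper): mark as queued first, and enqueue only
# if this call did the marking. Set#add? returns nil when the element
# was already present.
def mark_then_enqueue(queued, jobs, url)
  return false unless queued.add?(url)
  jobs << url
  true
end

queued = Set.new
jobs = []
mark_then_enqueue(queued, jobs, "/about")  # first caller enqueues the link
mark_then_enqueue(queued, jobs, "/about")  # second caller is skipped
puts jobs.length  # prints 1
```

Across separate workers the check and the mark would also need to happen as one atomic step in the shared store; without that a small gap remains, which is presumably the residual double enqueueing mentioned above.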

stewartmckee commented 11 years ago

Hey Rob,

I didn't seem to get notified of this pull request, or it got lost somewhere. Will get it merged soon.