scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Option to queue/ignore repeated schedule #153

Closed · lucaspottersky closed this 2 years ago

lucaspottersky commented 8 years ago

This is a feature request.

I wish Scrapyd would not run two instances of the same spider at the same time. Maybe this could be a configuration option? I can see three behaviours: run the duplicate anyway (the current behaviour), queue it until the running job finishes, or ignore the repeated schedule request.

I'm asking because I'm afraid of concurrency problems, since my spiders write to a file using Feed Exports.

Digenis commented 8 years ago

In practice, you can check with listjobs.json whether the spider is already scheduled, and schedule it only if it is not. You can theoretically run into a race condition, but if your only concern is a Feed Exports file, I'd guess your project is not the kind of project at such risk.
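
A minimal sketch of that check-then-schedule flow against Scrapyd's listjobs.json and schedule.json endpoints; the base URL, project name, helper name and the `requests` dependency are assumptions:

```python
import requests

SCRAPYD = "http://localhost:6800"  # assumed Scrapyd address
PROJECT = "myproject"              # assumed project name

def schedule_if_idle(spider):
    """Schedule `spider` only if it has no pending or running job."""
    # listjobs.json returns the pending, running and finished jobs of a project.
    jobs = requests.get(
        f"{SCRAPYD}/listjobs.json", params={"project": PROJECT}
    ).json()
    active = {job["spider"] for job in jobs["pending"] + jobs["running"]}
    if spider in active:
        # The race mentioned above: another client can schedule the spider
        # between this check and the schedule.json call below.
        return None
    return requests.post(
        f"{SCRAPYD}/schedule.json", data={"project": PROJECT, "spider": spider}
    ).json().get("jobid")
```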

There may be a cleaner solution than adding more options to the config. Perhaps we could define a behaviour for jobid collisions, such as aborting the scheduling; then you could come up with a jobid scheme that reserves a "slot". E.g. if your spider is supposed to crawl every 6 hours, you could use a jobid scheme like %Y-%m-%d Q, where Q is the quarter of the day (1, 2, 3, 4).
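
For illustration, a sketch of such a slot scheme for a 6-hour cadence; the helper is hypothetical, though Scrapyd's schedule.json does accept a jobid parameter (aborting on a jobid collision is what is being proposed here, not existing behaviour):

```python
from datetime import datetime, timezone

def slot_jobid(now=None):
    """Build a jobid that reserves one "slot" per 6-hour quarter of the day."""
    now = now or datetime.now(timezone.utc)
    quarter = now.hour // 6 + 1  # 1..4, one slot per 6-hour window
    return f"{now:%Y-%m-%d}.{quarter}"

# e.g. 14:30 UTC falls in quarter 3 -> "2016-05-12.3"
```

Two schedule requests in the same window then produce the same jobid, so an abort-on-collision behaviour would reject the second one.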

jpmckinney commented 2 years ago

Closing, as there has been no additional interest in this feature request since 2016.

Noting that I think it's better to put this logic outside Scrapyd (using its API). I see way too many desired customizations to the scheduling logic (run the repeat crawl after a given interval, auto-schedule crawls, etc.).

Scrapyd is just a basic API for running `scrapy crawl`. It's not a full-fledged automation server (like Jenkins or similar).

That said, #197, about using a custom queue class, remains open.