open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Scrapy job persistence #485

Closed jpmckinney closed 1 year ago

jpmckinney commented 4 years ago

This is to allow jobs to be paused/resumed, e.g. when re-deploying Kingfisher Collect as a whole to install new requirements.

Scrapy can pause/resume specific crawls (https://docs.scrapy.org/en/latest/topics/jobs.html), but it's not clear how to configure it to do that for all crawls by default (https://github.com/scrapy/scrapy/issues/3416).
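For reference, Scrapy's pause/resume support is driven by the `JOBDIR` setting, and the Scrapy docs say the directory should not be shared between spiders or between separate runs of the same spider. A minimal sketch of deriving one `JOBDIR` per run follows. The helper name `jobdir_settings`, the `crawls` root and the `<spider>-<run_id>` layout are assumptions for illustration, not anything Kingfisher Collect does today.

```python
from pathlib import PurePosixPath


def jobdir_settings(spider_name: str, run_id: str, root: str = "crawls") -> dict:
    """Build a Scrapy settings fragment that enables job persistence.

    JOBDIR is Scrapy's documented setting; the directory layout used here
    (root/<spider>-<run_id>) is a hypothetical convention.
    """
    # One directory per spider *and* per run, so state is never shared.
    jobdir = PurePosixPath(root) / f"{spider_name}-{run_id}"
    return {"JOBDIR": str(jobdir)}


# Hypothetical spider name and run identifier.
settings = jobdir_settings("example_spider", "1")
print(settings)
```

The resulting value could be passed on the command line as `scrapy crawl example_spider -s JOBDIR=crawls/example_spider-1`. Reusing the same directory on the next run is what resumes the crawl.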

Can also look into how the following maintain state:

jpmckinney commented 4 years ago

Split out of #79

jpmckinney commented 2 years ago

The next version of Scrapyd will have a job persistence option! https://github.com/scrapy/scrapyd/pull/418

jpmckinney commented 2 years ago

Might just require a one-line configuration: https://scrapyd.readthedocs.io/en/stable/config.html#jobstorage
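A rough sketch of what that one line might look like, generated here with Python's `configparser` so the output can be checked. It assumes the `jobstorage` option goes in the `[scrapyd]` section and that `scrapyd.jobstorage.SqliteJobStorage` is the persistent implementation described in the linked docs. As far as I can tell, this option persists Scrapyd's records of finished jobs across restarts; whether it also covers pausing/resuming a running crawl is not confirmed here.

```python
import configparser
import io

# Assumption: the option lives under the [scrapyd] section of scrapyd.conf.
config = configparser.ConfigParser()
config["scrapyd"] = {
    # Assumed class path for the SQLite-backed job storage.
    "jobstorage": "scrapyd.jobstorage.SqliteJobStorage",
}

# Render to a string instead of touching a real scrapyd.conf on disk.
buffer = io.StringIO()
config.write(buffer)
scrapyd_conf = buffer.getvalue()
print(scrapyd_conf)
```

The printed text is the fragment that would be merged into the deployed `scrapyd.conf`.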