scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Persistent scheduling based on CRON #453

Closed kanchansapkota27 closed 2 years ago

kanchansapkota27 commented 2 years ago

Is it possible to extend the project with one extra endpoint for scheduling, say POST http://localhost:6800/cronschedule.json, with extra arguments like cron_exp, and with db_url as a setting in scrapyd.cfg?

Maybe it can be implemented with APScheduler, which provides TwistedScheduler.

Is it possible by modifying scrapyd's code?

PS: I am a beginner and just looking for possibilities.
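
To make the APScheduler idea above concrete, here is a minimal sketch of a standalone process that persists cron-style schedules and posts to Scrapyd's existing schedule.json endpoint. It is not part of Scrapyd and does not add a cronschedule.json endpoint. It assumes APScheduler 3.x (which ships TwistedScheduler) with SQLAlchemy installed for the job store; the function names, the job id format, and the project and spider names are hypothetical, and db_url and cron_exp simply mirror the names proposed in the question.

```python
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://localhost:6800"  # assumption: default Scrapyd address


def post_schedule(project, spider):
    """Job body: ask Scrapyd to run one spider via its schedule.json endpoint."""
    body = urllib.parse.urlencode({"project": project, "spider": spider}).encode("utf-8")
    request = urllib.request.Request(f"{SCRAPYD_URL}/schedule.json", data=body, method="POST")
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def make_job_id(project, spider):
    """Hypothetical id scheme so re-adding a schedule replaces the stored one."""
    return f"{project}:{spider}"


def start_cron_scheduler(db_url, cron_exp, project, spider):
    """Start a TwistedScheduler whose jobs are persisted in the database at db_url."""
    # Imported here so the helpers above stay usable without APScheduler installed.
    from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
    from apscheduler.schedulers.twisted import TwistedScheduler
    from apscheduler.triggers.cron import CronTrigger

    scheduler = TwistedScheduler(jobstores={"default": SQLAlchemyJobStore(url=db_url)})
    scheduler.add_job(
        post_schedule,  # must be a module-level function so the job store can reference it
        trigger=CronTrigger.from_crontab(cron_exp),  # five-field crontab expression
        args=[project, spider],
        id=make_job_id(project, spider),
        replace_existing=True,  # avoid duplicate jobs after a restart
    )
    scheduler.start()
    return scheduler
```

One way to use it would be to call start_cron_scheduler("sqlite:///jobs.sqlite", "0 3 * * *", "myproject", "myspider") and then start the Twisted reactor with reactor.run(); the schedule is stored in the database, so it survives a restart of the process.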

jpmckinney commented 2 years ago

I think a simple solution is to create a cronjob on another machine that sends requests to schedule.json.
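
As a rough illustration of the cron approach, the script below sends one POST request to schedule.json and could be run from a crontab entry. It is only a sketch: the Scrapyd address, the script path, and the project and spider names are assumptions, not values from this thread.

```python
import json
import sys
import urllib.parse
import urllib.request

SCRAPYD_URL = "http://localhost:6800"  # assumption: default Scrapyd address


def build_schedule_request(project, spider, base_url=SCRAPYD_URL, **spider_args):
    """Build (but do not send) a POST request for Scrapyd's schedule.json."""
    # Extra keyword arguments are sent as additional form fields.
    fields = {"project": project, "spider": spider, **spider_args}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(f"{base_url}/schedule.json", data=body, method="POST")


def schedule_spider(project, spider, **spider_args):
    """Send the request and return Scrapyd's decoded JSON response."""
    request = build_schedule_request(project, spider, **spider_args)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Example crontab entry (hypothetical paths and names), run every day at 03:00:
#   0 3 * * * /usr/bin/python3 /path/to/schedule_spider.py myproject myspider
if __name__ == "__main__" and len(sys.argv) == 3:
    print(schedule_spider(sys.argv[1], sys.argv[2]))
```

The schedule itself lives in the crontab, so it persists across restarts of both Scrapyd and the machine running cron without any change to Scrapyd.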

kanchansapkota27 commented 2 years ago

Yes, I have come across solutions that do what you suggested. I was just exploring whether the feature could be implemented within a single scrapyd service, rather than adding another service on top of it just for periodic persistent schedules.

jpmckinney commented 2 years ago

cron is available in pretty much every Linux distribution, so in this case I think it's fine to use multiple tools, rather than put all the features into one monolith.

kanchansapkota27 commented 2 years ago

> cron is available in pretty much every Linux distribution, so in this case I think it's fine to use multiple tools, rather than put all the features into one monolith.

I think I will go with that, thank you.