scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Adding custom endpoints #484

Closed vazkir closed 2 months ago

vazkir commented 1 year ago

Hi,

I was wondering if it is possible to add custom endpoints to scrapyd? That is, to extend the current endpoints by allowing people to write their own?

Or perhaps someone knows of a workaround to achieve this?

This would be very beneficial to developers like me who want to add some of their own logic by creating new endpoints, especially in a Docker environment where you have only one ENTRYPOINT to interact with and running only one program is recommended.

jpmckinney commented 1 year ago

Can you give an example of an endpoint you would like to add?

vazkir commented 1 year ago

Of course; I wanted to create new spiders automatically based on API input, which I got working, so for anyone interested:

You can subclass WsResource to create similar endpoints by importing it and using it like this:

from scrapyd.webservice import WsResource

# Examples of the existing scrapyd endpoints:
# https://github.com/scrapy/scrapyd/blob/master/scrapyd/webservice.py
class TestNewEndpoint(WsResource):

    def render_GET(self, txrequest):
        # self.root exposes the scrapyd application internals, e.g.:
        # projects = list(self.root.scheduler.list_projects())
        return {"node_name": "nodeje", "status": "ok", "projects": {"yeag": "fsf"}}
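
WsResource (via scrapyd's JsonResource base class, as far as I can tell) takes care of serializing the returned dict to JSON and setting the Content-Type header, so you can just return plain Python structures.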

Then, in scrapyd.conf, you add your custom endpoint under [services] like this:

...............
[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
test_endpoint.json = extra_endpoints.create_spider.TestNewEndpoint
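
After restarting scrapyd (assuming it listens on the default http_port of 6800), the new endpoint should respond to a plain GET with something like:

$ curl http://localhost:6800/test_endpoint.json
{"node_name": "nodeje", "status": "ok", "projects": {"yeag": "fsf"}}
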
vazkir commented 1 year ago

Only now I am running into the problem that I want to reload scrapyd so it picks up the new SPIDER_MODULES, without cancelling any jobs that might still be running. Do you by any chance know the best way to handle this?

I could only find this SO post, where multiple options were given, but I'm not sure which one would be best suited.

jpmckinney commented 1 year ago

Hmm, I haven't tested changing SPIDER_MODULES specifically, but I would have expected this workflow to work, without requiring a reload of the scrapyd service: