scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Cancel API not working on multiple nodes. Spiders tested locally with Scrapy and are fine #386

Closed: kevingoss2 closed this issue 2 years ago

kevingoss2 commented 4 years ago

I have several machines running spiders with Scrapyd, and I monitor and manage them via the Scrapyd API. I love the software, but I cannot seem to cancel jobs. When I call the cancel API, I get:

'{"node_name": "spider8", "status": "ok", "prevstate": "running"}

It says "ok" so I know that:

When I query the history/running/pending lists on the nodes, I notice several instances of many of the spiders still running. It happens to all of the spiders, but which spider and which server are affected seems random.

Example of "running" output from the server. The jobID and project sent in ARE correct and I do get the correct response to the cancel API, but here is one that has been running for days and it is a spider that finishes in scrapy in minutes.

{'id': 'a91cf65ae8fa11ea9764d1edb3dcaa77', 'spider': 'PopularScience', 'pid': 1971, 'start_time': '2020-08-28 06:49:49.105971'}
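
For completeness, the running list above comes from listjobs.json. A minimal sketch of that query, again with a placeholder host and project name:

```python
import requests

SCRAPYD_URL = "http://spider8:6800"  # hypothetical Scrapyd node

# listjobs.json returns "pending", "running", and "finished" arrays for a project
jobs = requests.get(
    f"{SCRAPYD_URL}/listjobs.json",
    params={"project": "myproject"},  # placeholder project name
).json()

for job in jobs["running"]:
    print(job["id"], job["spider"], job["pid"], job["start_time"])
```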

jpmckinney commented 3 years ago

When stopping a job, Scrapyd sends a TERM signal. If you check the log file for the spider/crawl, you should see that it responds with:

Received SIGTERM, shutting down gracefully. Send again to force

So, Scrapy (not Scrapyd) will attempt to shut down the spider gracefully, which can sometimes take a long time if there was a lot of processing already being done in the engine.

If you cancel the job a second time via Scrapyd, Scrapy will force the spider to stop. Can you try that?
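
A minimal sketch of that, with placeholder host, project, and job values: send the same cancel.json request a second time after the first one has gone out.

```python
import time
import requests

SCRAPYD_URL = "http://spider8:6800"  # hypothetical Scrapyd node
payload = {
    "project": "myproject",                     # placeholder project name
    "job": "a91cf65ae8fa11ea9764d1edb3dcaa77",  # job id to stop
}

# First cancel: Scrapyd sends TERM and Scrapy starts a graceful shutdown.
requests.post(f"{SCRAPYD_URL}/cancel.json", data=payload)

time.sleep(5)  # give the graceful shutdown a moment to finish on its own

# Second cancel: the second TERM makes Scrapy force the shutdown.
requests.post(f"{SCRAPYD_URL}/cancel.json", data=payload)
```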

Gallaecio commented 3 years ago

Related to https://github.com/scrapy/scrapy/issues/4749

jpmckinney commented 2 years ago

Closing, as there has been no response to the question in several months.