scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Cancel API not working on multiple nodes. Spiders tested locally with Scrapy and are fine #386

Closed: kevingoss2 closed this issue 2 years ago

kevingoss2 commented 4 years ago

I have several machines running spiders with Scrapyd, and I monitor and manage them via the Scrapyd API. I love the software, but I cannot seem to cancel jobs. When I call the cancel API, I get:

'{"node_name": "spider8", "status": "ok", "prevstate": "running"}

It says "ok" so I know that:

When I query the history/running/pending lists on the nodes, I notice several instances of many of the spiders still running. It happens to all of the spiders, but which spider and which server are affected seems random.

Example of "running" output from the server. The jobID and project sent in ARE correct and I do get the correct response to the cancel API, but here is one that has been running for days and it is a spider that finishes in scrapy in minutes.

{'id': 'a91cf65ae8fa11ea9764d1edb3dcaa77', 'spider': 'PopularScience', 'pid': 1971, 'start_time': '2020-08-28 06:49:49.105971'}
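
For completeness, the running list above comes from listjobs.json. A minimal sketch of that query, again with a placeholder host and project name:

```python
import requests

SCRAPYD_URL = "http://spider8:6800"  # hypothetical Scrapyd node

# listjobs.json returns "pending", "running", and "finished" arrays for a project
jobs = requests.get(
    f"{SCRAPYD_URL}/listjobs.json",
    params={"project": "myproject"},  # placeholder project name
).json()

for job in jobs["running"]:
    print(job["id"], job["spider"], job["pid"], job["start_time"])
```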

jpmckinney commented 3 years ago

When stopping a job, Scrapyd sends a TERM signal. If you check the log file for the spider/crawl, you should see that it responds with:

Received SIGTERM, shutting down gracefully. Send again to force

So, Scrapy (not Scrapyd) will attempt to shut down the spider gracefully, which can sometimes take a long time if there was a lot of processing already being done in the engine.

If you cancel the job a second time via Scrapyd, Scrapy will force the spider to stop. Can you try that?
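
A minimal sketch of that, with placeholder host, project, and job values: send the same cancel.json request a second time after the first one has gone out.

```python
import time
import requests

SCRAPYD_URL = "http://spider8:6800"  # hypothetical Scrapyd node
payload = {
    "project": "myproject",                     # placeholder project name
    "job": "a91cf65ae8fa11ea9764d1edb3dcaa77",  # job id to stop
}

# First cancel: Scrapyd sends TERM and Scrapy starts a graceful shutdown.
requests.post(f"{SCRAPYD_URL}/cancel.json", data=payload)

time.sleep(5)  # give the graceful shutdown a moment to finish on its own

# Second cancel: the second TERM makes Scrapy force the shutdown.
requests.post(f"{SCRAPYD_URL}/cancel.json", data=payload)
```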

Gallaecio commented 3 years ago

Related to https://github.com/scrapy/scrapy/issues/4749

jpmckinney commented 2 years ago

Closing, as there has been no response to the question in several months.