When the first API request to schedule a script fails with a request timeout but the script job is nonetheless scheduled successfully in Scrapy Cloud, the current implementation keeps retrying the scheduling call, receiving "duplicate of an existing job" responses until the previously-scheduled job finishes. At that point, a duplicate job is created.
I noticed that the counterpart implementation for spiders already guards against this: if the API reports that the job is a duplicate of a running job, it stops retrying. There is no reason not to take the same approach for scripts.
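For context, the guard amounts to treating a "duplicate of a running job" response as success rather than as a retryable failure. This is a minimal sketch of that retry pattern; `DuplicateJobError`, `schedule_with_retries`, and their signatures are hypothetical illustrations, not the actual shub_workflow API:

```python
import time


class DuplicateJobError(Exception):
    """Hypothetical error for when the API reports the job duplicates a running one."""

    def __init__(self, existing_job_id):
        super().__init__(f"duplicate of running job {existing_job_id}")
        self.existing_job_id = existing_job_id


def schedule_with_retries(schedule_fn, max_retries=3, delay=0.0):
    """Retry scheduling on transient failures, but stop as soon as the API
    reports a duplicate of a running job: that means an earlier request
    actually succeeded server-side, so the existing job id is returned
    instead of retrying until a real duplicate gets created."""
    for attempt in range(max_retries):
        try:
            return schedule_fn()
        except DuplicateJobError as e:
            # The job is already running; do not retry.
            return e.existing_job_id
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
```

Without the `DuplicateJobError` branch, the loop would keep failing until the running job finished and then schedule a second copy, which is exactly the behavior described above.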
So I’ve extracted `schedule_spider` into `_schedule_job`, removed the now-unnecessary "spider" references from log messages, and reimplemented `schedule_script` on top of `_schedule_job`.

These changes include a breaking API change: the removal of `shub_workflow.utils.schedule_script_in_dash` (which was also exposed as `shub_workflow.script.schedule_script_in_dash`).
CC: @hermit-crab (who discovered and diagnosed the original issue)