When deploying a Scrapy project to Scrapy Cloud, the deploy process loads everything in the settings.py file and pickles all of its variables. That includes ZYTE_API_RETRY_POLICY, which is not pickleable, so it is not possible to deploy a project with a custom retry policy defined in settings.py.
Traceback (most recent call last):
File "/usr/local/bin/shub-image-info", line 8, in <module>
sys.exit(shub_image_info())
File "/usr/local/lib/python3.10/site-packages/sh_scrapy/crawl.py", line 209, in shub_image_info
_run_usercode(None, ['scrapy', 'shub_image_info'] + sys.argv[1:],
File "/usr/local/lib/python3.10/site-packages/sh_scrapy/crawl.py", line 138, in _run_usercode
settings = populate_settings(apisettings_func(), spider)
File "/usr/local/lib/python3.10/site-packages/sh_scrapy/settings.py", line 243, in populate_settings
return _populate_settings_base(apisettings, _load_default_settings, spider)
File "/usr/local/lib/python3.10/site-packages/sh_scrapy/settings.py", line 172, in _populate_settings_base
settings = get_project_settings().copy()
File "/usr/local/lib/python3.10/site-packages/scrapy/settings/__init__.py", line 349, in copy
return copy.deepcopy(self)
File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/local/lib/python3.10/copy.py", line 161, in deepcopy
rv = reductor(4)
TypeError: cannot pickle '_thread._local' object
{"message": "shub-image-info exit code: 1", "details": null, "error": "image_info_error"}
{"status": "error", "message": "Internal error"}
Deploy log location: /tmp/shub_deploy_tn3yzm9m.log
Error: Deploy failed: b'{"status": "error", "message": "Internal error"}'
Some workarounds for this problem are:

- Define ZYTE_API_RETRY_POLICY inside the update_settings method of the spiders. This works for deploying because the policy is not instantiated until the spider is running. However, it is not a nice solution for the overall project.
- Make tenacity.AsyncRetrying pickleable. Not sure if this is even possible, and will be

However, I think the proper solution would be to allow ZYTE_API_RETRY_POLICY to contain a str with the import path to the policy object, similar to how other Scrapy settings work: