scrapy-plugins / scrapy-zyte-api

Zyte API integration for Scrapy
BSD 3-Clause "New" or "Revised" License

ZYTE_API_RETRY_POLICY doesn't work with Scrapy Cloud deployments #43

Closed VMRuiz closed 2 years ago

VMRuiz commented 2 years ago

When deploying a Scrapy project to Scrapy Cloud, the process loads everything under the settings.py file and pickles all the variables in it. That includes ZYTE_API_RETRY_POLICY, which is not picklable, so it's not possible to deploy a project with a custom retry policy defined in settings.py.

Traceback (most recent call last):
  File "/usr/local/bin/shub-image-info", line 8, in <module>
    sys.exit(shub_image_info())
  File "/usr/local/lib/python3.10/site-packages/sh_scrapy/crawl.py", line 209, in shub_image_info
    _run_usercode(None, ['scrapy', 'shub_image_info'] + sys.argv[1:],
  File "/usr/local/lib/python3.10/site-packages/sh_scrapy/crawl.py", line 138, in _run_usercode
    settings = populate_settings(apisettings_func(), spider)
  File "/usr/local/lib/python3.10/site-packages/sh_scrapy/settings.py", line 243, in populate_settings
    return _populate_settings_base(apisettings, _load_default_settings, spider)
  File "/usr/local/lib/python3.10/site-packages/sh_scrapy/settings.py", line 172, in _populate_settings_base
    settings = get_project_settings().copy()
  File "/usr/local/lib/python3.10/site-packages/scrapy/settings/__init__.py", line 349, in copy
    return copy.deepcopy(self)
  File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/local/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_thread._local' object
{"message": "shub-image-info exit code: 1", "details": null, "error": "image_info_error"}

{"status": "error", "message": "Internal error"}
Deploy log location: /tmp/shub_deploy_tn3yzm9m.log
Error: Deploy failed: b'{"status": "error", "message": "Internal error"}'
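The failure above can be reproduced without Scrapy Cloud. Retry policy objects carry per-thread state (`_thread._local`), which `copy.deepcopy` cannot pickle. A minimal stand-in (the `RetryPolicy` class below is hypothetical, just enough to mimic the unpicklable attribute) shows the same `TypeError`:

```python
import copy
import threading

class RetryPolicy:
    """Hypothetical stand-in for a real retry policy object."""
    def __init__(self):
        # Real policies keep per-thread state; threading.local is what
        # makes the settings dict impossible to deepcopy/pickle.
        self._local = threading.local()

settings = {"ZYTE_API_RETRY_POLICY": RetryPolicy()}

try:
    copy.deepcopy(settings)
except TypeError as exc:
    print(exc)  # cannot pickle '_thread._local' object
```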

Some workarounds for this problem are:

However, I think the proper solution would be to allow ZYTE_API_RETRY_POLICY to contain a string with the import path to the policy object, similar to how other Scrapy settings work:
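A string is trivially picklable, so settings.py would only need to hold a dotted path, and the plugin could resolve it at runtime the way Scrapy's `scrapy.utils.misc.load_object` does. The resolver below is a minimal self-contained sketch of that pattern (the `ZYTE_API_RETRY_POLICY` value shown is a hypothetical stand-in path, not a real retry policy):

```python
from importlib import import_module

def load_object(path: str):
    # Minimal sketch of Scrapy's scrapy.utils.misc.load_object:
    # split "pkg.module.name" into module path and attribute name,
    # import the module, and return the named attribute.
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)

# settings.py would then contain only a picklable string:
ZYTE_API_RETRY_POLICY = "json.dumps"  # hypothetical stand-in path

policy = load_object(ZYTE_API_RETRY_POLICY)
print(policy)  # <function dumps at ...>
```

With this scheme the plugin resolves the object once at startup, and deployment never has to pickle the policy itself.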

Gallaecio commented 2 years ago

It is not just Scrapy Cloud; it seems Scrapy settings must always be picklable.