scrapy / scrapyd-client

Command line client for Scrapyd server
BSD 3-Clause "New" or "Revised" License

Deploying with --include-dependencies fails when containing a git repository dependency #126

Closed hrafnthor closed 1 year ago

hrafnthor commented 1 year ago

Description

I have a Scrapy project that relies on a library defined as a git repository in the project's requirements file. When I attempt to deploy it to scrapyd by running scrapyd-deploy --include-dependencies, the command fails with an error indicating a problem with the dependency definition.

The initial error stack is the following:

Packing version 1680123941
Including dependencies from requirements.txt
Traceback (most recent call last):
  File "/home/hrafn/.local/bin/scrapyd-deploy", line 8, in <module>
    sys.exit(main())
  File "/home/hrafn/.local/lib/python3.10/site-packages/scrapyd_client/deploy.py", line 117, in main
    exitcode, tmpdir = _build_egg_and_deploy_target(target, version, opts)
  File "/home/hrafn/.local/lib/python3.10/site-packages/scrapyd_client/deploy.py", line 141, in _build_egg_and_deploy_target
    egg, tmpdir = _build_egg(opts)
  File "/home/hrafn/.local/lib/python3.10/site-packages/scrapyd_client/deploy.py", line 298, in _build_egg
    retry_on_eintr(
  File "/home/hrafn/.local/lib/python3.10/site-packages/scrapyd_client/utils.py", line 119, in retry_on_eintr
    return function(*args, **kw)
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'setup.py', 'clean', '-a', 'bdist_uberegg', '-d', '/tmp/scrapydeploy-whxtzl3b']' returned non-zero exit status 1.

And the stderr output captured in the referenced tmp folder is (I have replaced every reference to the library name with 'my-library'):

'build/lib' does not exist -- can't clean it
'build/bdist.linux-x86_64' does not exist -- can't clean it
'build/scripts-3.10' does not exist -- can't clean it
  Running command git clone --filter=blob:none --quiet 'ssh://****@github.com/hrafnthor/my-library.git' /tmp/pip-install-no2qcx7z/my-library_19b012bf7ea747e9b2efaf39df47abac
  WARNING: Generating metadata for package my-library produced metadata for project name unknown. Fix your #egg=my-library fragments.
ERROR: Could not find a version that satisfies the requirement my-library (unavailable) (from versions: none)
ERROR: No matching distribution found for my-library (unavailable)
Traceback (most recent call last):
  File "/home/hrafn/Documents/dev/python/my-library/scraper/setup.py", line 5, in <module>
    setup(
  File "/home/hrafn/.local/lib/python3.10/site-packages/setuptools/__init__.py", line 108, in setup
    return distutils.core.setup(**attrs)
  File "/home/hrafn/.local/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/hrafn/.local/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/hrafn/.local/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/hrafn/.local/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
    super().run_command(command)
  File "/home/hrafn/.local/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/hrafn/.local/lib/python3.10/site-packages/uberegg.py", line 35, in run
    self._install(self.requirements, self.bdist_dir)
  File "/home/hrafn/.local/lib/python3.10/site-packages/uberegg.py", line 45, in _install
    return subprocess.check_call(
  File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'pip', 'install', '-U', '-t', 'build/bdist.linux-x86_64/egg', '-r', 'requirements.txt']' returned non-zero exit status 1.

The requirements.txt file for the scraper references the library dependency as follows (again the reference is replaced with 'my-library'):

my-library @ git+ssh://git@github.com/hrafnthor/my-library.git@0.0.2

If I am understanding the error correctly then the issue seems to relate to the library dependency not having a version associated with it when the scraper project is built.

It is not immediately clear to me why this would be. Any help in resolving this would be greatly appreciated.

System and env information

scrapy: 2.8.0
scrapyd: 1.4.1
python: 3.10.6
pip: 23.0.1

hrafnthor commented 1 year ago

I looked further into this, starting with what the uberegg library is doing. I'm not sure exactly why it is a dependency of scrapyd, as it just seems to wrap setuptools, iterate over some file contents, and do some logging.

Anyway, the issue arises at line 45 of the uberegg.py script, which invokes subprocess.check_call(), and that call raises the error.

Strangely, though, no such error is raised if the offending command referenced in the stack trace, python3 -m pip install -U -t build/bdist.linux-x86_64/egg -r requirements.txt, is run directly.
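For anyone wanting to compare the two situations, here is a minimal sketch that runs the same pip command through subprocess and keeps its stderr, so the pip output can be read next to a direct run. The helper names build_pip_command and run_and_capture are hypothetical and are not part of uberegg or scrapyd-client; only the command line itself is taken from the traceback above.

```python
import subprocess
import sys


def build_pip_command(python, target_dir, requirements):
    """Build the pip invocation shown in the traceback (hypothetical helper)."""
    return [python, "-m", "pip", "install", "-U", "-t", target_dir, "-r", requirements]


def run_and_capture(cmd, cwd=None):
    """Run a command and return (exit status, stderr text) instead of raising."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stderr


# Example usage, to be run from the project directory that holds requirements.txt:
#
#   cmd = build_pip_command(sys.executable, "build/bdist.linux-x86_64/egg", "requirements.txt")
#   status, stderr = run_and_capture(cmd)
#   print(status)
#   print(stderr)
```

Unlike subprocess.check_call(), subprocess.run() without check=True does not raise on a non-zero exit status, so the error text from pip stays available for inspection.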

So from all this it would seem the issue isn't directly related to scrapyd. Unless it is related to the continued use of eggs (I thought those were deprecated more than a decade ago?).

jpmckinney commented 1 year ago

Eggs are not deprecated. Scrapyd's API receives Scrapy projects as eggs.

jpmckinney commented 1 year ago

Looking at your stacktrace I see:

  Running command git clone --filter=blob:none --quiet 'ssh://****@github.com/hrafnthor/my-library.git' /tmp/pip-install-no2qcx7z/my-library_19b012bf7ea747e9b2efaf39df47abac
  WARNING: Generating metadata for package my-library produced metadata for project name unknown. Fix your #egg=my-library fragments.
ERROR: Could not find a version that satisfies the requirement my-library (unavailable) (from versions: none)
ERROR: No matching distribution found for my-library (unavailable)

It looks like Scrapyd can't install your https://github.com/hrafnthor/my-library (which is a 404 for me, so I assume it is private). I suspect you need to give the user that runs Scrapyd SSH key access to your GitHub account / repository.

hrafnthor commented 1 year ago

Thank you for the response @jpmckinney, and please pardon my ignorance on the usage of eggs (Python packaging is new to me, and the information available online does not make it clear to newcomers whether they are deprecated or not).

The issue at hand did turn out to be a naming mismatch in the dependency definition for the library. It was defined with an underscore rather than a dash, which threw the whole thing off and was subtle enough that I did not notice it for a good while.
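A rough sketch of how such a mismatch could be spotted: compare the name written in the requirements line with the name the library declares, both literally and after PEP 503 normalization (runs of dashes, underscores and dots collapsed to a single dash, lowercased). The function names are hypothetical, and the declared name 'my_library' is an assumption for illustration, since the thread does not say which side used the underscore.

```python
import re


def normalize(name):
    """PEP 503 normalization: runs of -, _ and . become one dash, lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line):
    """Return the name part of a 'name @ url' direct reference line."""
    return line.split("@", 1)[0].strip()


def compare_names(requirement_line, declared_name):
    """Report whether the required and declared names agree (hypothetical helper)."""
    required = requirement_name(requirement_line)
    return {
        "required": required,
        "declared": declared_name,
        "exact_match": required == declared_name,
        "normalized_match": normalize(required) == normalize(declared_name),
    }


result = compare_names(
    "my-library @ git+ssh://git@github.com/hrafnthor/my-library.git@0.0.2",
    "my_library",  # assumed declared name, for illustration only
)
print(result)
```

A result where normalized_match is true but exact_match is false points to exactly this kind of dash versus underscore difference, which is easy to miss by eye.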

So, as mentioned before, this had nothing to do with scrapyd.

mouday commented 1 year ago

I am getting this error too.

jpmckinney commented 1 year ago

See the above messages - this error is not caused by Scrapyd but by your own dependency definitions / requirements file.