scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Uploading an egg file raises a DistributionNotFound exception for each requirements package #451

Closed: radouani1984 closed this issue 2 years ago

radouani1984 commented 2 years ago

Description

When I try to deploy an egg file to the Scrapyd server running in a Docker container, it always returns a DistributionNotFound error for each external package. I tried both curl and scrapyd-deploy.

Steps to Reproduce

  1. python setup.py bdist_uberegg -r requirements.txt
  2. curl http://localhost:13334/addversion.json -F project=pr_reviews -F version=1.0.0 -F egg=@".\dist\project-1.0-py3.10.egg"

or

  1. scrapyd-deploy --include-dependencies
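
For reference, a quick way to double-check what bdist_uberegg actually bundled into the egg (the path matches the curl command above; the expected names in the comment are only an assumption):

import zipfile

# List the top-level entries of the built egg to confirm the external
# dependencies were packed next to the project code.
with zipfile.ZipFile("dist/project-1.0-py3.10.egg") as egg:
    top_level = sorted({name.split("/")[0] for name in egg.namelist()})
    print(top_level)  # e.g. ['EGG-INFO', 'pr_reviews', 'pycountry', ...]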

Actual behavior:

$ curl http://localhost:13334/addversion.json -F project=pr_reviews -F version=re1 -F egg=@".\dist\project-1.0-py3.10.egg"
{"node_name": "e9e273febed8", "status": "error", "message": "/usr/local/lib/python3.9/dist-packages/scrapy/utils/project.py:81: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION
  warnings.warn(
Traceback (most recent call last):
  File \"/usr/lib/python3.9/runpy.py\", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File \"/usr/lib/python3.9/runpy.py\", line 87, in _run_code
    exec(code, run_globals)
  File \"/usr/local/lib/python3.9/dist-packages/scrapyd/runner.py\", line 46, in <module>
    main()
  File \"/usr/local/lib/python3.9/dist-packages/scrapyd/runner.py\", line 43, in main
    execute()
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/cmdline.py\", line 144, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/crawler.py\", line 290, in __init__
    super().__init__(settings)
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/crawler.py\", line 167, in __init__
    self.spider_loader = self._get_spider_loader(settings)
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/crawler.py\", line 161, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/spiderloader.py\", line 67, in from_settings
    return cls(settings)
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/spiderloader.py\", line 24, in __init__
    self._load_all_spiders()
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/spiderloader.py\", line 51, in _load_all_spiders
    for module in walk_modules(name):
  File \"/usr/local/lib/python3.9/dist-packages/scrapy/utils/misc.py\", line 88, in walk_modules
    submod = import_module(fullpath)
  File \"/usr/lib/python3.9/importlib/__init__.py\", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File \"<frozen importlib._bootstrap>\", line 1030, in _gcd_import
  File \"<frozen importlib._bootstrap>\", line 1007, in _find_and_load
  File \"<frozen importlib._bootstrap>\", line 986, in _find_and_load_unlocked
  File \"<frozen importlib._bootstrap>\", line 664, in _load_unlocked
  File \"<frozen importlib._bootstrap>\", line 627, in _load_backward_compatible
  File \"<frozen zipimport>\", line 259, in load_module
  File \"/tmp/pr_reviews-re1-vukl7x51.egg/pr_reviews/spiders/ReviewCrawler.py\", line 2, in <module>
  File \"<frozen importlib._bootstrap>\", line 1007, in _find_and_load
  File \"<frozen importlib._bootstrap>\", line 986, in _find_and_load_unlocked
  File \"<frozen importlib._bootstrap>\", line 664, in _load_unlocked
  File \"<frozen importlib._bootstrap>\", line 627, in _load_backward_compatible
  File \"<frozen zipimport>\", line 259, in load_module
  File \"/tmp/pr_reviews-re1-vukl7x51.egg/pr_reviews/items.py\", line 4, in <module>
  File \"<frozen importlib._bootstrap>\", line 1007, in _find_and_load
  File \"<frozen importlib._bootstrap>\", line 986, in _find_and_load_unlocked
  File \"<frozen importlib._bootstrap>\", line 664, in _load_unlocked
  File \"<frozen importlib._bootstrap>\", line 627, in _load_backward_compatible
  File \"<frozen zipimport>\", line 259, in load_module
  File \"/tmp/pr_reviews-re1-vukl7x51.egg/pycountry/__init__.py\", line 13, in <module>
  File \"/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py\", line 478, in get_distribution
    dist = get_provider(dist)
  File \"/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py\", line 354, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File \"/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py\", line 909, in require
    needed = self.resolve(parse_requirements(requirements))
  File \"/usr/local/lib/python3.9/dist-packages/pkg_resources/__init__.py\", line 795, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'pycountry' distribution was not found and is required by the application
"}

setup.py

from setuptools import setup, find_packages

setup(
    name='project',
    version='1.0',
    packages=find_packages(),
    install_requires=[
        "itemadapter",
        "langdetect",
        "pycountry",
        "scrapy",
        "setuptools",
        "scrapy_zyte_smartproxy",
        "scrapy_user_agents"
    ],
    entry_points={'scrapy': ['settings = pr_reviews.settings']},
)

requirements.txt

langdetect==1.0.9
pycountry==22.3.5
scrapy==2.6.1
setuptools==63.1.0
scrapy_zyte_smartproxy==2.1.0
scrapy_user_agents==0.1.1

Scrapyd server Dockerfile

FROM ubuntu:20.04

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update -qq \
      && apt-get install -y tini git python3.9 python3.9-dev python3.9-distutils curl python3.9-venv build-essential libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev nginx apache2-utils \
      && apt-get clean \
      && rm -rf /var/lib/apt/lists/*

RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py

ADD requirements.txt /

RUN python3.9 -m pip install -r /requirements.txt 
RUN python3.9 -m pip install -e git+https://github.com/necrophcodr/chaperone.git#egg=chaperone 
RUN mkdir /etc/chaperone.d

COPY ./scrapyd.conf /etc/scrapyd/

VOLUME /etc/scrapyd/ /var/lib/scrapyd/

EXPOSE 6800

ENTRYPOINT ["tini", "--"]
CMD ["scrapyd", "--pidfile="]
jpmckinney commented 2 years ago

pycountry has a bug, which is fixed but not yet released. The fix is in https://github.com/flyingcircusio/pycountry/pull/52; see also https://github.com/flyingcircusio/pycountry/issues/106.

We don't control pycountry, so we cannot fix the issue.
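
For context, the DistributionNotFound above is raised while importing pycountry itself: pycountry 22.3.5 resolves its own installed distribution through pkg_resources at import time, roughly like the sketch below (a paraphrase, not pycountry's exact source).

import pkg_resources

# Inside an uber-egg there is no separate pycountry .egg-info/.dist-info on the
# working set, so this lookup raises DistributionNotFound before any spider
# code runs, even though the pycountry package itself is importable.
__version__ = pkg_resources.get_distribution("pycountry").version

Until a release ships the linked fix, one possible workaround is to install pycountry directly in the Scrapyd image so its metadata is on the working set, though that sidesteps the packaging problem rather than fixing it.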