scrapinghub / spidermon

Scrapy extension for monitoring spider execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License

CI is broken in master #422

Closed curita closed 10 months ago

curita commented 12 months ago

Issue

The CI checks fail on master. This is affecting new PRs (https://github.com/scrapinghub/spidermon/pull/421).

Reproduce

❯ python --version
Python 3.10.9
❯ python -m venv .venv
❯ source .venv/bin/activate
❯ pip install tox
❯ tox -e base

Traceback

❯ tox -e base
base: install_deps> python -I -m pip install Jinja2 pytest pytest-cov pytest-mock scrapy
.pkg: install_requires> python -I -m pip install 'setuptools>=40.8.0' wheel
.pkg: _optional_hooks> python /Users/julia/src/spidermon/.venv/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: get_requires_for_build_sdist> python /Users/julia/src/spidermon/.venv/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: get_requires_for_build_wheel> python /Users/julia/src/spidermon/.venv/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: install_requires_for_build_wheel> python -I -m pip install wheel
.pkg: prepare_metadata_for_build_wheel> python /Users/julia/src/spidermon/.venv/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: build_sdist> python /Users/julia/src/spidermon/.venv/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
base: install_package_deps> python -I -m pip install Jinja2 boto boto3 itemadapter 'jsonschema[format]>=3.2.0' premailer python-slugify requests scrapinghub scrapinghub-entrypoint-scrapy scrapy sentry-sdk slack-sdk
base: install_package> python -I -m pip install --force-reinstall --no-deps /Users/julia/src/spidermon/.tox/.tmp/package/1/spidermon-1.20.0.tar.gz
base: commands[0]> pytest -s --ignore=./tests/contrib --ignore=./tests/utils/test_zyte.py tests
================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.10.9, pytest-7.4.2, pluggy-1.3.0
cachedir: .tox/base/.pytest_cache
Spidermon monitor filtering
rootdir: /Users/julia/src/spidermon
plugins: cov-4.1.0, mock-3.11.1
collected 384 items                                                                                                                                                                      

tests/test_actions.py ......
tests/test_add_field_coverage.py ..........
tests/test_data.py .........
tests/test_descriptions.py ...
tests/test_expressions.py ....
tests/test_extension.py FFFFFF
tests/test_item_scraped_signal.py ...............
tests/test_levels.py .
tests/test_loaders.py ...
tests/test_messagetranslator.py ...
tests/test_names.py ....
tests/test_ordering.py ..
tests/test_spidermon_signal_connect.py ......
tests/test_suites.py ........
tests/test_templateloader.py ...
tests/test_validators_jsonschema.py ................................................................................................................................................................................................................................................................
tests/utils/test_field_coverage.py ..
tests/utils/test_settings.py ......

======================================================================================== FAILURES ========================================================================================
__________________________________________________________________________ test_spider_opened_suites_should_run __________________________________________________________________________

get_crawler = <function get_crawler.<locals>._crawler at 0x1277fe320>, suites = ['tests.fixtures.suites.Suite01']

    def test_spider_opened_suites_should_run(get_crawler, suites):
        """The suites defined at spider_opened_suites should be loaded and run"""
        crawler = get_crawler()
        spidermon = Spidermon(crawler, spider_opened_suites=suites)
        spidermon.spider_opened_suites[0].run = mock.MagicMock()
>       spidermon.spider_opened(crawler.spider)

tests/test_extension.py:18: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
spidermon/contrib/scrapy/extensions.py:120: in spider_opened
    self._run_suites(spider, self.spider_opened_suites)
spidermon/contrib/scrapy/extensions.py:203: in _run_suites
    data = self._generate_data_for_spider(spider)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <spidermon.contrib.scrapy.extensions.Spidermon object at 0x127811270>, spider = <Spider 'dummy' at 0x1247c9030>

    def _generate_data_for_spider(self, spider):
        return {
>           "stats": self.crawler.stats.get_stats(spider),
            "stats_history": spider.stats_history
            if hasattr(spider, "stats_history")
            else [],
            "crawler": self.crawler,
            "spider": spider,
            "job": self.client.job if self.client.available else None,
        }
E       AttributeError: 'NoneType' object has no attribute 'get_stats'

spidermon/contrib/scrapy/extensions.py:210: AttributeError
__________________________________________________________________________ test_spider_closed_suites_should_run __________________________________________________________________________

get_crawler = <function get_crawler.<locals>._crawler at 0x1277fe440>, suites = ['tests.fixtures.suites.Suite01']

    def test_spider_closed_suites_should_run(get_crawler, suites):
        """The suites defined at spider_closed_suites should be loaded and run"""
        crawler = get_crawler()
        spidermon = Spidermon(
            crawler, spider_opened_suites=suites, spider_closed_suites=suites
        )
        spidermon.spider_closed_suites[0].run = mock.MagicMock()
>       spidermon.spider_opened(crawler.spider)

tests/test_extension.py:30: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
spidermon/contrib/scrapy/extensions.py:120: in spider_opened
    self._run_suites(spider, self.spider_opened_suites)
spidermon/contrib/scrapy/extensions.py:203: in _run_suites
    data = self._generate_data_for_spider(spider)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <spidermon.contrib.scrapy.extensions.Spidermon object at 0x12787f3d0>, spider = <Spider 'dummy' at 0x127835390>

    def _generate_data_for_spider(self, spider):
        return {
>           "stats": self.crawler.stats.get_stats(spider),
            "stats_history": spider.stats_history
            if hasattr(spider, "stats_history")
            else [],
            "crawler": self.crawler,
            "spider": spider,
            "job": self.client.job if self.client.available else None,
        }
E       AttributeError: 'NoneType' object has no attribute 'get_stats'

spidermon/contrib/scrapy/extensions.py:210: AttributeError
_________________________________________________________________________ test_engine_stopped_suites_should_run __________________________________________________________________________

get_crawler = <function get_crawler.<locals>._crawler at 0x1277fe950>, suites = ['tests.fixtures.suites.Suite01']

    def test_engine_stopped_suites_should_run(get_crawler, suites):
        """The suites defined at engine_stopped_suites should be loaded and run"""
        crawler = get_crawler()
        spidermon = Spidermon(crawler, engine_stopped_suites=suites)
        spidermon.engine_stopped_suites[0].run = mock.MagicMock()
>       spidermon.engine_stopped()

tests/test_extension.py:41: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
spidermon/contrib/scrapy/extensions.py:136: in engine_stopped
    self._run_suites(spider, self.engine_stopped_suites)
spidermon/contrib/scrapy/extensions.py:203: in _run_suites
    data = self._generate_data_for_spider(spider)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <spidermon.contrib.scrapy.extensions.Spidermon object at 0x12779a590>, spider = <Spider 'dummy' at 0x127798a60>

    def _generate_data_for_spider(self, spider):
        return {
>           "stats": self.crawler.stats.get_stats(spider),
            "stats_history": spider.stats_history
            if hasattr(spider, "stats_history")
            else [],
            "crawler": self.crawler,
            "spider": spider,
            "job": self.client.job if self.client.available else None,
        }
E       AttributeError: 'NoneType' object has no attribute 'get_stats'

spidermon/contrib/scrapy/extensions.py:210: AttributeError
____________________________________________________________________ test_spider_opened_suites_should_run_from_signal ____________________________________________________________________

self = <MagicMock id='4957297904'>, args = (<ANY>,), kwargs = {}, msg = "Expected 'mock' to be called once. Called 0 times."

    def assert_called_once_with(self, /, *args, **kwargs):
        """assert that the mock was called exactly once and that that call was
        with the specified arguments."""
        if not self.call_count == 1:
            msg = ("Expected '%s' to be called once. Called %s times.%s"
                   % (self._mock_name or 'mock',
                      self.call_count,
                      self._calls_repr()))
>           raise AssertionError(msg)
E           AssertionError: Expected 'mock' to be called once. Called 0 times.

../../.pyenv/versions/3.10.9/lib/python3.10/unittest/mock.py:940: AssertionError

During handling of the above exception, another exception occurred:

get_crawler = <function get_crawler.<locals>._crawler at 0x1277fe710>, suites = ['tests.fixtures.suites.Suite01']

    def test_spider_opened_suites_should_run_from_signal(get_crawler, suites):
        """The suites defined at SPIDERMON_SPIDER_OPEN_MONITORS setting should be loaded and run"""
        settings = {"SPIDERMON_SPIDER_OPEN_MONITORS": suites}
        crawler = get_crawler(settings)
        spidermon = Spidermon.from_crawler(crawler)
        spidermon.spider_opened_suites[0].run = mock.MagicMock()
        crawler.signals.send_catch_log(signal=signals.spider_opened, spider=crawler.spider)
>       spidermon.spider_opened_suites[0].run.assert_called_once_with(mock.ANY)
E       AssertionError: Expected 'mock' to be called once. Called 0 times.

tests/test_extension.py:53: AssertionError
----------------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------------
ERROR    scrapy.utils.signal:signal.py:59 Error caught on signal handler: <bound method Spidermon.spider_opened of <spidermon.contrib.scrapy.extensions.Spidermon object at 0x1277a4520>>
Traceback (most recent call last):
  File "/Users/julia/src/spidermon/.tox/base/lib/python3.10/site-packages/scrapy/utils/signal.py", line 46, in send_catch_log
    response = robustApply(
  File "/Users/julia/src/spidermon/.tox/base/lib/python3.10/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 120, in spider_opened
    self._run_suites(spider, self.spider_opened_suites)
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 203, in _run_suites
    data = self._generate_data_for_spider(spider)
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 210, in _generate_data_for_spider
    "stats": self.crawler.stats.get_stats(spider),
AttributeError: 'NoneType' object has no attribute 'get_stats'
____________________________________________________________________ test_spider_closed_suites_should_run_from_signal ____________________________________________________________________

self = <MagicMock id='4958067808'>, args = (<ANY>,), kwargs = {}, msg = "Expected 'mock' to be called once. Called 0 times."

    def assert_called_once_with(self, /, *args, **kwargs):
        """assert that the mock was called exactly once and that that call was
        with the specified arguments."""
        if not self.call_count == 1:
            msg = ("Expected '%s' to be called once. Called %s times.%s"
                   % (self._mock_name or 'mock',
                      self.call_count,
                      self._calls_repr()))
>           raise AssertionError(msg)
E           AssertionError: Expected 'mock' to be called once. Called 0 times.

../../.pyenv/versions/3.10.9/lib/python3.10/unittest/mock.py:940: AssertionError

During handling of the above exception, another exception occurred:

get_crawler = <function get_crawler.<locals>._crawler at 0x1277fe8c0>, suites = ['tests.fixtures.suites.Suite01']

    def test_spider_closed_suites_should_run_from_signal(get_crawler, suites):
        """The suites defined at SPIDERMON_SPIDER_CLOSE_MONITORS setting should be loaded and run"""
        settings = {"SPIDERMON_SPIDER_CLOSE_MONITORS": suites}
        crawler = get_crawler(settings)
        spidermon = Spidermon.from_crawler(crawler)
        spidermon.spider_closed_suites[0].run = mock.MagicMock()
        crawler.signals.send_catch_log(signal=signals.spider_closed, spider=crawler.spider)
>       spidermon.spider_closed_suites[0].run.assert_called_once_with(mock.ANY)
E       AssertionError: Expected 'mock' to be called once. Called 0 times.

tests/test_extension.py:63: AssertionError
----------------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------------
ERROR    scrapy.utils.signal:signal.py:59 Error caught on signal handler: <bound method Spidermon.spider_closed of <spidermon.contrib.scrapy.extensions.Spidermon object at 0x127860220>>
Traceback (most recent call last):
  File "/Users/julia/src/spidermon/.tox/base/lib/python3.10/site-packages/scrapy/utils/signal.py", line 46, in send_catch_log
    response = robustApply(
  File "/Users/julia/src/spidermon/.tox/base/lib/python3.10/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 128, in spider_closed
    self._add_field_coverage_to_stats()
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 181, in _add_field_coverage_to_stats
    stats = self.crawler.stats.get_stats()
AttributeError: 'NoneType' object has no attribute 'get_stats'
___________________________________________________________________ test_engine_stopped_suites_should_run_from_signal ____________________________________________________________________

self = <MagicMock id='4957251824'>, args = (<ANY>,), kwargs = {}, msg = "Expected 'mock' to be called once. Called 0 times."

    def assert_called_once_with(self, /, *args, **kwargs):
        """assert that the mock was called exactly once and that that call was
        with the specified arguments."""
        if not self.call_count == 1:
            msg = ("Expected '%s' to be called once. Called %s times.%s"
                   % (self._mock_name or 'mock',
                      self.call_count,
                      self._calls_repr()))
>           raise AssertionError(msg)
E           AssertionError: Expected 'mock' to be called once. Called 0 times.

../../.pyenv/versions/3.10.9/lib/python3.10/unittest/mock.py:940: AssertionError

During handling of the above exception, another exception occurred:

get_crawler = <function get_crawler.<locals>._crawler at 0x1277fe950>, suites = ['tests.fixtures.suites.Suite01']

    def test_engine_stopped_suites_should_run_from_signal(get_crawler, suites):
        """The suites defined at SPIDERMON_ENGINE_STOP_MONITORS setting should be loaded and run"""
        settings = {"SPIDERMON_ENGINE_STOP_MONITORS": suites}
        crawler = get_crawler(settings)
        spidermon = Spidermon.from_crawler(crawler)
        spidermon.engine_stopped_suites[0].run = mock.MagicMock()
        crawler.signals.send_catch_log(signal=signals.engine_stopped, spider=crawler.spider)
>       spidermon.engine_stopped_suites[0].run.assert_called_once_with(mock.ANY)
E       AssertionError: Expected 'mock' to be called once. Called 0 times.

tests/test_extension.py:73: AssertionError
----------------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------------
ERROR    scrapy.utils.signal:signal.py:59 Error caught on signal handler: <bound method Spidermon.engine_stopped of <spidermon.contrib.scrapy.extensions.Spidermon object at 0x12779afe0>>
Traceback (most recent call last):
  File "/Users/julia/src/spidermon/.tox/base/lib/python3.10/site-packages/scrapy/utils/signal.py", line 46, in send_catch_log
    response = robustApply(
  File "/Users/julia/src/spidermon/.tox/base/lib/python3.10/site-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 136, in engine_stopped
    self._run_suites(spider, self.engine_stopped_suites)
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 203, in _run_suites
    data = self._generate_data_for_spider(spider)
  File "/Users/julia/src/spidermon/spidermon/contrib/scrapy/extensions.py", line 210, in _generate_data_for_spider
    "stats": self.crawler.stats.get_stats(spider),
AttributeError: 'NoneType' object has no attribute 'get_stats'
==================================================================================== warnings summary ====================================================================================
spidermon/contrib/pytest/plugins/filter_monitors.py:10
  /Users/julia/src/spidermon/spidermon/contrib/pytest/plugins/filter_monitors.py:10: PytestDeprecationWarning: The hookimpl pytest_collection_modifyitems uses old-style configuration options (marks or attributes).
  Please use the pytest.hookimpl(trylast=True) decorator instead
   to configure the hooks.
   See https://docs.pytest.org/en/latest/deprecations.html#configuring-hook-specs-impls-using-markers
    @pytest.mark.trylast

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================ short test summary info =================================================================================
FAILED tests/test_extension.py::test_spider_opened_suites_should_run - AttributeError: 'NoneType' object has no attribute 'get_stats'
FAILED tests/test_extension.py::test_spider_closed_suites_should_run - AttributeError: 'NoneType' object has no attribute 'get_stats'
FAILED tests/test_extension.py::test_engine_stopped_suites_should_run - AttributeError: 'NoneType' object has no attribute 'get_stats'
FAILED tests/test_extension.py::test_spider_opened_suites_should_run_from_signal - AssertionError: Expected 'mock' to be called once. Called 0 times.
FAILED tests/test_extension.py::test_spider_closed_suites_should_run_from_signal - AssertionError: Expected 'mock' to be called once. Called 0 times.
FAILED tests/test_extension.py::test_engine_stopped_suites_should_run_from_signal - AssertionError: Expected 'mock' to be called once. Called 0 times.
======================================================================== 6 failed, 341 passed, 1 warning in 1.78s ========================================================================
base: exit 1 (3.90 seconds) /Users/julia/src/spidermon> pytest -s --ignore=./tests/contrib --ignore=./tests/utils/test_zyte.py tests pid=81920
.pkg: _exit> python /Users/julia/src/spidermon/.venv/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
  base: FAIL code 1 (55.27=setup[51.37]+cmd[3.90] seconds)
  evaluation failed :( (55.38 seconds)

Initial Diagnosis

It looks like `self.crawler.stats` is `None`, which causes failures further down the line. The crawler created by the `get_crawler()` fixture (in `conftest.py`) apparently doesn't have a stats collector initialized yet.

This may be something that changed in recent Scrapy versions, and it's surfacing now because Scrapy isn't pinned.
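The failing path can be reduced to a few lines. Below is a sketch of the failure mode and a fixture-side guard; `_FallbackStats` and `ensure_stats` are hypothetical names for illustration, not part of spidermon or Scrapy:

```python
class _FallbackStats:
    """Hypothetical minimal stand-in for Scrapy's stats collector API."""

    def __init__(self):
        self._stats = {}

    def get_stats(self, spider=None):
        # Mirrors StatsCollector.get_stats(): returns the stats dict.
        return self._stats


def ensure_stats(crawler):
    """Give a bare test crawler a stats object so code that calls
    crawler.stats.get_stats() does not hit AttributeError on None."""
    if getattr(crawler, "stats", None) is None:
        crawler.stats = _FallbackStats()
    return crawler
```

With a guard like this in the fixture, `_generate_data_for_spider()` would get an empty stats dict instead of raising `AttributeError: 'NoneType' object has no attribute 'get_stats'`.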

Gallaecio commented 11 months ago

https://github.com/scrapy/scrapy/pull/6038

See specifically how Scrapy’s own get_crawler in scrapy.utils.test changed.
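One way to adapt spidermon's own `get_crawler()` fixture would be to mirror that change: eagerly initialize the crawler the way Scrapy's test helper now does. A sketch, assuming (based on scrapy/scrapy#6038) that newer Scrapy versions defer attribute setup to `Crawler._apply_settings()`; `init_test_crawler` is a hypothetical helper name:

```python
def init_test_crawler(crawler):
    """Hypothetical helper for the get_crawler() fixture in conftest.py.

    Newer Scrapy moved attribute setup (stats collector, log formatter,
    etc.) out of Crawler.crawl() into Crawler._apply_settings(); call it
    when present so crawler.stats is populated before any signal fires.
    Older Scrapy versions, which initialize eagerly, are left as-is.
    """
    apply_settings = getattr(crawler, "_apply_settings", None)
    if callable(apply_settings):
        apply_settings()
    return crawler
```

The `getattr` fallback keeps the fixture working across both old and new Scrapy, which matters while the dependency is unpinned.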