scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

Scrapy and Great Expectations: Error - __provides__ #6307

Closed: culpgrant closed this issue 1 week ago

culpgrant commented 1 month ago

Description

I am trying to use Scrapy and Great Expectations in the same virtual environment but there is an issue depending on the order I import the packages in.

I created an issue for Great Expectations with additional details.

They mentioned it might have something to do with abc being monkey-patched.

Steps to Reproduce

This does work:

import great_expectations
import scrapy

This does not work:

import scrapy
import great_expectations

Error:

Traceback (most recent call last):
  File "/Users/grant/vs_code_projects/grants_projects/test_environment.py", line 2, in <module>
    import great_expectations
  File "/Users/grant/Envs/test_env/lib/python3.8/site-packages/great_expectations/__init__.py", line 32, in <module>
    register_core_expectations()
  File "/Users/grant/Envs/test_env/lib/python3.8/site-packages/great_expectations/expectations/registry.py", line 187, in register_core_expectations
    from great_expectations.expectations import core  # noqa: F401
  File "/Users/grant/Envs/test_env/lib/python3.8/site-packages/great_expectations/expectations/core/__init__.py", line 1, in <module>
    from .expect_column_distinct_values_to_be_in_set import (
  File "/Users/grant/Envs/test_env/lib/python3.8/site-packages/great_expectations/expectations/core/expect_column_distinct_values_to_be_in_set.py", line 12, in <module>
    from great_expectations.expectations.expectation import (
  File "/Users/grant/Envs/test_env/lib/python3.8/site-packages/great_expectations/expectations/expectation.py", line 2350, in <module>
    class BatchExpectation(Expectation, ABC):
  File "/Users/grant/Envs/test_env/lib/python3.8/site-packages/great_expectations/expectations/expectation.py", line 287, in __new__
    newclass._register_renderer_functions()
  File "/Users/grant/Envs/test_env/lib/python3.8/site-packages/great_expectations/expectations/expectation.py", line 369, in _register_renderer_functions
    attr_obj: Callable = getattr(cls, candidate_renderer_fn_name)
AttributeError: __provides__

Expected behavior: Be able to use the packages together in the same virtual environment

Actual behavior: Cannot import the packages together

Reproduces how often: 100%

Versions

Scrapy 2.11.1
great-expectations 0.18.12

Additional context

Looking for a possible solution on what could be done. Thank you!

Gallaecio commented 1 month ago

I think this person might be on to something.

wRAR commented 1 month ago

Well, at least importing zope.interface (and twisted) instead of scrapy doesn't reproduce the error (I really hoped that would be the problem).

VMRuiz commented 4 weeks ago

I was able to reproduce this issue by importing twisted.internet.ssl, the module that exposes Certificate:

(great_expectations) ➜  scrapy git:(master) ✗ python
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import twisted.internet.ssl
>>> import great_expectations
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/__init__.py", line 32, in <module>
    register_core_expectations()
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/registry.py", line 187, in register_core_expectations
    from great_expectations.expectations import core  # noqa: F401
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/core/__init__.py", line 1, in <module>
    from .expect_column_distinct_values_to_be_in_set import (
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/core/expect_column_distinct_values_to_be_in_set.py", line 12, in <module>
    from great_expectations.expectations.expectation import (
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/expectation.py", line 2350, in <module>
    class BatchExpectation(Expectation, ABC):
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/expectation.py", line 287, in __new__
    newclass._register_renderer_functions()
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/expectation.py", line 369, in _register_renderer_functions
    attr_obj: Callable = getattr(cls, candidate_renderer_fn_name)
AttributeError: __provides__. Did you mean: '__providedBy__'?

VMRuiz commented 4 weeks ago

Importing Certificate directly from its internal package seems to work:

(great_expectations) ➜  scrapy git:(master) ✗ python
Python 3.10.14 (main, Mar 19 2024, 21:46:16) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from twisted.internet._sslverify import Certificate
>>> import great_expectations

So it must be related to the twisted.internet.ssl init code.

VMRuiz commented 4 weeks ago

I kept digging in the Twisted code and the culprit seems to be the BaseConnector(ABC) class at https://github.com/twisted/twisted/blob/1c80aad4c8fd2d0142433476bd5f6df5c511b4ba/src/twisted/internet/base.py#L1224

For some reason, the implementer decorator adds __provides__ to both BaseConnector and ABC classes:


>>> from zope.interface import classImplements, implementer
>>> from twisted.internet.interfaces import IConnector
>>> from abc import ABC
>>> @implementer(IConnector)
... class Test2(ABC):
...    pass
>>> import great_expectations
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/__init__.py", line 32, in <module>
    register_core_expectations()
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/registry.py", line 187, in register_core_expectations
    from great_expectations.expectations import core  # noqa: F401
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/core/__init__.py", line 1, in <module>
    from .expect_column_distinct_values_to_be_in_set import (
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/core/expect_column_distinct_values_to_be_in_set.py", line 12, in <module>
    from great_expectations.expectations.expectation import (
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/expectation.py", line 2350, in <module>
    class BatchExpectation(Expectation, ABC):
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/expectation.py", line 287, in __new__
    newclass._register_renderer_functions()
  File "/Users/zyte/workspace/opensource/scrapy/.tox/great_expectations/lib/python3.10/site-packages/great_expectations/expectations/expectation.py", line 369, in _register_renderer_functions
    attr_obj: Callable = getattr(cls, candidate_renderer_fn_name)
AttributeError: __provides__. Did you mean: '__providedBy__'?
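A minimal, stdlib-only sketch of the failure mode these tracebacks suggest. It assumes that the __provides__ object that ends up on ABC is a descriptor which raises AttributeError when it is looked up through a subclass; that is an assumption about zope.interface internals, not something confirmed in this thread. OwnerOnly, Base and Child are hypothetical stand-ins, not names from zope.interface, Twisted or Great Expectations.

```python
class OwnerOnly:
    """Hypothetical descriptor that only resolves on the exact class it was attached to."""

    def __init__(self, owner):
        self._owner = owner

    def __get__(self, inst, cls):
        if cls is self._owner:
            return self
        # Reached through a subclass: behave as if the attribute were missing.
        raise AttributeError("__provides__")


class Base:
    """Stand-in for a shared base class such as ABC."""


# Stand-in for a decorator side effect: the shared base class is patched in place.
Base.__provides__ = OwnerOnly(Base)


class Child(Base):
    """Stand-in for an unrelated class that happens to inherit from the same base."""


# dir() walks the MRO, so the inherited name is listed on the subclass...
listed = "__provides__" in dir(Child)

# ...but the actual lookup goes through the descriptor, which refuses it.
try:
    getattr(Child, "__provides__")
    lookup_failed = False
except AttributeError:
    lookup_failed = True

print(listed, lookup_failed)
```

If the assumption holds, any code that iterates over dir(cls) and calls getattr(cls, name) on every entry would fail on such a class only once the base class has been patched, which would match the import-order dependence reported above.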

culpgrant commented 4 weeks ago

@VMRuiz Thank you for looking into this! Do you think I should create an issue with zope?

VMRuiz commented 3 weeks ago

To be honest, I don't know if this is a problem with Zope or a bad implementation by the Twisted lib. @wRAR What do you think?

As a workaround for Scrapy, maybe we could use from twisted.internet._sslverify import Certificate in the meantime to avoid these side effects? There is some risk of this breaking in the future, but I wouldn't expect major changes from Twisted at this point.

wRAR commented 3 weeks ago

My first thought was also "I don't know if this is a problem with Zope or a bad implementation by the Twisted lib", as I'm not familiar with the zope.interface internals.

GeorgeA92 commented 3 weeks ago

@culpgrant

Steps to Reproduce

This does work:

import great_expectations
import scrapy

If this works, what prevents you from just using this import order in your task?

GeorgeA92 commented 3 weeks ago

Considering https://github.com/great-expectations/great_expectations/issues/9698#issuecomment-2051252373, I think this issue is not related to Scrapy and its root cause is 100% in the Great Expectations codebase (it can be solved by adding a simple try/except block around the line that raised the AttributeError).
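The try/except idea above could look roughly like the following. This is a sketch, not the Great Expectations code: collect_callables, _Unreadable and Example are hypothetical names, and the loop only imitates the getattr(cls, name) call visible in the traceback.

```python
from typing import Callable, Dict


class _Unreadable:
    """Hypothetical descriptor that always fails, standing in for an inherited __provides__."""

    def __get__(self, inst, cls):
        raise AttributeError("__provides__")


class Example:
    __provides__ = _Unreadable()

    def render(self):
        return "ok"


def collect_callables(cls: type) -> Dict[str, Callable]:
    """Return the callables reachable on cls, skipping names that cannot be read."""
    found: Dict[str, Callable] = {}
    for name in dir(cls):
        try:
            attr = getattr(cls, name)
        except AttributeError:
            # dir() listed the name, but the descriptor refused the lookup.
            continue
        if callable(attr):
            found[name] = attr
    return found


callables = collect_callables(Example)
print("render" in callables, "__provides__" in callables)
```

Skipping unreadable names keeps the scan independent of whatever other libraries have attached to shared base classes, at the cost of silently ignoring attributes that fail for other reasons.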

Rishika70 commented 1 week ago

try...except block: Wrap the import statements in a try...except block to gracefully handle the import error and provide informative messages:

try:
    import great_expectations
    import scrapy
except (ImportError, AttributeError) as e:  # the failure in this thread surfaces as AttributeError
    print(f"Error importing libraries: {e}")

If you don't need Great Expectations functionality throughout your script, consider delaying the import until the point of use by wrapping it in a function:

import scrapy

def use_great_expectations():
    # Import only when needed (some_great_expectations_function is a placeholder name)
    from great_expectations import some_great_expectations_function
    print("Great Expectations used")

use_great_expectations()

Make sure you're using a clean virtual environment to avoid conflicts with other installed packages. Reinstall Scrapy and Great Expectations in a fresh environment to see if it resolves the issue. Check for version compatibility between Scrapy and Great Expectations

culpgrant commented 1 week ago

This was determined to be a Great Expectations issue.

Rishika70 commented 1 week ago

The real question is why the import is modifying a dependency instead of making a duplicate and modifying the copy. If that's what's going on, I think it's bad practice.