great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0
9.71k stars, 1.5k forks

Glue2.0 job with great expectations version > 0.16.7 failing. #7839

Open da-sbarde opened 1 year ago

da-sbarde commented 1 year ago

We have a Glue 2.0 script that uses the Great Expectations module. The jobs fail for any version > 0.16.7 with this error: ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (/home/spark/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py)

Everything works fine with Great Expectations version 0.16.7. I am using spark_s3 as the datasource and S3 as the backend store.

tjholsman commented 1 year ago

Hi @da-sbarde

Could you share your config file (great_expectations.yml), the code block that generates this error, and the full stack trace so we can investigate this issue?

da-sbarde commented 1 year ago

Sure, here is the full traceback:

2023-05-10 13:14:56,378 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(77)): Error from Python:Traceback (most recent call last):
  File "/tmp/transformer.py", line 3, in <module>
    import boto3
  File "/home/spark/.local/lib/python3.10/site-packages/boto3/__init__.py", line 17, in <module>
    from boto3.session import Session
  File "/home/spark/.local/lib/python3.10/site-packages/boto3/session.py", line 17, in <module>
    import botocore.session
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/session.py", line 26, in <module>
    import botocore.client
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/client.py", line 15, in <module>
    from botocore import waiter, xform_name
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/waiter.py", line 18, in <module>
    from botocore.docs.docstring import WaiterDocstring
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/__init__.py", line 15, in <module>
    from botocore.docs.service import ServiceDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/service.py", line 14, in <module>
    from botocore.docs.client import ClientDocumenter, ClientExceptionsDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/client.py", line 14, in <module>
    from botocore.docs.example import ResponseExampleDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/example.py", line 13, in <module>
    from botocore.docs.shape import ShapeDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/shape.py", line 19, in <module>
    from botocore.utils import is_json_value_header
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/utils.py", line 34, in <module>
    import botocore.httpsession
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/httpsession.py", line 21, in <module>
    from urllib3.util.ssl_ import (
ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (/home/spark/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py)

And this is the YAML config string I am using:

f"""
config_version: 3.0
datasources:
  spark_s3:
    module_name: great_expectations.datasource
    class_name: Datasource
    execution_engine:
      module_name: great_expectations.execution_engine
      class_name: SparkDFExecutionEngine
    data_connectors:
      default_inferred_data_connector_name:
        class_name: InferredAssetS3DataConnector
        bucket: {data_connector_bucket}
        prefix: {data_connector_prefix}
        default_regex:
          pattern: (.*)
          group_names:

tjholsman commented 1 year ago

Thanks @da-sbarde! It seems there is a compatibility issue between boto3 and urllib3. Are you able to pin urllib3<2 in your environment?

More here: https://github.com/boto/botocore/issues/2926#issuecomment-1538900780
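To confirm which side of the `urllib3<2` boundary an environment is on, a small diagnostic like the one below can be run at the top of the job script, before `import boto3`. It is only a sketch: the helper names (`installed_version`, `is_urllib3_v2_or_later`, `report`) are made up for illustration, and it relies solely on the standard-library `importlib.metadata`, so it does not trigger the failing import itself.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_urllib3_v2_or_later(version: str) -> bool:
    """True when the major version is 2 or higher (e.g. '2.0.2')."""
    return int(version.split(".", 1)[0]) >= 2


def report() -> None:
    """Print the versions involved in the ImportError from this thread."""
    for package in ("urllib3", "botocore", "boto3"):
        print(package, installed_version(package) or "not installed")

    urllib3_version = installed_version("urllib3")
    if urllib3_version and is_urllib3_v2_or_later(urllib3_version):
        # urllib3 2.x no longer exports DEFAULT_CIPHERS from urllib3.util.ssl_,
        # which is the name the botocore in the traceback tries to import.
        print("urllib3 is 2.x or later; consider pinning urllib3<2")


if __name__ == "__main__":
    report()
```

If the report shows urllib3 at 2.x next to the botocore release from the traceback, the pin suggested above is the likely fix.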

da-sbarde commented 1 year ago

Hi @tjholsman, since we are using Great Expectations with AWS Glue, I don't think we can pin a particular version of urllib3, as this would cause issues with other implementations.

da-sbarde commented 1 year ago

I tested with the latest version of GE and I am still facing the same issue: ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (/home/spark/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py)

jayesh-patel-ig commented 1 year ago

@tjholsman Do you or anyone else have an update on this issue?

lodsdevera commented 1 year ago

@jayesh-patel-ig have you tried adding the key-value pair under the Job parameters section of the Job details tab in the AWS Glue console, like this: [image]

Adding urllib3<2 there should fix the above issue.
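Since the screenshot did not survive, here is a rough sketch of what the key-value pair might look like, assuming the key is the `--additional-python-modules` job parameter mentioned later in this thread and that its value is a comma-separated list of pip requirements. The helper name and the version pins are illustrative only, not a confirmed working combination.

```python
from typing import Dict, Iterable


def build_additional_modules_argument(requirements: Iterable[str]) -> Dict[str, str]:
    """Join pip-style requirements into one Glue job parameter.

    The key is the Glue job parameter name; the value is a
    comma-separated list of requirements with no surrounding spaces.
    """
    cleaned = [req.strip() for req in requirements if req.strip()]
    return {"--additional-python-modules": ",".join(cleaned)}


# Hypothetical pins, taken from the versions discussed in this thread.
job_arguments = build_additional_modules_argument(
    ["urllib3<2", "great_expectations==0.16.7"]
)
print(job_arguments)
# {'--additional-python-modules': 'urllib3<2,great_expectations==0.16.7'}
```

In the console, the key goes in the Key field and the joined string in the Value field. When jobs are defined through the API or infrastructure code, the same pair would typically be merged into the job's default arguments.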

da-sbarde commented 1 year ago

@lodsdevera I tried the above suggestion, and it results in the following error: _RefResolutionError: 'bytes' object has no attribute 'timeout'.

This is the complete traceback:

  File "/home/spark/.local/lib/python3.10/site-packages/jsonschema/validators.py", line 1087, in resolve_from_url
    document = self.store[url]
  File "/home/spark/.local/lib/python3.10/site-packages/jsonschema/_utils.py", line 20, in __getitem__
    return self.store[self.normalize(uri)]
KeyError: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/spark/.local/lib/python3.10/site-packages/jsonschema/validators.py", line 1090, in resolve_from_url
    document = self.resolve_remote(url)
  File "/home/spark/.local/lib/python3.10/site-packages/jsonschema/validators.py", line 1194, in resolve_remote
    with urlopen(uri) as url:
  File "/usr/local/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.10/urllib/request.py", line 509, in open
    req.timeout = timeout
AttributeError: 'bytes' object has no attribute 'timeout'
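The last frames of this traceback can be reproduced outside Glue, which may help narrow things down. The sketch below (the helper name `open_uri` is made up) shows that `urllib.request.urlopen` raises this `AttributeError` whenever it is handed a `bytes` URI such as the `b''` in the `KeyError` above, before any network access happens. It does not explain why jsonschema ended up with a bytes reference in this environment; that part is still open.

```python
import urllib.request
from typing import Optional


def open_uri(uri) -> Optional[AttributeError]:
    """Try to open a URI; return the AttributeError raised, if any.

    urlopen only wraps str URLs in a Request object. Anything else is
    treated as if it already were a Request, so setting its timeout
    attribute fails for plain bytes.
    """
    try:
        with urllib.request.urlopen(uri):
            return None
    except AttributeError as err:
        return err


# Same value as in the KeyError above: an empty bytes URI.
error = open_uri(b"")
print(type(error).__name__, error)
```

So the symptom points at a schema reference being resolved as `bytes` rather than `str`, which suggests checking which jsonschema version (and its dependencies) got installed alongside the urllib3 pin.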

lodsdevera commented 1 year ago

@da-sbarde ohh, I ran into this as well last night after replying here. Before last night, our Glue job with GX was working fine, so something else must have changed. I haven't tried running it today. I also tried pinning the GX version to < 0.17 in the additional-python-modules parameter, but I still got the same error.

Not sure who to tag here now regarding this

lodsdevera commented 1 year ago

@da-sbarde I reran the same Glue job without any changes, and the error did not occur this time.

da-sbarde commented 12 months ago

@lodsdevera, the Glue job without a pinned great_expectations version worked for a while, but it is giving the same error again. It still works with version 0.16.7, though.

ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_'.

Below is the full traceback:

  File "/home/spark/.local/lib/python3.10/site-packages/boto3/__init__.py", line 17, in <module>
    from boto3.session import Session
  File "/home/spark/.local/lib/python3.10/site-packages/boto3/session.py", line 17, in <module>
    import botocore.session
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/session.py", line 26, in <module>
    import botocore.client
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/client.py", line 15, in <module>
    from botocore import waiter, xform_name
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/waiter.py", line 18, in <module>
    from botocore.docs.docstring import WaiterDocstring
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/__init__.py", line 15, in <module>
    from botocore.docs.service import ServiceDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/service.py", line 14, in <module>
    from botocore.docs.client import ClientDocumenter, ClientExceptionsDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/client.py", line 14, in <module>
    from botocore.docs.example import ResponseExampleDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/example.py", line 13, in <module>
    from botocore.docs.shape import ShapeDocumenter
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/docs/shape.py", line 19, in <module>
    from botocore.utils import is_json_value_header
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/utils.py", line 34, in <module>
    import botocore.httpsession
  File "/home/spark/.local/lib/python3.10/site-packages/botocore/httpsession.py", line 21, in <module>
    from urllib3.util.ssl_ import (
ImportError: cannot import name 'DEFAULT_CIPHERS' from 'urllib3.util.ssl_' (/home/spark/.local/lib/python3.10/site-packages/urllib3/util/ssl_.py)

da-sbarde commented 11 months ago

@lodsdevera any luck with the issue?