Closed karthigai-selvan closed 1 year ago
Acknowledged, and thank you for sharing this information with us and raising the issue! We’ve added this to our internal backlog to review this behavior.
I checked the GE integration on AWS MWAA version 2.5.1 using the Airflow-provided GreatExpectationsOperator and am still facing the same issue: GE tries to open a few files in write mode, but the AWS MWAA file system is read-only.
We've observed this issue on MWAA when we upgraded GX from 0.15 to 0.17. Here is a copy of our trace:
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations_provider/operators/great_expectations.py", line 557, in execute
    self.data_context = ge.data_context.DataContext(context_root_dir=self.data_context_root_dir)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/data_context/data_context/data_context.py", line 170, in DataContext
    context = BaseDataContext(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/data_context/data_context/base_data_context.py", line 187, in BaseDataContext
    return get_context(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/util.py", line 1917, in get_context
    file_context = _get_file_context(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/util.py", line 2047, in _get_file_context
    return FileDataContext(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/data_context/data_context/file_data_context.py", line 61, in __init__
    self._scaffold_project()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/data_context/data_context/file_data_context.py", line 93, in _scaffold_project
    self._scaffold(
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/data_context/data_context/serializable_data_context.py", line 197, in _scaffold
    cls._scaffold_directories(gx_dir)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/great_expectations/data_context/data_context/serializable_data_context.py", line 270, in _scaffold_directories
    with open(os.path.join(base_dir, ".gitignore"), "w") as f: # noqa: PTH118
OSError: [Errno 30] Read-only file system: '/usr/local/airflow/dags/sfg/great_expectations/.gitignore'
Maybe the operator could create the context while omitting the scaffolding? For example:
context = DataContext(context_root_dir=foo, omit_dir_scaffolding=True)
Or maybe the scaffolding process could handle a read-only file system more gracefully?
This is blocking us from upgrading to a more recent version of GX.
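To make the second suggestion concrete, here is a minimal sketch of what more graceful handling could look like. It is not GX's actual scaffolding code: `ensure_gitignore` is a hypothetical helper, and the default `uncommitted/` content is only an assumption based on what the existing `.gitignore` in this thread contains. The idea is to skip the write when the file already exists, and to treat a read-only or permission error as non-fatal.

```python
import errno
import os


def ensure_gitignore(base_dir: str, content: str = "uncommitted/\n") -> bool:
    """Hypothetical helper, not part of great_expectations.

    Returns True if .gitignore exists after the call, and False if it
    could not be created because the location is not writable.
    """
    path = os.path.join(base_dir, ".gitignore")

    # Never rewrite a file that is already there.
    if os.path.exists(path):
        return True

    try:
        with open(path, "w") as f:
            f.write(content)
    except OSError as exc:
        # Errno 30 (EROFS) is the error from the traces in this thread;
        # EACCES covers directories that are merely not writable.
        if exc.errno in (errno.EROFS, errno.EACCES):
            return False
        raise  # Anything else is unexpected, so surface it.
    return True
```

With a check like this, a project directory that already ships a `.gitignore` would never hit the write path at all, which matches the situation described in the original report.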
Describe the bug
We are using GE to validate a few reports and are trying to move those validations into an existing data pipeline. We are on AWS MWAA 2.0.2, so we cannot use GreatExpectationsOperator, which requires Airflow 2.1.0+. Instead, we use PythonVirtualenvOperator with great-expectations and the required Python modules passed as requirements. We get an error when initializing the data_context with the following command:
data_context: FileDataContext = get_context(context_root_dir=context_root_dir)
The context_root_dir points to the great_expectations project directory under /usr/local/airflow. We get the following error:
  File "/tmp/venvubafuq2v/lib/python3.7/site-packages/great_expectations/data_context/data_context/serializable_data_context.py", line 268, in _scaffold_directories
    with open(os.path.join(base_dir, ".gitignore"), "w") as f: # noqa: PTH118
OSError: [Errno 30] Read-only file system: '/usr/local/airflow/great_expectations/.gitignore'
I verified that the great_expectations directory already contains a .gitignore file with uncommitted/ in it. How can we skip this step if the .gitignore file already exists in the context_root_dir? Since the MWAA file system is mostly read-only, how can we integrate GE with it?
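One possible workaround, offered only as a sketch and not as a confirmed fix: copy the project directory to a writable location such as a temporary directory, and point context_root_dir at the copy. `copy_project_to_writable_dir` below is a hypothetical helper, and it assumes the worker's temp directory is writable and that the project is small enough to copy on each task run.

```python
import shutil
import tempfile
from pathlib import Path


def copy_project_to_writable_dir(read_only_root: str) -> str:
    """Hypothetical helper: copy a GX project into a fresh temp directory.

    Returns the path of the writable copy, which keeps the original
    directory name (for example, .../great_expectations).
    """
    source = Path(read_only_root)
    target = Path(tempfile.mkdtemp(prefix="gx_")) / source.name
    # copytree also copies file mode bits, so files that are marked
    # read-only in the source stay read-only in the copy.
    shutil.copytree(source, target)
    return str(target)


# Usage inside the task (assumes great_expectations is installed there):
#   import great_expectations as gx
#   root = copy_project_to_writable_dir("/usr/local/airflow/great_expectations")
#   data_context = gx.get_context(context_root_dir=root)
```

Anything GE writes during the run (for example under uncommitted/) ends up in the temporary copy and is lost when the task finishes, so results that need to persist would have to be stored elsewhere.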
Expected behavior
When initializing the great_expectations data_context, if the .gitignore file already exists, GE should not try to write it again.