great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0
9.71k stars 1.5k forks source link

Great Expectations not working as expected with SQLAlchemy 2.0 against MS SQL database #8709

Open robinstauntoncollins opened 10 months ago

robinstauntoncollins commented 10 months ago

Describe the bug

Attempting to run checks on local SQL Server results in:

Traceback (most recent call last):\n  File "<env>\\lib\\site-packages\\sqlalchemy\\engine\\base.py", line 1410, in execute\n    meth = statement._execute_on_connection\nAttributeError: \'str\' object has no attribute \'_execute_on_connection\'

To Reproduce

Please include your great_expectations.yml config, the code you’re executing that causes the issue, and the full stack trace of any error(s). great_expectations.yml

# Welcome to Great Expectations! Always know what to expect from your data.
#
# Here you can define datasources, batch kwargs generators, integrations and
# more. This file is intended to be committed to your repo. For help with
# configuration please:
#   - Read our docs: https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview/#2-configure-your-datasource
#   - Join our slack channel: http://greatexpectations.io/slack

# config_version refers to the syntactic version of this config file, and is used in maintaining backwards compatibility
# It is auto-generated and usually does not need to be changed.
config_version: 3.0

# Datasources tell Great Expectations where your data lives and how to get it.
# You can use the CLI command `great_expectations datasource new` to help you
# add a new datasource. Read more at https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview
datasources: {}

# This config file supports variable substitution which enables: 1) keeping
# secrets out of source control & 2) environment-based configuration changes
# such as staging vs prod.
#
# When GX encounters substitution syntax (like `my_key: ${my_value}` or
# `my_key: $my_value`) in the great_expectations.yml file, it will attempt
# to replace the value of `my_key` with the value from an environment
# variable `my_value` or a corresponding key read from this config file,
# which is defined through the `config_variables_file_path`.
# Environment variables take precedence over variables defined here.
#
# Substitution values defined here can be a simple (non-nested) value,
# nested value such as a dictionary, or an environment variable (i.e. ${ENV_VAR})
#
#
# https://docs.greatexpectations.io/docs/guides/setup/configuring_data_contexts/how_to_configure_credentials

config_variables_file_path: uncommitted/config_variables.yml

# The plugins_directory will be added to your python path for custom modules
# used to override and extend Great Expectations.
plugins_directory: plugins/

stores:
# Stores are configurable places to store things like Expectations, Validations
# Data Docs, and more. These are for advanced users only - most users can simply
# leave this section alone.
#
# Three stores are required: expectations, validations, and
# evaluation_parameters, and must exist with a valid store entry. Additional
# stores can be configured for uses such as data_docs, etc.
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

  evaluation_parameter_store:
    class_name: EvaluationParameterStore
  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

  profiler_store:
    class_name: ProfilerStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: profilers/

expectations_store_name: expectations_store
validations_store_name: validations_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_store

data_docs_sites:
  # Data Docs make it simple to visualize data quality in your project. These
  # include Expectations, Validations & Profiles. The are built for all
  # Datasources from JSON artifacts in the local repo including validations &
  # profiles from the uncommitted directory. Read more at https://docs.greatexpectations.io/docs/terms/data_docs
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

anonymous_usage_statistics:
  data_context_id: 4696bd72-6135-4c14-ac22-cbeb6ef6d916
  enabled: true
fluent_datasources:
  mds_local_docker_db:
    type: sql
    assets:
      my_table_asset:
        type: table
        order_by: []
        batch_metadata: {}
        table_name: pmds_metrics
        schema_name:
    connection_string: mssql+pyodbc://sa:<password>@localhost:1433/mds?driver=ODBC
      Driver 17 for SQL Server&charset=utf&autocommit=true
notebooks:
include_rendered_content:
  globally: false
  expectation_suite: false
  expectation_validation_result: false

Full Stacktrace:

Exceptions
{('table.head', 'batch_id=mds_local_docker_db-my_table_asset', '04166707abe073177c1dd922d3584468'): {'metric_configuration': {
  "metric_name": "table.head",
  "metric_domain_kwargs": {
    "batch_id": "mds_local_docker_db-my_table_asset"
  },
  "metric_domain_kwargs_id": "batch_id=mds_local_docker_db-my_table_asset",
  "metric_value_kwargs": {
    "n_rows": 5,
    "fetch_all": false
  },
  "metric_value_kwargs_id": "04166707abe073177c1dd922d3584468",
  "id": [
    "table.head",
    "batch_id=mds_local_docker_db-my_table_asset",
    "04166707abe073177c1dd922d3584468"
  ]
}, 'num_failures': 3, 'exception_info': {{'exception_traceback': 'Traceback (most recent call last):\n  File "<env>\\lib\\site-packages\\sqlalchemy\\engine\\base.py", line 1410, in execute\n    meth = statement._execute_on_connection\nAttributeError: \'str\' object has no attribute \'_execute_on_connection\'\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File "<env>\\lib\\site-packages\\great_expectations\\execution_engine\\execution_engine.py", line 548, in _process_direct_and_bundled_metric_computation_configurations\n    ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable\n  File "<env>\\lib\\site-packages\\great_expectations\\expectations\\metrics\\metric_provider.py", line 50, in inner_func\n    return metric_fn(*args, **kwargs)\n  File "<cenv>\\lib\\site-packages\\great_expectations\\expectations\\metrics\\table_metrics\\table_head.py", line 109, in _sqlalchemy\n    df_chunk_iterator = read_sql_table_as_df(\n  File "<env>\\lib\\site-packages\\great_expectations\\compatibility\\sqlalchemy_compatibility_wrappers.py", line 55, in read_sql_table_as_df\n    return _read_sql_table_as_df(\n  File "<env>\\lib\\site-packages\\great_expectations\\compatibility\\sqlalchemy_compatibility_wrappers.py", line 133, in _read_sql_table_as_df\n    return pd.read_sql_query(\n  File "<env>\\lib\\site-packages\\pandas\\io\\sql.py", line 397, in read_sql_query\n    return pandas_sql.read_query(\n  File "<env>\\lib\\site-packages\\pandas\\io\\sql.py", line 1560, in read_query\n    result = self.execute(*args)\n  File "<env>\\lib\\site-packages\\pandas\\io\\sql.py", line 1405, in execute\n    return self.connectable.execution_options().execute(*args, **kwargs)\n  File "<env>\\lib\\site-packages\\sqlalchemy\\engine\\base.py", line 1412, in execute\n    raise exc.ObjectNotExecutableError(statement) from err\nsqlalchemy.exc.ObjectNotExecutableError: Not an executable object: \'SELECT * FROM #gx_temp_427a0638\'\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File "<env>\\lib\\site-packages\\great_expectations\\validator\\validation_graph.py", line 285, in _resolve\n    self._execution_engine.resolve_metrics(\n  File "<env>\\lib\\site-packages\\great_expectations\\execution_engine\\execution_engine.py", line 283, in resolve_metrics\n    return self._process_direct_and_bundled_metric_computation_configurations(\n  
File "<env>\\lib\\site-packages\\great_expectations\\execution_engine\\execution_engine.py", line 552, in _process_direct_and_bundled_metric_computation_configurations\n    raise gx_exceptions.MetricResolutionError(\ngreat_expectations.exceptions.exceptions.MetricResolutionError: Not an executable object: \'SELECT * FROM #gx_temp_427a0638\'\n', 'exception_message': "Not an executable object: 'SELECT * FROM #gx_temp_427a0638'", 'raised_exception': True}}}}
occurred while resolving metrics.
Traceback (most recent call last):
  File "ge_test.py", line 90, in <module>
    main(password)
  File "ge_test.py", line 36, in main
    print(validator.head())
  File "<env>\lib\site-packages\great_expectations\validator\validator.py", line 411, in head
    return self._metrics_calculator.head(
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 104, in head
    df: pd.DataFrame = self.get_metric(
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 122, in get_metric
    return self.get_metrics(
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 142, in get_metrics
    return {
  File "<env>\lib\site-packages\great_expectations\validator\metrics_calculator.py", line 143, in <dictcomp>
    metric_configuration.metric_name: resolved_metrics[metric_configuration.id]
KeyError: ('table.head', 'batch_id=mds_local_docker_db-my_table_asset', '04166707abe073177c1dd922d3584468')

code

def main(password: str):
    context = gx.get_context()

    print(f"Data sources: {context.list_datasources()}")

    mssql_datasource = context.sources.add_or_update_sql(
        name="mds_local_docker_db",
        connection_string=f"mssql+pyodbc://sa:{password}@localhost:1433/mds?driver=ODBC Driver 17 for SQL Server&charset=utf&autocommit=true",
    )
    print("DataSource created")

    mssql_datasource.add_table_asset(name="my_table_asset", table_name="my_table")
    print("TableAsset created")

    batch_request = mssql_datasource.get_asset("my_table_asset").build_batch_request()
    print("BatchRequest created")

    expectation_suite_name = "MDS_MY_TABLE_EXPECTATIONS"
    context.add_or_update_expectation_suite(
        expectation_suite_name=expectation_suite_name
    )
    assert "MDS_MY_TABLE_EXPECTATIONS" in context.list_expectation_suite_names()
    print("ExpectationSuite created")

    validator = context.get_validator(
        batch_request=batch_request,
        expectation_suite_name=expectation_suite_name,
    )
    print("Validator created")

    print(validator.head())

    # validator.expect_column_values_to_match_strftime_format('date', '%Y-%m-%d %H:%M:%S.')
    validator.expect_column_values_to_not_be_null(column="date")
    validator.expect_column_distinct_values_to_be_in_set(
        "business_line",
        set(
            [
                "BMC",
                "CB",
                "CDE",
                "NVT",
                "PBBE",
                "PBDE",
                "PBFR",
                "PBNL",
                "RT",
            ]
        ),
    )
    validator.save_expectation_suite(discard_failed_expectations=False)

    my_checkpoint_name = "my_sql_checkpoint"

    checkpoint = Checkpoint(
        name=my_checkpoint_name,
        run_name_template="%Y%m%d-%H%M%S-my-run-name-template",
        data_context=context,
        batch_request=batch_request,
        expectation_suite_name=expectation_suite_name,
        action_list=[
            {
                "name": "store_validation_result",
                "action": {"class_name": "StoreValidationResultAction"},
            },
            {
                "name": "update_data_docs",
                "action": {"class_name": "UpdateDataDocsAction"},
            },
        ],
    )
    context.add_or_update_checkpoint(checkpoint=checkpoint)
    checkpoint_result = checkpoint.run()
    context.open_data_docs()

Expected behavior I expect Great Expectations to successfully connect to my locally running docker image with SQL Server using the installed SQLAlchemy 2.0.7 and run the quality checks

Environment (please complete the following information):

Additional context Add any other context about the problem here.

r34ctor commented 10 months ago

Hey @robinstauntoncollins, thanks for raising this issue. We've captured this for review.

Long-Nguyen-Paidy commented 8 months ago

We are using GX with AWS Athena, and facing this issue, too

LuanAraldi commented 2 months ago

I'm facing the same issue using GX version 0.18.13 with AWS Athena and SQLAlchemy 2.0.30