great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0
9.71k stars 1.5k forks source link

lower case column name leads to MetricResolutionError on Snowflake #8972

Open shamanoor opened 8 months ago

shamanoor commented 8 months ago

Describe the bug Running a column expectation on a table with lower case column names returns a MetricResolutionError. It seems to fail to wrap the passed lower case column name with quotation marks . I read previous issues that have been closed as this problem should have been solved. Yet I still run into it and can't find a workaround.

To Reproduce

datasources:
  my_snowflake_datasource:
    class_name: Datasource
    module_name: great_expectations.datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      module_name: great_expectations.execution_engine
      connection_string: snowflake://*:*@*/*/*?warehouse=*&role=*&application=great_expectations_oss
      create_temp_table: false
    data_connectors:
      default_inferred_data_connector_name:
        class_name: InferredAssetSqlDataConnector
        module_name: great_expectations.datasource.data_connector
        include_schema_name: true

I add the datasource to my context. I create the batch request as below:

batch_request = BatchRequest(
    datasource_name="my_snowflake_datasource",
    data_connector_name="default_inferred_data_connector_name",
    data_asset_name="schema_name.table_name"
)

And get a validator from the context with the above batch request. I created the below validation which is giving issues:

validator.expect_column_values_to_be_between("inname", min_value=1.0, max_value=2.0)
validator.save_expectation_suite(discard_failed_expectations=False)

checkpoint = context.add_or_update_checkpoint(
    name="blablabla",
    validator=validator,
    runtime_configuration={
        "result_format": {
            "result_format": "COMPLETE",
        },
    },
)

I get the following error when running the expectation:

great_expectations.exceptions.exceptions.MetricResolutionError: (snowflake.connector.errors.ProgrammingError) 000904 (42000): SQL compilation error: error line 1 at position 7
invalid identifier 'INNAME'
[SQL: SELECT inname AS unexpected_values 
FROM (SELECT * 
FROM "schema_name"."table_name" 
WHERE true) AS anon_1 
WHERE inname IS NOT NULL AND NOT (inname >= %(param_1)s AND inname <= %(param_2)s)
 LIMIT %(param_3)s]
[parameters: {'param_1': 1.0, 'param_2': 2.0, 'param_3': 20}]

Expected behavior I was expecting the query to run without issue, with the column name "inname" in quotation marks in the query. It also seems odd I get the invalid identifier error with the column name in capital letters ('INNAME' instead of 'inname), but that doesn't seem to be the issue here.

Environment (please complete the following information):

Kilo59 commented 8 months ago

@shamanoor You appear to be using an old datasource class. See below for using our new "Fluent" style Datasources, click on the snowflake section for specific instructions. https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/fluent/database/connect_sql_source_data.

Did you use quotes when creating the column in Snowflake?