great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

expect_column_values_to_be_of_type does not work #9626

Closed sfan0704 closed 1 month ago

sfan0704 commented 6 months ago

Opening an issue similar to this https://github.com/great-expectations/great_expectations/issues/5565

I'm also running into this problem while using a Snowflake datasource to check the type of the columns.

Here are the specs,

I was debugging through this and found that the metrics object only has the "name" key in the dict. (Screenshot: metrics dict contents, 2024-03-14 at 19:42)

Stack trace (screenshot, 2024-03-14 at 19:38)

_Originally posted by @sfan0704 in https://github.com/great-expectations/great_expectations/issues/5565#issuecomment-1998654976_

austiezr commented 6 months ago

Hey @sfan0704 -- I'm so far unable to reproduce this. Are you able to share your configuration for this expectation as well?

sfan0704 commented 6 months ago

Hey @austiezr thanks for responding to this. My expectation suite looks like this. Not sure what else I can provide to make this clearer.

    {
      "expectation_type": "expect_column_values_to_be_of_type",
      "kwargs": {
        "column": "reference_currency_price",
        "type_": "FLOAT"
      },
      "meta": {}
    },
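
For reference, a suite entry like the one above is what gets recorded when the corresponding validator call runs; a minimal sketch, assuming a validator already bound to the Snowflake batch:

# Sketch only: `validator` is assumed to be a Validator bound to the Snowflake batch.
validator.expect_column_values_to_be_of_type(
    column="reference_currency_price",
    type_="FLOAT",
)
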
austiezr commented 6 months ago

How did you initially create this datasource? Is this error arising after running a checkpoint?

sfan0704 commented 6 months ago

I create the datasource like this,

self.data_context.sources.add_snowflake(name="source_name", connection_string="STRING")

And correct, the exception was thrown after we run the checkpoint.

NehaNRane commented 6 months ago

Hi Team,

Any update on this issue? I had worked with this expectation and it was working until March 14, 2024. Today, when I run the same validation with the same arguments, it gives me the error below:

Calculating Metrics: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.16it/s] 'type'

sfan0704 commented 6 months ago

Can you share your python dependency versions?

NehaNRane commented 6 months ago

> Can you share your python dependency versions?

I am using python==3.9.6, great-expectations==0.18.7, SQLAlchemy==1.4.51, snowflake-connector-python==3.6.0, snowflake-sqlalchemy==1.5.1

NehaNRane commented 6 months ago

Hi Team,

Any update on this issue? Is there an ETA for a fix?

rachhouse commented 6 months ago

Hey @NehaNRane @sfan0704, Austin handed this issue over to me - I'm going to investigate and try to reproduce the error you're getting.

rachhouse commented 6 months ago

@sfan0704 @NehaNRane Ok, I've tested in two different virtual environments matching your dependencies and have not been able to replicate the error - the expect_column_values_to_be_of_type Expectation runs without issue on a Snowflake Table Asset.

As a next troubleshooting step, could you please install great-expectations[snowflake] (using the version currently giving you the error) in a fresh virtual environment and retry your code?
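
For example, in a new virtual environment (the version pin below is a placeholder for whichever release is currently failing for you):

pip install "great-expectations[snowflake]==<your current GX version>"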

If you still get the same error after a fresh install, we'll take a closer look at your code and Snowflake data next.

NehaNRane commented 6 months ago

@rachhouse, I tried again by creating a fresh virtual environment, but I am getting the same error. Versions: python==3.9.6, SQLAlchemy==1.4.52, great-expectations==0.18.12, snowflake-connector-python==3.8.1, snowflake-sqlalchemy==1.5.1

Error:

Calculating Metrics: 100%|██████████| 1/1 [00:00<00:00, 1.27it/s]
Traceback (most recent call last):
  File "c:\Users\10700792\OneDrive - LTIMindtree\Backup - 01-03-2023\Backup on 1st Mar\Documents\Extras\poc\all_expectation_testing.py", line 154, in <module>
    validation_result = validator.expect_column_values_to_be_of_type(column="PaymentMethod", type_="VARCHAR")
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 590, in inst_expectation
    raise err
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 553, in inst_expectation
    validation_result = expectation.validate(
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\expectation.py", line 1311, in validate
    ] = validator.graph_validate(
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 1089, in graph_validate
    raise err
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\validator\validator.py", line 1073, in graph_validate
    result = configuration.metrics_validate(
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\core\expectation_configuration.py", line 1494, in metrics_validate
    return expectation_impl(self).metrics_validate(
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\expectation.py", line 1100, in metrics_validate
    ] = self._validate(
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\core\expect_column_values_to_be_of_type.py", line 496, in _validate
    actual_column_type = [
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\core\expect_column_values_to_be_of_type.py", line 497, in <listcomp>
    type_dict["type"]
  File "C:\Users\10700792\Desktop\FDC\venv_gx\lib\site-packages\great_expectations\expectations\metrics\util.py", line 367, in __getitem__
    item = self.data[key]
KeyError: 'type'
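
The KeyError comes from the column-type lookup inside expect_column_values_to_be_of_type: the expectation iterates over the introspected column metadata and reads each entry's "type", but for the affected asset the metadata dicts only carry "name" (matching the screenshot earlier in this thread). A minimal sketch of that failing pattern, using made-up data rather than GX internals:

# Illustrative only: `column_metadata` stands in for the introspected column
# metadata; in the failing case each dict has a "name" key but no "type" key.
column_metadata = [{"name": "PaymentMethod"}]

actual_column_type = [
    type_dict["type"]  # raises KeyError: 'type' when the key is missing
    for type_dict in column_metadata
    if type_dict["name"] == "PaymentMethod"
]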

rachhouse commented 6 months ago

Hi @NehaNRane, thanks for retrying with a fresh install and including the stacktrace. Can you please also share your GX workflow code from all_expectation_testing.py that generates this error?

rachhouse commented 6 months ago

I've tested this Expectation (expect_column_values_to_be_of_type) against a collection of differently cased Snowflake table columns, but still have not reproduced an error.

At this point, I am suspicious that this behavior might be a Windows issue. I see from your stacktrace that you are using Windows, @NehaNRane, and when this same behavior was reported in #5565, it also occurred for a Windows user.

@sfan0704, are you running your code on Windows or WSL?

sfan0704 commented 6 months ago

I'm running on a Mac

NehaNRane commented 5 months ago

Hi @rachhouse, the code below is what's in all_expectation_testing.py. It's a simple script.

import great_expectations as gx
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.core.expectation_configuration import ExpectationConfiguration
import yaml
from configs.datasource_config import snowflake_datasource

context = gx.get_context()
snowflake_datasource = {
    "name": "my_snowflake_datasource",
    "class_name": "Datasource",
    "execution_engine": {
        "class_name": "SqlAlchemyExecutionEngine",
        "connection_string": connection_string
        "create_temp_table": False,
    },
    "data_connectors": {
        "default_runtime_data_connector_name": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["default_identifier_name"],
        },
    },
}

db_datasource = context.add_or_update_datasource(**snowflake_datasource)
table_name = "PAYMENT"
schema_name = "STRIPE"
db_name = "RAW_DATA"

batch_request = RuntimeBatchRequest(
    datasource_name="my_snowflake_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name=table_name,
    runtime_parameters={
        "query": f'SELECT * FROM "{db_name}"."{schema_name}"."{table_name}"'
    },
    batch_identifiers={"default_identifier_name": "default_identifier"}
)

validator = context.get_validator(batch_request=batch_request)
validation_result = validator.expect_column_values_to_be_of_type(column="status", type_= "VARCHAR")
print(validation_result)

rachhouse commented 5 months ago

@NehaNRane, looks like the issue is that you're using outdated code - the use of RuntimeBatchRequest is deprecated. Can you try updating your code to use Fluent Data Sources? Here is a simple example of how you can connect to a Snowflake table and run the expect_column_values_to_be_of_type Expectation:

import great_expectations as gx

DATA_SOURCE_NAME = "<your data source name>"
DATA_ASSET_NAME = "<your data asset name>"
EXPECTATION_SUITE_NAME = "<your expectation suite name>"
CHECKPOINT_NAME = "<your checkpoint name>"

context = gx.get_context()

data_source = context.sources.add_snowflake(
    name=DATA_SOURCE_NAME,
    connection_string="<snowflake connection string>"
)

data_asset = data_source.add_table_asset(
    name=DATA_ASSET_NAME,
    table_name="<snowflake table name>"
)

batch_request = data_asset.build_batch_request()

context.add_or_update_expectation_suite(expectation_suite_name=EXPECTATION_SUITE_NAME)

validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=EXPECTATION_SUITE_NAME,
)

validator.expect_column_values_to_be_of_type("<snowflake table column name>", type_="FLOAT")
validator.expect_column_values_to_be_of_type("<snowflake table column name>", type_="VARCHAR")

validator.save_expectation_suite(discard_failed_expectations=False)

checkpoint = context.add_or_update_checkpoint(
    name=CHECKPOINT_NAME,
    validator=validator,
)

checkpoint_result = checkpoint.run()

@sfan0704 Thanks for confirming your environment. Can you share your full GX workflow code that generates the error?

NehaNRane commented 5 months ago

Hi @rachhouse,

I observed that this issue occurs only when the data asset is created as a query asset. We have a table_name in mixed case, and to handle this we are using a query asset. In our use case, table_name and column_name can be in any case. How should we go about it in such a scenario?
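
For reference, the query-asset setup being described looks roughly like this with the 0.18 fluent API (placeholder names; the quoted identifier is what preserves the mixed case):

# Rough sketch with placeholder names: a QueryAsset over a quoted,
# mixed-case Snowflake table, which is the setup that hits the KeyError.
data_source = context.sources.add_snowflake(
    name="<your data source name>",
    connection_string="<snowflake connection string>",
)

query_asset = data_source.add_query_asset(
    name="<your query asset name>",
    query='SELECT * FROM "<database>"."<schema>"."MixedCaseTableName"',
)

batch_request = query_asset.build_batch_request()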

rachhouse commented 5 months ago

Thanks @NehaNRane, those details are very helpful - I'm now able to consistently generate an error when using a Snowflake table with a mixed case name, a QueryAsset, and expect_column_values_to_be_of_type. I'll dig into this with Engineering next week.

sfan0704 commented 5 months ago

Sorry for the delayed response; our setup is exactly what @rachhouse described above. Looking forward to the potential solution here. Thanks.

rachhouse commented 5 months ago

Hi @NehaNRane @sfan0704, Engineering and I dug further into this error, and the problem is caused by Snowflake tables that have a MixedCaseName identifier that needs to be quoted (for example, DATABASE.SCHEMA."MyTable"). Fixing the underlying issue is ticketed and on our backlog, but we don't have a timeline yet for when we'll be able to fix it.

The current workaround is to either:

a) create a view of your MixedCaseTableName table that uses an UPPERCASE_NAME, or
b) rename your MixedCaseTableName table to use an UPPERCASE_NAME,

and then use a GX Table Asset with your renamed table or created view. From there, you should be able to run expect_column_values_to_be_of_type on your asset and have it behave as expected.

In testing, I created a view, UCCTABLENAME_VIEW, from a table with a quoted MixedCaseName:

create view UCCTABLENAME_VIEW as select * from "UccTableName";

used the view to create a Table Asset:

data_asset = data_source.add_table_asset(
    name="snowflake-table",
    table_name="UCCTABLENAME_VIEW"
)

and was able to run expect_column_values_to_be_of_type using the subsequent validator. In testing, MixedCaseColumnNames worked fine for the Expectation - as long as the table/view name was uppercase.

validator.expect_column_values_to_be_of_type("ColumnOne", type_="FLOAT")
validator.expect_column_values_to_be_of_type("ColumnTwo", type_="VARCHAR")

molliemarie commented 1 month ago

Hello @sfan0704. With the launch of Great Expectations Core (GX 1.0), we are closing old issues posted regarding previous versions. Moving forward, we will focus our resources on supporting and improving GX Core (version 1.0 and beyond). If you find that an issue you previously reported still exists in GX Core, we encourage you to resubmit it against the new version. With more resources dedicated to community support, we aim to tackle new issues swiftly. For specific details on what is GX-supported vs community-supported, you can reference our integration and support policy.

To get started on your transition to GX Core, check out the GX Core quickstart (click “Full example code” tab to see a code example).
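
For a rough sense of the GX Core workflow, here is a minimal sketch modeled on the 1.0 quickstart (pandas-based; names and exact method signatures should be checked against the current docs):

import great_expectations as gx
import pandas as pd

# Minimal GX Core (1.0) sketch modeled on the quickstart; illustrative data.
context = gx.get_context()

df = pd.DataFrame({"reference_currency_price": [1.0, 2.5, 3.75]})

data_source = context.data_sources.add_pandas(name="pandas")
data_asset = data_source.add_dataframe_asset(name="dataframe asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("batch definition")
batch = batch_definition.get_batch(batch_parameters={"dataframe": df})

expectation = gx.expectations.ExpectColumnValuesToBeOfType(
    column="reference_currency_price", type_="float64"
)
print(batch.validate(expectation))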

You can also join our upcoming community meeting on August 28th at 9am PT (noon ET / 4pm UTC) for a comprehensive rundown of everything GX Core, plus Q&A as time permits. Go to https://greatexpectations.io/meetup and click “follow calendar” to follow the GX community calendar.

Thank you for being part of the GX community and thank you for submitting this issue. We're excited about this new chapter and look forward to your feedback on GX Core. 🤗