Closed jarcpro97 closed 1 month ago
hi there, do you have other expectations in this same file that are passing without error?
Hi, I have those other expectations and they are working well: gx.expectations.ExpectTableRowCountToEqual(), gx.expectations.ExpectColumnMinToBeBetween(), gx.expectations.ExpectColumnMaxToBeBetween(), gx.expectations.ExpectCompoundColumnsToBeUnique()
Is only with this column type expectations
Doing other tests we found that the error does not happen with add_table_asset, only with add_query_asset. Example: asset_name = "query_asset_target" asset_query = f'SELECT * FROM {prefix_athena}{table_name} ORDER BY {column_order_by} ASC' query_data_asset_target = athena_source.add_query_asset(query=asset_query, name=asset_name) batch_definition = query_data_asset_target.add_batch_definition_whole_table( name="FULL_TABLE_TARGET" ) batch = batch_definition.get_batch()
Hi there,
Thank you for sharing this additional information.
The fact that the issue doesn't occur with add_table_asset
but does with query-based assets suggests that the error you’re facing could be related to the limitations of Athena support in Great Expectations. Due to these limitations, it's reasonable to expect that column type detection or certain expectation configurations may not work as seamlessly. Additionally, Athena uses a slightly different SQL dialect, and some data types may not be recognized or mapped correctly within Great Expectations.
As a workaround, you might try explicitly casting types in your query to help force Athena to return type information in a way that Great Expectations can recognize. For example:
SELECT CAST(column_name AS expected_type) FROM ...
You can probably also try running the query in Athena and store the result in a temporary table and use add_table_asset
to validate against this temporary table since you didn't find any errors with add_table_asset
Because Athena is not a fluent data source, I will be closing but monitoring this issue as I am going to share this gap with the team and circle back with updates
Hi! I'm working with a similar approach but using a Databricks SQL Warehouse as data source and query-based assets and I get the same Exception. Just to raise the problem :)
I'm using GX version 1.1.2
Describe the bug Creating expectation for PostgreSQL and Athena I have the next Exception with ExpectColumnValuesToBeOfType and ExpectColumnValuesToBeInTypeList
{ "success": false, "expectation_config": { "type": "expect_column_values_to_be_oftype", "kwargs": { "column": "id", "type": "INTEGER", "batch_id": "postgresql_datasource" }, "meta": {} }, "result": {}, "meta": {}, "exception_info": { "exception_traceback": "Traceback (most recent call last):\n File \"c:\Users\Administrator\Documents\datalake-ingest\venv\Lib\site-packages\great_expectations\validator\validator.py\", line 648, in graph_validate\n result = expectation.metrics_validate(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"c:\Users\Administrator\Documents\datalake-ingest\venv\Lib\site-packages\great_expectations\expectations\expectation.py\", line 1093, in metrics_validate\n expectation_validation_result: Union[ExpectationValidationResult, dict] = self._validate(\n ^^^^^^^^^^^^^^^\n File \"c:\Users\Administrator\Documents\datalake-ingest\venv\Lib\site-packages\great_expectations\expectations\core\expect_column_values_to_be_of_type.py\", line 595, in _validate\n actual_column_type = [\n ^\n File \"c:\Users\Administrator\Documents\datalake-ingest\venv\Lib\site-packages\great_expectations\expectations\core\expect_column_values_to_be_of_type.py\", line 596, in\n type_dict[\"type\"]\n
~~~~~^^^^^^^^\nKeyError: 'type'\n", "exception_message": "'type'", "raised_exception": true } To Reproduceorigin_data_source_name = "postgresql_datasource" postgresql_source = context.data_sources.add_or_update_postgres(origin_data_source_name, connection_string=connection_string)
asset_name = "postgresql_query_asset" asset_query = f'SELECT * FROM {schema_name}."{table_name}"' query_data_asset = postgresql_source.add_query_asset(query=asset_query, name=asset_name)
full_table_batch_definition = query_data_asset.add_batch_definition_whole_table( name="FULL_TABLE_ORIGIN" ) full_table_batch = full_table_batch_definition.get_batch() full_table_batch.head()
works well until here, but when I validate this expectations
expectation = gx.expectations.ExpectColumnValuesToBeOfType(column="id", type_="INTEGER") validation_results = full_table_batch.validate(expectation)
I got the exception
Expected behavior Return success true or false
Environment (please complete the following information):
Additional context SQLAlchemy 2.0.34 psycopg2-binary 2.9.9 pandas 2.1.4