great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

ClickHouse broken with expect_column_values_to_be_unique expectation #8537

Open ricardo-barreira opened 11 months ago

ricardo-barreira commented 11 months ago

Description

When I use the expect_column_values_to_be_unique expectation with ClickHouse as the backend, the query fails with an error about the Decimal data type. The error message suggests a problem in how Great Expectations generates the SQL query for this expectation. The issue seems to be related to the _sqlalchemy_window method in the column_values_unique.py Metric.
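One way a type string like the `Decimal(None, None)` in the error could arise is a type renderer that formats precision and scale unconditionally, even when neither was set. The sketch below only illustrates that suspected mechanism. Both function names and the fallback values `38` and `0` are hypothetical and are not taken from Great Expectations or clickhouse-sqlalchemy.

```python
from typing import Optional


def render_decimal_type(precision: Optional[int], scale: Optional[int]) -> str:
    """Hypothetical renderer that formats precision and scale unconditionally."""
    return "Decimal(%s, %s)" % (precision, scale)


def render_decimal_type_safely(
    precision: Optional[int] = None,
    scale: Optional[int] = None,
    default_precision: int = 38,  # assumed fallback, not a library default
    default_scale: int = 0,  # assumed fallback, not a library default
) -> str:
    """Hypothetical renderer that substitutes explicit values for unset arguments."""
    if precision is None:
        precision = default_precision
    if scale is None:
        scale = default_scale
    return "Decimal(%s, %s)" % (precision, scale)


# An unparameterized numeric type carries no precision or scale.
broken = render_decimal_type(None, None)
fixed = render_decimal_type_safely(None, None)
explicit = render_decimal_type_safely(10, 2)

print(broken)
print(fixed)
print(explicit)
```

If this is what happens, passing an explicit precision and scale wherever the cast is built would avoid the invalid type string.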

To Reproduce

Run the following code:

import great_expectations as gx

username = "username"
password = "password"
host = "host"
native_port = 11111
schema = "schema"

connection_string = f"clickhouse+native://{username}:{password}@{host}:{native_port}/{schema}?secure=True"

context = gx.get_context()
datasource = context.sources.add_sql(
    name="test_datasource", connection_string=connection_string
)

table_asset = datasource.add_table_asset(name="test_table_asset", table_name="target_table")
batch_request = table_asset.build_batch_request()

validator = context.get_validator(
    batch_request=batch_request, expectation_suite_name="test_expectation_suite"
)

validator.expect_column_values_to_be_unique(column="target_col")
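For context, the expectation call boils down to a duplicate check whose shape is visible in the query quoted in the error message. The sketch below reproduces that shape against an in-memory SQLite table. SQLite and the sample rows are assumptions made only so the example runs without a ClickHouse server, and it uses plain integer literals where the failing query uses Decimal casts.

```python
import sqlite3

# In-memory stand-in for target_table (assumed sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (target_col INTEGER)")
conn.executemany(
    "INSERT INTO target_table VALUES (?)",
    [(1,), (2,), (2,), (3,), (None,)],
)

# Values that occur more than once; NULLs are ignored by count(target_col).
duplicates_query = """
    SELECT target_col
    FROM target_table
    GROUP BY target_col
    HAVING count(target_col) > 1
"""
duplicated_values = [row[0] for row in conn.execute(duplicates_query)]

# Flag each non-null row whose value is duplicated, then sum the flags.
unexpected_count_query = f"""
    SELECT sum(
        CASE
            WHEN target_col IS NOT NULL AND target_col IN ({duplicates_query})
            THEN 1
            ELSE 0
        END
    ) AS unexpected_count
    FROM target_table
"""
unexpected_count = conn.execute(unexpected_count_query).fetchone()[0]

print(duplicated_values)
print(unexpected_count)
conn.close()
```

Running an equivalent query by hand against ClickHouse may serve as a temporary check while the expectation itself fails.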

Generated datasource in the great_expectations.yml:

(...)
fluent_datasources:
  test_datasource:
    type: sql
    connection_string: clickhouse+native://username:password@host:11111/schema?secure=True

Error stack trace:

---------------------------------------------------------------------------
ServerException                           Traceback (most recent call last)
File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/native/connector.py:152, in Cursor.execute(self, operation, parameters, context)
    150     execute, execute_kwargs = self._prepare(context)
--> 152     response = execute(
    153         operation, params=parameters, with_column_types=True,
    154         **execute_kwargs
    155     )
    157 except DriverError as orig:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_driver/client.py:373, in Client.execute(self, query, params, with_column_types, external_tables, query_id, settings, types_check, columnar)
    372 else:
--> 373     rv = self.process_ordinary_query(
    374         query, params=params, with_column_types=with_column_types,
    375         external_tables=external_tables,
    376         query_id=query_id, types_check=types_check,
    377         columnar=columnar
    378     )
    379 self.last_query.store_elapsed(time() - start_time)

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_driver/client.py:571, in Client.process_ordinary_query(self, query, params, with_column_types, external_tables, query_id, types_check, columnar)
    569 self.connection.send_external_tables(external_tables,
    570                                      types_check=types_check)
--> 571 return self.receive_result(with_column_types=with_column_types,
    572                            columnar=columnar)

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_driver/client.py:204, in Client.receive_result(self, with_column_types, progress, columnar)
    201 result = self.query_result_cls(
    202     gen, with_column_types=with_column_types, columnar=columnar
    203 )
--> 204 return result.get_result()

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_driver/result.py:50, in QueryResult.get_result(self)
     46 """
     47 :return: stored query result.
     48 """
---> 50 for packet in self.packet_generator:
     51     self.store(packet)

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_driver/client.py:220, in Client.packet_generator(self)
    219 try:
--> 220     packet = self.receive_packet()
    221     if not packet:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_driver/client.py:237, in Client.receive_packet(self)
    236 if packet.type == ServerPacketTypes.EXCEPTION:
--> 237     raise packet.exception
    239 elif packet.type == ServerPacketTypes.PROGRESS:

ServerException: Code: 43.
DB::Exception: Decimal data type family must have two numbers as its arguments: While processing if((target_col IS NOT NULL) AND (target_col IN ((SELECT target_col FROM (SELECT * FROM target_table WHERE true) AS anon_2 GROUP BY target_col HAVING count(target_col) > 1) AS _subquery1912405)), CAST(1, 'Decimal(None, None)'), CAST(0, 'Decimal(None, None)')) AS condition. Stack trace:

0. Poco::Exception::Exception(String const&, int) in /usr/bin/clickhouse
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) in /usr/bin/clickhouse
2. DB::Exception::Exception(int, char const (&) [64]) in /usr/bin/clickhouse
3. DB::create(std::shared_ptr const&) in /usr/bin/clickhouse
4. std::shared_ptr DB::DataTypeFactory::getImpl(String const&, std::shared_ptr const&) const in /usr/bin/clickhouse
5. std::shared_ptr DB::DataTypeFactory::getImpl(String const&) const in /usr/bin/clickhouse
6. DB::CastOverloadResolverImpl<(DB::CastType)0, false, DB::CastOverloadName, DB::CastName>::getReturnTypeImpl(std::vector> const&) const in /usr/bin/clickhouse
7. DB::IFunctionOverloadResolver::getReturnTypeWithoutLowCardinality(std::vector> const&) const in /usr/bin/clickhouse
8. DB::IFunctionOverloadResolver::getReturnType(std::vector> const&) const in /usr/bin/clickhouse
9. DB::IFunctionOverloadResolver::build(std::vector> const&) const in /usr/bin/clickhouse
10. DB::ActionsDAG::addFunction(std::shared_ptr const&, std::vector>, String) in /usr/bin/clickhouse
11. DB::ScopeStack::addFunction(std::shared_ptr const&, std::vector> const&, String) in /usr/bin/clickhouse
12. DB::ActionsMatcher::Data::addFunction(std::shared_ptr const&, std::vector> const&, String) in /usr/bin/clickhouse
13. DB::ActionsMatcher::visit(DB::ASTFunction const&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
14. DB::ActionsMatcher::visit(DB::ASTFunction const&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
15. DB::ActionsMatcher::visit(DB::ASTExpressionList&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
16. DB::InDepthNodeVisitor const>::doVisit(std::shared_ptr const&) in /usr/bin/clickhouse
17. DB::ExpressionAnalyzer::getRootActions(std::shared_ptr const&, bool, std::shared_ptr&, bool) in /usr/bin/clickhouse
18. DB::SelectQueryExpressionAnalyzer::appendSelect(DB::ExpressionActionsChain&, bool) in /usr/bin/clickhouse
19. DB::ExpressionAnalysisResult::ExpressionAnalysisResult(DB::SelectQueryExpressionAnalyzer&, std::shared_ptr const&, bool, bool, bool, std::shared_ptr const&, std::shared_ptr const&, DB::Block const&) in /usr/bin/clickhouse
20. DB::InterpreterSelectQuery::getSampleBlockImpl() in /usr/bin/clickhouse
21. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr)::$_4::operator()(bool) const in /usr/bin/clickhouse
22. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr) in /usr/bin/clickhouse
23. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
24. DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter(std::shared_ptr const&, std::vector> const&) in /usr/bin/clickhouse
25. DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(std::shared_ptr const&, std::shared_ptr, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
26. DB::InterpreterSelectWithUnionQuery::getSampleBlock(std::shared_ptr const&, std::shared_ptr, bool, bool) in /usr/bin/clickhouse
27. DB::getDatabaseAndTablesWithColumns(std::vector> const&, std::shared_ptr, bool, bool, bool) in /usr/bin/clickhouse
28. DB::JoinedTables::resolveTables() in /usr/bin/clickhouse
29. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr) in /usr/bin/clickhouse
30. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
31. DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter(std::shared_ptr const&, std::vector> const&) in /usr/bin/clickhouse

During handling of the above exception, another exception occurred:

DatabaseException                         Traceback (most recent call last)
File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:546, in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    543 try:
    544     resolved_metrics[
    545         metric_computation_configuration.metric_configuration.id
--> 546     ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    547         **metric_computation_configuration.metric_provider_kwargs
    548     )
    549 except Exception as e:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/expectations/metrics/map_metric_provider/map_condition_auxilliary_methods.py:353, in _sqlalchemy_map_condition_unexpected_count_value(cls, execution_engine, metric_domain_kwargs, metric_value_kwargs, metrics, **kwargs)
    346 unexpected_count_query: sqlalchemy.Select = (
    347     sa.select(
    348         sa.func.sum(sa.column("condition")).label("unexpected_count"),
   (...)
    351     .alias("UnexpectedCountSubquery")
    352 )
--> 353 unexpected_count: Union[float, int] = execution_engine.execute_query(
    354     sa.select(
    355         unexpected_count_query.c[
    356             f"{SummarizationMetricNameSuffixes.UNEXPECTED_COUNT.value}"
    357         ],
    358     )
    359 ).scalar()
    360 # Unexpected count can be None if the table is empty, in which case the count
    361 # should default to zero.

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/execution_engine/sqlalchemy_execution_engine.py:1416, in SqlAlchemyExecutionEngine.execute_query(self, query)
   1415 with self.get_connection() as connection:
-> 1416     result = connection.execute(query)
   1418 return result

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/sqlalchemy/engine/base.py:1385, in Connection.execute(self, statement, *multiparams, **params)
   1384 else:
-> 1385     return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/sqlalchemy/sql/elements.py:334, in ClauseElement._execute_on_connection(self, connection, multiparams, params, execution_options, _force)
    333 if _force or self.supports_execution:
--> 334     return connection._execute_clauseelement(
    335         self, multiparams, params, execution_options
    336     )
    337 else:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/sqlalchemy/engine/base.py:1577, in Connection._execute_clauseelement(self, elem, multiparams, params, execution_options)
   1569 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
   1570     dialect=dialect,
   1571     compiled_cache=compiled_cache,
   (...)
   1575     linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
   1576 )
-> 1577 ret = self._execute_context(
   1578     dialect,
   1579     dialect.execution_ctx_cls._init_compiled,
   1580     compiled_sql,
   1581     distilled_params,
   1582     execution_options,
   1583     compiled_sql,
   1584     distilled_params,
   1585     elem,
   1586     extracted_params,
   1587     cache_hit=cache_hit,
   1588 )
   1589 if has_events:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/sqlalchemy/engine/base.py:1953, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1952 except BaseException as e:
-> 1953     self._handle_dbapi_exception(
   1954         e, statement, parameters, cursor, context
   1955     )
   1957 return result

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/sqlalchemy/engine/base.py:2138, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   2137     else:
-> 2138         util.raise_(exc_info[1], with_traceback=exc_info[2])
   2140 finally:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/sqlalchemy/util/compat.py:211, in raise_(***failed resolving arguments***)
    210 try:
--> 211     raise exception
    212 finally:
    213     # credit to
    214     # https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/
    215     # as the __traceback__ object creates a cycle

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/sqlalchemy/engine/base.py:1910, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
   1909     if not evt_handled:
-> 1910         self.dialect.do_execute(
   1911             cursor, statement, parameters, context
   1912         )
   1914 if self._has_events or self.engine._has_events:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/base.py:416, in ClickHouseDialect.do_execute(self, cursor, statement, parameters, context)
    415 def do_execute(self, cursor, statement, parameters, context=None):
--> 416     cursor.execute(statement, parameters, context=context)

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/clickhouse_sqlalchemy/drivers/native/connector.py:158, in Cursor.execute(self, operation, parameters, context)
    157 except DriverError as orig:
--> 158     raise DatabaseException(orig)
    160 self._process_response(response)

DatabaseException: Orig exception: Code: 43.
DB::Exception: Decimal data type family must have two numbers as its arguments: While processing if((target_col IS NOT NULL) AND (target_col IN ((SELECT target_col FROM (SELECT * FROM target_table WHERE true) AS anon_2 GROUP BY target_col HAVING count(target_col) > 1) AS _subquery1912405)), CAST(1, 'Decimal(None, None)'), CAST(0, 'Decimal(None, None)')) AS condition. Stack trace:

0. Poco::Exception::Exception(String const&, int) in /usr/bin/clickhouse
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) in /usr/bin/clickhouse
2. DB::Exception::Exception(int, char const (&) [64]) in /usr/bin/clickhouse
3. DB::create(std::shared_ptr const&) in /usr/bin/clickhouse
4. std::shared_ptr DB::DataTypeFactory::getImpl(String const&, std::shared_ptr const&) const in /usr/bin/clickhouse
5. std::shared_ptr DB::DataTypeFactory::getImpl(String const&) const in /usr/bin/clickhouse
6. DB::CastOverloadResolverImpl<(DB::CastType)0, false, DB::CastOverloadName, DB::CastName>::getReturnTypeImpl(std::vector> const&) const in /usr/bin/clickhouse
7. DB::IFunctionOverloadResolver::getReturnTypeWithoutLowCardinality(std::vector> const&) const in /usr/bin/clickhouse
8. DB::IFunctionOverloadResolver::getReturnType(std::vector> const&) const in /usr/bin/clickhouse
9. DB::IFunctionOverloadResolver::build(std::vector> const&) const in /usr/bin/clickhouse
10. DB::ActionsDAG::addFunction(std::shared_ptr const&, std::vector>, String) in /usr/bin/clickhouse
11. DB::ScopeStack::addFunction(std::shared_ptr const&, std::vector> const&, String) in /usr/bin/clickhouse
12. DB::ActionsMatcher::Data::addFunction(std::shared_ptr const&, std::vector> const&, String) in /usr/bin/clickhouse
13. DB::ActionsMatcher::visit(DB::ASTFunction const&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
14. DB::ActionsMatcher::visit(DB::ASTFunction const&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
15. DB::ActionsMatcher::visit(DB::ASTExpressionList&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
16. DB::InDepthNodeVisitor const>::doVisit(std::shared_ptr const&) in /usr/bin/clickhouse
17. DB::ExpressionAnalyzer::getRootActions(std::shared_ptr const&, bool, std::shared_ptr&, bool) in /usr/bin/clickhouse
18. DB::SelectQueryExpressionAnalyzer::appendSelect(DB::ExpressionActionsChain&, bool) in /usr/bin/clickhouse
19. DB::ExpressionAnalysisResult::ExpressionAnalysisResult(DB::SelectQueryExpressionAnalyzer&, std::shared_ptr const&, bool, bool, bool, std::shared_ptr const&, std::shared_ptr const&, DB::Block const&) in /usr/bin/clickhouse
20. DB::InterpreterSelectQuery::getSampleBlockImpl() in /usr/bin/clickhouse
21. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr)::$_4::operator()(bool) const in /usr/bin/clickhouse
22. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr) in /usr/bin/clickhouse
23. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
24. DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter(std::shared_ptr const&, std::vector> const&) in /usr/bin/clickhouse
25. DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(std::shared_ptr const&, std::shared_ptr, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
26. DB::InterpreterSelectWithUnionQuery::getSampleBlock(std::shared_ptr const&, std::shared_ptr, bool, bool) in /usr/bin/clickhouse
27. DB::getDatabaseAndTablesWithColumns(std::vector> const&, std::shared_ptr, bool, bool, bool) in /usr/bin/clickhouse
28. DB::JoinedTables::resolveTables() in /usr/bin/clickhouse
29. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr) in /usr/bin/clickhouse
30. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
31. DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter(std::shared_ptr const&, std::vector> const&) in /usr/bin/clickhouse

The above exception was the direct cause of the following exception:

MetricResolutionError                     Traceback (most recent call last)
Cell In[10], line 1
----> 1 validator.expect_column_values_to_be_unique(column="target_col")

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validator.py:600, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    594         validation_result = ExpectationValidationResult(
    595             success=False,
    596             exception_info=exception_info,
    597             expectation_config=configuration,
    598         )
    599     else:
--> 600         raise err
    602 if self._include_rendered_content:
    603     validation_result.render()

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validator.py:563, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    559     validation_result = ExpectationValidationResult(
    560         expectation_config=copy.deepcopy(expectation.configuration)
    561     )
    562 else:
--> 563     validation_result = expectation.validate(
    564         validator=self,
    565         evaluation_parameters=self._expectation_suite.evaluation_parameters,
    566         data_context=self._data_context,
    567         runtime_configuration=basic_runtime_configuration,
    568     )
    570 # If validate has set active_validation to true, then we do not save the config to avoid
    571 # saving updating expectation configs to the same suite during validation runs
    572 if self._active_validation is True:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/expectations/expectation.py:1276, in Expectation.validate(self, validator, configuration, evaluation_parameters, interactive_evaluation, data_context, runtime_configuration)
   1267 self._warn_if_result_format_config_in_expectation_configuration(
   1268     configuration=configuration
   1269 )
   1271 configuration.process_evaluation_parameters(
   1272     evaluation_parameters, interactive_evaluation, data_context
   1273 )
   1274 expectation_validation_result_list: list[
   1275     ExpectationValidationResult
-> 1276 ] = validator.graph_validate(
   1277     configurations=[configuration],
   1278     runtime_configuration=runtime_configuration,
   1279 )
   1280 return expectation_validation_result_list[0]

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validator.py:1075, in Validator.graph_validate(self, configurations, runtime_configuration)
   1073         return evrs
   1074     else:
-> 1075         raise err
   1077 configuration: ExpectationConfiguration
   1078 result: ExpectationValidationResult

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validator.py:1054, in Validator.graph_validate(self, configurations, runtime_configuration)
   1047 resolved_metrics: _MetricsDict
   1049 try:
   1050     (
   1051         resolved_metrics,
   1052         evrs,
   1053         processed_configurations,
-> 1054     ) = self._resolve_suite_level_graph_and_process_metric_evaluation_errors(
   1055         graph=graph,
   1056         runtime_configuration=runtime_configuration,
   1057         expectation_validation_graphs=expectation_validation_graphs,
   1058         evrs=evrs,
   1059         processed_configurations=processed_configurations,
   1060         show_progress_bars=self._determine_progress_bars(),
   1061     )
   1062 except Exception as err:
   1063     # If a general Exception occurs during the execution of "ValidationGraph.resolve()", then
   1064     # all expectations in the suite are impacted, because it is impossible to attribute the failure to a metric.
   1065     if catch_exceptions:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validator.py:1213, in Validator._resolve_suite_level_graph_and_process_metric_evaluation_errors(self, graph, runtime_configuration, expectation_validation_graphs, evrs, processed_configurations, show_progress_bars)
   1205 resolved_metrics: _MetricsDict
   1206 aborted_metrics_info: Dict[
   1207     _MetricKey,
   1208     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
   1209 ]
   1210 (
   1211     resolved_metrics,
   1212     aborted_metrics_info,
-> 1213 ) = self._metrics_calculator.resolve_validation_graph(
   1214     graph=graph,
   1215     runtime_configuration=runtime_configuration,
   1216     min_graph_edges_pbar_enable=0,
   1217 )
   1219 # Trace MetricResolutionError occurrences to expectations relying on corresponding malfunctioning metrics.
   1220 rejected_configurations: List[ExpectationConfiguration] = []

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/metrics_calculator.py:287, in MetricsCalculator.resolve_validation_graph(self, graph, runtime_configuration, min_graph_edges_pbar_enable)
    282 resolved_metrics: _MetricsDict
    283 aborted_metrics_info: Dict[
    284     _MetricKey,
    285     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
    286 ]
--> 287 resolved_metrics, aborted_metrics_info = graph.resolve(
    288     runtime_configuration=runtime_configuration,
    289     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    290     show_progress_bars=self._show_progress_bars,
    291 )
    292 return resolved_metrics, aborted_metrics_info

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:207, in ValidationGraph.resolve(self, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    201 resolved_metrics: Dict[_MetricKey, MetricValue] = {}
    203 # updates graph with aborted metrics
    204 aborted_metrics_info: Dict[
    205     _MetricKey,
    206     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
--> 207 ] = self._resolve(
    208     metrics=resolved_metrics,
    209     runtime_configuration=runtime_configuration,
    210     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    211     show_progress_bars=show_progress_bars,
    212 )
    214 return resolved_metrics, aborted_metrics_info

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:313, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    309                 failed_metric_info[failed_metric.id]["exception_info"] = {
    310                     exception_info
    311                 }
    312     else:
--> 313         raise err
    314 except Exception as e:
    315     if catch_exceptions:

File /opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:283, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    278         computable_metrics.add(metric)
    280 try:
    281     # Access "ExecutionEngine.resolve_metrics()" method, to resolve missing "MetricConfiguration" objects.
    282     metrics.update(
--> 283         self._execution_engine.resolve_metrics(
    284             metrics_to_resolve=computable_metrics,  # type: ignore[arg-type]  # Metric typing needs further refinement.
    285             metrics=metrics,  # type: ignore[arg-type]  # Metric typing needs further refinement.
    286             runtime_configuration=runtime_configuration,
    287         )
    288     )
    289     progress_bar.update(len(computable_metrics))
    290     progress_bar.refresh()

File [/opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:281](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:281), in ExecutionEngine.resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
    272 metric_fn_bundle_configurations: List[MetricComputationConfiguration]
    273 (
    274     metric_fn_direct_configurations,
    275     metric_fn_bundle_configurations,
   (...)
    279     runtime_configuration=runtime_configuration,
    280 )
--> 281 return self._process_direct_and_bundled_metric_computation_configurations(
    282     metric_fn_direct_configurations=metric_fn_direct_configurations,
    283     metric_fn_bundle_configurations=metric_fn_bundle_configurations,
    284 )

File [/opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:550](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/Caskroom/miniconda/base/envs/gx_env/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:550), in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    544         resolved_metrics[
    545             metric_computation_configuration.metric_configuration.id
    546         ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    547             **metric_computation_configuration.metric_provider_kwargs
    548         )
    549     except Exception as e:
--> 550         raise gx_exceptions.MetricResolutionError(
    551             message=str(e),
    552             failed_metrics=(
    553                 metric_computation_configuration.metric_configuration,
    554             ),
    555         ) from e
    557 try:
    558     # an engine-specific way of computing metrics together
    559     resolved_metric_bundle: Dict[
    560         Tuple[str, str, str], MetricValue
    561     ] = self.resolve_metric_bundle(
    562         metric_fn_bundle=metric_fn_bundle_configurations
    563     )

MetricResolutionError: Orig exception: Code: 43.
DB::Exception: Decimal data type family must have two numbers as its arguments: While processing if((target_col IS NOT NULL) AND (target_col IN ((SELECT target_col FROM (SELECT * FROM target_table WHERE true) AS anon_2 GROUP BY target_col HAVING count(target_col) > 1) AS _subquery1912405)), CAST(1, 'Decimal(None, None)'), CAST(0, 'Decimal(None, None)')) AS condition. Stack trace:

0. Poco::Exception::Exception(String const&, int) in /usr/bin/clickhouse
1. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) in /usr/bin/clickhouse
2. DB::Exception::Exception(int, char const (&) [64]) in /usr/bin/clickhouse
3. DB::create(std::shared_ptr const&) in /usr/bin/clickhouse
4. std::shared_ptr DB::DataTypeFactory::getImpl(String const&, std::shared_ptr const&) const in /usr/bin/clickhouse
5. std::shared_ptr DB::DataTypeFactory::getImpl(String const&) const in /usr/bin/clickhouse
6. DB::CastOverloadResolverImpl<(DB::CastType)0, false, DB::CastOverloadName, DB::CastName>::getReturnTypeImpl(std::vector> const&) const in /usr/bin/clickhouse
7. DB::IFunctionOverloadResolver::getReturnTypeWithoutLowCardinality(std::vector> const&) const in /usr/bin/clickhouse
8. DB::IFunctionOverloadResolver::getReturnType(std::vector> const&) const in /usr/bin/clickhouse
9. DB::IFunctionOverloadResolver::build(std::vector> const&) const in /usr/bin/clickhouse
10. DB::ActionsDAG::addFunction(std::shared_ptr const&, std::vector>, String) in /usr/bin/clickhouse
11. DB::ScopeStack::addFunction(std::shared_ptr const&, std::vector> const&, String) in /usr/bin/clickhouse
12. DB::ActionsMatcher::Data::addFunction(std::shared_ptr const&, std::vector> const&, String) in /usr/bin/clickhouse
13. DB::ActionsMatcher::visit(DB::ASTFunction const&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
14. DB::ActionsMatcher::visit(DB::ASTFunction const&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
15. DB::ActionsMatcher::visit(DB::ASTExpressionList&, std::shared_ptr const&, DB::ActionsMatcher::Data&) in /usr/bin/clickhouse
16. DB::InDepthNodeVisitor const>::doVisit(std::shared_ptr const&) in /usr/bin/clickhouse
17. DB::ExpressionAnalyzer::getRootActions(std::shared_ptr const&, bool, std::shared_ptr&, bool) in /usr/bin/clickhouse
18. DB::SelectQueryExpressionAnalyzer::appendSelect(DB::ExpressionActionsChain&, bool) in /usr/bin/clickhouse
19. DB::ExpressionAnalysisResult::ExpressionAnalysisResult(DB::SelectQueryExpressionAnalyzer&, std::shared_ptr const&, bool, bool, bool, std::shared_ptr const&, std::shared_ptr const&, DB::Block const&) in /usr/bin/clickhouse
20. DB::InterpreterSelectQuery::getSampleBlockImpl() in /usr/bin/clickhouse
21. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr)::$_4::operator()(bool) const in /usr/bin/clickhouse
22. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr) in /usr/bin/clickhouse
23. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
24. DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter(std::shared_ptr const&, std::vector> const&) in /usr/bin/clickhouse
25. DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(std::shared_ptr const&, std::shared_ptr, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
26. DB::InterpreterSelectWithUnionQuery::getSampleBlock(std::shared_ptr const&, std::shared_ptr, bool, bool) in /usr/bin/clickhouse
27. DB::getDatabaseAndTablesWithColumns(std::vector> const&, std::shared_ptr, bool, bool, bool) in /usr/bin/clickhouse
28. DB::JoinedTables::resolveTables() in /usr/bin/clickhouse
29. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, std::optional, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&, std::shared_ptr const&, std::shared_ptr) in /usr/bin/clickhouse
30. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr const&, std::shared_ptr const&, DB::SelectQueryOptions const&, std::vector> const&) in /usr/bin/clickhouse
31. DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter(std::shared_ptr const&, std::vector> const&) in /usr/bin/clickhouse

Expected behavior The expectation should run without any errors.

Environment:

ricardo-barreira commented 10 months ago

Hi all, just wanted to follow up on this issue. Any updates would be greatly appreciated. Thanks!

matveykortsev commented 9 months ago

Hi, I am facing the same error with other expectations:

expect_column_pair_values_to_be_equal
expect_compound_columns_to_be_unique
expect_select_column_values_to_be_unique_within_record

It would be really helpful to have this fixed.

Gerrit-K commented 4 months ago

Same here, and it's a blocker, holding us back from integrating Great Expectations.

I've looked a bit into the code and noticed a few things that might help investigating.

First, as described above, the generated query contains the clause ... THEN CAST(%(param_1)s AS Decimal(None, None)) ELSE CAST(%(param_2)s AS Decimal(None, None)) .... ClickHouse obviously doesn't understand the Nones here, which are passed for precision and scale. As you can see in the clickhouse-sqlalchemy driver code, there are no None checks in the type compiler:

    def visit_numeric(self, type_, **kw):
        return 'Decimal(%s, %s)' % (type_.precision, type_.scale)

So one thing to discuss is whether or not these checks should be done in the driver or if sqlalchemy or Great Expectations should properly populate these fields.
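
To make the failure mode concrete, here is a minimal sketch that reproduces only the string formatting, without needing a database or the driver installed. `NumericStandIn` and `render_numeric` are hypothetical names made up for this illustration; the stand-in mimics a numeric type whose precision and scale were never set, which is what the error message above suggests is being passed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericStandIn:
    """Hypothetical stand-in for a numeric type created without precision/scale."""
    precision: Optional[int] = None
    scale: Optional[int] = None


def render_numeric(type_: NumericStandIn) -> str:
    # Same format expression as the visit_numeric snippet quoted above:
    # no None checks, so unset fields are rendered literally as "None".
    return 'Decimal(%s, %s)' % (type_.precision, type_.scale)


unset = render_numeric(NumericStandIn())
explicit = render_numeric(NumericStandIn(precision=10, scale=2))
print(unset)     # the type string seen in the ClickHouse error
print(explicit)  # what the same code produces when both fields are set
```

With both fields unset, the formatted type is exactly the Decimal(None, None) that appears in the failing query.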

Next thing I noticed: these casts are really just being done for a weird "true"/"false" conversion for Redshift:

https://github.com/great-expectations/great_expectations/blob/f1c565980a6bc46e465134bba020d64fb8e9c81f/great_expectations/expectations/metrics/map_metric_provider/map_condition_auxilliary_methods.py#L305-L312

So the bug might not even manifest itself if it weren't for this workaround.

We would highly appreciate it if someone with deeper knowledge of GE and sqlalchemy could take a look at this 🙏

Gerrit-K commented 4 months ago

I managed to work around these issues by monkey-patching the ClickHouseTypeCompiler for the Numeric type:

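# Assumption: this import path may differ between clickhouse-sqlalchemy versions.
from clickhouse_sqlalchemy.drivers.base import ClickHouseTypeCompiler


def _patched_visit_numeric(self, type_, **kw):
    # Omit the arguments that are unset instead of rendering them as "None".
    if type_.precision is None:
        return "Decimal"
    if type_.scale is None:
        return "Decimal(%s)" % type_.precision
    return "Decimal(%s, %s)" % (type_.precision, type_.scale)


# Replace the driver's method for every compiler instance in this process.
ClickHouseTypeCompiler.visit_numeric = _patched_visit_numeric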

This leads me to believe that it's indeed a shortcoming of the clickhouse-sqlalchemy package. I'll raise the issue there and contribute this fix.
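
For anyone who wants to sanity-check the branching of the workaround without installing the driver, here is a small sketch. It copies the patched logic into a standalone function and feeds it `SimpleNamespace` objects as stand-ins for the type object; both the standalone function name and the stand-ins are assumptions made for illustration. It only checks the strings that are produced, not whether a given ClickHouse server version accepts each form.

```python
from types import SimpleNamespace


def patched_visit_numeric(self, type_, **kw):
    # Same branching as the workaround above, copied so it runs standalone.
    if type_.precision is None:
        return "Decimal"
    if type_.scale is None:
        return "Decimal(%s)" % type_.precision
    return "Decimal(%s, %s)" % (type_.precision, type_.scale)


# Stand-ins for the three situations: nothing set, precision only, both set.
cases = [
    SimpleNamespace(precision=None, scale=None),
    SimpleNamespace(precision=10, scale=None),
    SimpleNamespace(precision=10, scale=2),
]

# `self` is unused by the function, so None is passed in its place.
rendered = [patched_visit_numeric(None, case) for case in cases]
print(rendered)
```

None of the three cases renders a literal None any more, which is the part of the type string the server rejected.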

While we're at it, could we perhaps also change the clickhouse extra to be sqlalchemy2 compatible here? https://github.com/great-expectations/great_expectations/blob/f5d4500fe434d3b25ee24a65321778b241039271/setup.py#L18-L35 clickhouse-sqlalchemy claims to support sqlalchemy 2 since release 0.3.0 (see its Changelog). Otherwise, this will hold back any fixes we contribute to clickhouse-sqlalchemy.

matveykortsev commented 4 months ago

@Gerrit-K I wanted to ask: are you still planning to contribute the fix to clickhouse-sqlalchemy? If you are busy or not interested, I can do it myself, because it's blocking me too. As for sqlalchemy 2 support for clickhouse, I think we need to implement lots of tests before switching from 1.x, otherwise we could face tons of bugs; clickhouse support is still weak :(

Gerrit-K commented 4 months ago

Hey @matveykortsev, sorry for the delay. I initially planned to do this but then got caught up with other things. It'll take a week or two until I will be able to take a shot at it, so if you want to take over, you're more than welcome!

Regarding sqlalchemy 2, that's a fair point. I already assumed it might not be as easy as just moving the version definition :/