great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

Clickhouse broken with like-pattern expectations #8446

Closed matveykortsev closed 8 months ago

matveykortsev commented 1 year ago

Describe the bug I'm attempting to use GE with the newly released ClickHouse support, but it appears that the like-pattern expectations are not functioning properly. Other expectations work well.

To Reproduce Run the following code:

import great_expectations as gx

context = gx.get_context(context_root_dir="/home/jovyan/work")
context.add_or_update_expectation_suite(expectation_suite_name="test_suite")
datasource = context.sources.add_or_update_sql(
    name="fluent_clickhouse",
    connection_string="clickhouse+http://user:password@server_address",
)
# Placeholders for the actual schema/table under test
asset_name, schema_name, table_name = "asset_name", "schema_name", "table_name"
asset = datasource.add_table_asset(
    name=asset_name, table_name=table_name, schema_name=schema_name
)
asset.add_splitter_column_value(column_name="pdate")
my_batch_request = asset.build_batch_request(batch_slice="[-1:]")
validator = context.get_validator(
    batch_request=my_batch_request, expectation_suite_name="test_suite"
)
validator.expect_column_values_to_match_like_pattern(
    column="customer_user_id", like_pattern="kch%"
)
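
For context, on a SQL datasource this expectation ultimately compiles to a SQL LIKE condition on the column. A rough standalone sketch of the equivalent SQLAlchemy condition (not GE's exact code):

import sqlalchemy as sa

# Equivalent condition the metric builds: customer_user_id LIKE 'kch%'
condition = sa.column("customer_user_id").like("kch%")
print(condition)  # customer_user_id LIKE :customer_user_id_1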

The relevant part of great_expectations.yml with the datasource setup:

datasources:
  clickhouse_test:
    name: clickhouse_test
    class_name: Datasource
    module_name: great_expectations.datasource
    execution_engine:
      class_name: SqlAlchemyExecutionEngine
      module_name: great_expectations.execution_engine
      connection_string: clickhouse+http://user:password@server_address
      create_temp_table: true
    data_connectors:
      inferred_data_connector_single_batch_asset:
        name: inferred_data_connector_single_batch_asset
        class_name: InferredAssetSqlDataConnector
        module_name: great_expectations.datasource.data_connector
        include_schema_name: true
  fluent_clickhouse:
    type: sql
    assets:
      asset_name:
        type: table
        order_by: []
        batch_metadata: {}
        splitter:
          column_name: pdate
          method_name: split_on_column_value
        table_name: table_name
        schema_name: schema_name
    connection_string: clickhouse+http://user:password@server_address
Running the expectation raises the following traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:546, in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    543 try:
    544     resolved_metrics[
    545         metric_computation_configuration.metric_configuration.id
--> 546     ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    547         **metric_computation_configuration.metric_provider_kwargs
    548     )
    549 except Exception as e:

File /opt/conda/lib/python3.10/site-packages/great_expectations/expectations/metrics/metric_provider.py:90, in metric_partial.<locals>.wrapper.<locals>.inner_func(*args, **kwargs)
     88 @wraps(metric_fn)
     89 def inner_func(*args, **kwargs):
---> 90     return metric_fn(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/great_expectations/expectations/metrics/map_metric_provider/column_condition_partial.py:186, in column_condition_partial.<locals>.wrapper.<locals>.inner_func(cls, execution_engine, metric_domain_kwargs, metric_value_kwargs, metrics, runtime_configuration)
    184         dialect = sqlalchemy_engine.dialect
--> 186 expected_condition = metric_fn(
    187     cls,
    188     sa.column(column_name),
    189     **metric_value_kwargs,
    190     _dialect=dialect,
    191     _table=selectable,
    192     _execution_engine=execution_engine,
    193     _sqlalchemy_engine=sqlalchemy_engine,
    194     _metrics=metrics,
    195 )
    197 filter_column_isnull = kwargs.get(
    198     "filter_column_isnull", getattr(cls, "filter_column_isnull", True)
    199 )

File /opt/conda/lib/python3.10/site-packages/great_expectations/expectations/metrics/column_map_metrics/column_values_match_like_pattern.py:28, in ColumnValuesMatchLikePattern._sqlalchemy(cls, column, like_pattern, _dialect, **kwargs)
     26 if like_pattern_expression is None:
     27     logger.warning(
---> 28         f"Like patterns are not supported for dialect {str(_dialect.name)}"
     29     )
     30     raise NotImplementedError

AttributeError: module 'clickhouse_sqlalchemy.drivers.base' has no attribute 'name'

The above exception was the direct cause of the following exception:

MetricResolutionError                     Traceback (most recent call last)
Cell In[168], line 1
----> 1 validator.expect_column_values_to_match_like_pattern(column='customer_user_id', like_pattern='kch%')

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validator.py:597, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    591         validation_result = ExpectationValidationResult(
    592             success=False,
    593             exception_info=exception_info,
    594             expectation_config=configuration,
    595         )
    596     else:
--> 597         raise err
    599 if self._include_rendered_content:
    600     validation_result.render()

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validator.py:560, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    556     validation_result = ExpectationValidationResult(
    557         expectation_config=copy.deepcopy(expectation.configuration)
    558     )
    559 else:
--> 560     validation_result = expectation.validate(
    561         validator=self,
    562         evaluation_parameters=self._expectation_suite.evaluation_parameters,
    563         data_context=self._data_context,
    564         runtime_configuration=basic_runtime_configuration,
    565     )
    567 # If validate has set active_validation to true, then we do not save the config to avoid
    568 # saving updating expectation configs to the same suite during validation runs
    569 if self._active_validation is True:

File /opt/conda/lib/python3.10/site-packages/great_expectations/expectations/expectation.py:1276, in Expectation.validate(self, validator, configuration, evaluation_parameters, interactive_evaluation, data_context, runtime_configuration)
   1267 self._warn_if_result_format_config_in_expectation_configuration(
   1268     configuration=configuration
   1269 )
   1271 configuration.process_evaluation_parameters(
   1272     evaluation_parameters, interactive_evaluation, data_context
   1273 )
   1274 expectation_validation_result_list: list[
   1275     ExpectationValidationResult
-> 1276 ] = validator.graph_validate(
   1277     configurations=[configuration],
   1278     runtime_configuration=runtime_configuration,
   1279 )
   1280 return expectation_validation_result_list[0]

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validator.py:1072, in Validator.graph_validate(self, configurations, runtime_configuration)
   1070         return evrs
   1071     else:
-> 1072         raise err
   1074 configuration: ExpectationConfiguration
   1075 result: ExpectationValidationResult

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validator.py:1051, in Validator.graph_validate(self, configurations, runtime_configuration)
   1044 resolved_metrics: Dict[Tuple[str, str, str], MetricValue]
   1046 try:
   1047     (
   1048         resolved_metrics,
   1049         evrs,
   1050         processed_configurations,
-> 1051     ) = self._resolve_suite_level_graph_and_process_metric_evaluation_errors(
   1052         graph=graph,
   1053         runtime_configuration=runtime_configuration,
   1054         expectation_validation_graphs=expectation_validation_graphs,
   1055         evrs=evrs,
   1056         processed_configurations=processed_configurations,
   1057         show_progress_bars=self._determine_progress_bars(),
   1058     )
   1059 except Exception as err:
   1060     # If a general Exception occurs during the execution of "ValidationGraph.resolve()", then
   1061     # all expectations in the suite are impacted, because it is impossible to attribute the failure to a metric.
   1062     if catch_exceptions:

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validator.py:1210, in Validator._resolve_suite_level_graph_and_process_metric_evaluation_errors(self, graph, runtime_configuration, expectation_validation_graphs, evrs, processed_configurations, show_progress_bars)
   1202 resolved_metrics: Dict[Tuple[str, str, str], MetricValue]
   1203 aborted_metrics_info: Dict[
   1204     Tuple[str, str, str],
   1205     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
   1206 ]
   1207 (
   1208     resolved_metrics,
   1209     aborted_metrics_info,
-> 1210 ) = self._metrics_calculator.resolve_validation_graph(
   1211     graph=graph,
   1212     runtime_configuration=runtime_configuration,
   1213     min_graph_edges_pbar_enable=0,
   1214 )
   1216 # Trace MetricResolutionError occurrences to expectations relying on corresponding malfunctioning metrics.
   1217 rejected_configurations: List[ExpectationConfiguration] = []

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/metrics_calculator.py:283, in MetricsCalculator.resolve_validation_graph(self, graph, runtime_configuration, min_graph_edges_pbar_enable)
    278 resolved_metrics: Dict[Tuple[str, str, str], MetricValue]
    279 aborted_metrics_info: Dict[
    280     Tuple[str, str, str],
    281     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
    282 ]
--> 283 resolved_metrics, aborted_metrics_info = graph.resolve(
    284     runtime_configuration=runtime_configuration,
    285     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    286     show_progress_bars=self._show_progress_bars,
    287 )
    288 return resolved_metrics, aborted_metrics_info

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:206, in ValidationGraph.resolve(self, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    200 resolved_metrics: Dict[Tuple[str, str, str], MetricValue] = {}
    202 # updates graph with aborted metrics
    203 aborted_metrics_info: Dict[
    204     Tuple[str, str, str],
    205     Dict[str, Union[MetricConfiguration, Set[ExceptionInfo], int]],
--> 206 ] = self._resolve(
    207     metrics=resolved_metrics,
    208     runtime_configuration=runtime_configuration,
    209     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    210     show_progress_bars=show_progress_bars,
    211 )
    213 return resolved_metrics, aborted_metrics_info

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:312, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    308                 failed_metric_info[failed_metric.id]["exception_info"] = {
    309                     exception_info
    310                 }
    311     else:
--> 312         raise err
    313 except Exception as e:
    314     if catch_exceptions:

File /opt/conda/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:282, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    277         computable_metrics.add(metric)
    279 try:
    280     # Access "ExecutionEngine.resolve_metrics()" method, to resolve missing "MetricConfiguration" objects.
    281     metrics.update(
--> 282         self._execution_engine.resolve_metrics(
    283             metrics_to_resolve=computable_metrics,
    284             metrics=metrics,
    285             runtime_configuration=runtime_configuration,
    286         )
    287     )
    288     progress_bar.update(len(computable_metrics))
    289     progress_bar.refresh()

File /opt/conda/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:281, in ExecutionEngine.resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
    272 metric_fn_bundle_configurations: List[MetricComputationConfiguration]
    273 (
    274     metric_fn_direct_configurations,
    275     metric_fn_bundle_configurations,
   (...)
    279     runtime_configuration=runtime_configuration,
    280 )
--> 281 return self._process_direct_and_bundled_metric_computation_configurations(
    282     metric_fn_direct_configurations=metric_fn_direct_configurations,
    283     metric_fn_bundle_configurations=metric_fn_bundle_configurations,
    284 )

File /opt/conda/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:550, in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    544         resolved_metrics[
    545             metric_computation_configuration.metric_configuration.id
    546         ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    547             **metric_computation_configuration.metric_provider_kwargs
    548         )
    549     except Exception as e:
--> 550         raise gx_exceptions.MetricResolutionError(
    551             message=str(e),
    552             failed_metrics=(
    553                 metric_computation_configuration.metric_configuration,
    554             ),
    555         ) from e
    557 try:
    558     # an engine-specific way of computing metrics together
    559     resolved_metric_bundle: Dict[
    560         Tuple[str, str, str], MetricValue
    561     ] = self.resolve_metric_bundle(
    562         metric_fn_bundle=metric_fn_bundle_configurations
    563     )

MetricResolutionError: module 'clickhouse_sqlalchemy.drivers.base' has no attribute 'name'

However, as I can see in the clickhouse_sqlalchemy package here https://github.com/xzkostyan/clickhouse-sqlalchemy/blob/060c60131a7a830cb691c23bc9c9931f5e3e19cc/clickhouse_sqlalchemy/drivers/base.py#L78, the 'name' attribute does exist. I suspect there is a typo here https://github.com/great-expectations/great_expectations/blob/3aacc5c4309ca6ee5f3291d0840bd16a9ee0ae76/great_expectations/expectations/metrics/util.py#L849, and that this is why it breaks. I'm also looking at this part of the code https://github.com/great-expectations/great_expectations/blob/3aacc5c4309ca6ee5f3291d0840bd16a9ee0ae76/great_expectations/expectations/metrics/map_metric_provider/column_condition_partial.py#L188C3-L188C3; maybe there is some problem with ClickHouseDialect there?
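
The error message itself hints at the root cause: _dialect arrived as the clickhouse_sqlalchemy.drivers.base module rather than as a dialect class or instance. A minimal sketch of the difference, assuming clickhouse-sqlalchemy is installed and defines ClickHouseDialect in drivers/base.py as the linked file shows:

import clickhouse_sqlalchemy.drivers.base as ch_base

# A module object has no 'name' attribute, which reproduces the AttributeError above.
print(hasattr(ch_base, "name"))  # False

# The dialect class itself does define 'name', which is what GE expects to read.
print(ch_base.ClickHouseDialect.name)  # 'clickhouse'

A defensive getattr(_dialect, "name", None) would avoid the crash, but the underlying fix would presumably be to pass the actual dialect through in util.py.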

Expected behavior The like-pattern expectation should work as usual.

Environment (please complete the following information):

HaebichanGX commented 1 year ago

Hi @matveykortsev, thank you for sharing this with us. We'll put this into our backlog for review.

matveykortsev commented 1 year ago

Hi @HaebichanGX, are there any updates on this issue? It's really a blocker for us.

matveykortsev commented 1 year ago

@HaebichanGX Have you checked this issue?

austiezr commented 8 months ago

Resolved by #9061 / #9068.