great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

Azure SQL Table Data Asset throwing “NoneType object is not iterable” while validating in azure databricks #9970

Open DineshBaratam-5 opened 1 month ago

DineshBaratam-5 commented 1 month ago

Describe the bug After creating a data context using a DBFS path in Azure Databricks, I connected to an Azure SQL Server data source and added a table asset. I then created an expectation suite and tried to validate expectations against the table asset. The validator throws the error "'NoneType' object is not iterable". I tried the same steps in a local environment instead of Databricks and got the same error.

To Reproduce Below is the code snippet I executed in the Azure Databricks cloud environment:

import great_expectations as gx
from great_expectations.checkpoint import Checkpoint
import pandas as pd
import sqlalchemy as sa

contextDirectory = "/dbfs/great_expectations/"
context = gx.get_context(context_root_dir=contextDirectory)

connection_url = sa.URL.create(
    "mssql+pyodbc",
    username=targetDbUserName,
    password=targetDbPassword,
    host=targetSQLServerName,
    database=targetDataBaseName,
    query={"driver": "ODBC Driver 18 for SQL Server"}
)
connectionString = connection_url.render_as_string(hide_password=False)

datasource = context.sources.add_or_update_sql(
    name="TargetSQLDataSource",
    connection_string=connectionString,
    create_temp_table=True
)
asset_name = "ProductAsset"
dataasset = datasource.add_table_asset(name=asset_name, table_name=targetTableName)
batch_request = dataasset.build_batch_request()
expectation_suite_name = "TargetSQL_expectation_suite"
context.add_or_update_expectation_suite(expectation_suite_name=expectation_suite_name)
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name
)
validator.expect_column_values_to_not_be_null(column="ProductID")
validator.expect_column_values_to_be_between(column="StandardCost", min_value=0, max_value=100000)
validator.save_expectation_suite(discard_failed_expectations=False)

After executing the above code, everything works fine up to the creation of the validator, and the error is thrown at the line "validator.expect_column_values_to_not_be_null(column="ProductID")". The full error trace is below.

MetricResolutionError: 'NoneType' object is not iterable
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:548, in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    545 try:
    546     resolved_metrics[
    547         metric_computation_configuration.metric_configuration.id
--> 548     ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    549         **metric_computation_configuration.metric_provider_kwargs
    550     )
    551 except Exception as e:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/expectations/metrics/metric_provider.py:60, in metric_value.<locals>.wrapper.<locals>.inner_func(*args, **kwargs)
     58 @wraps(metric_fn)
     59 def inner_func(*args: P.args, **kwargs: P.kwargs):
---> 60     return metric_fn(*args, **kwargs)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/expectations/metrics/table_metrics/table_columns.py:43, in TableColumns._sqlalchemy(cls, execution_engine, metric_domain_kwargs, metric_value_kwargs, metrics, runtime_configuration)
     42 column_metadata = metrics["table.column_types"]
---> 43 return [col["name"] for col in column_metadata]

TypeError: 'NoneType' object is not iterable

The above exception was the direct cause of the following exception:

MetricResolutionError                     Traceback (most recent call last)
File <command-1625200873249524>, line 1
----> 1 validator.expect_column_values_to_not_be_null(column="ProductID")
      2 # validator.expect_column_values_to_be_between(column="StandardCost", min_value=0, max_value=100000)
      3 validator.save_expectation_suite(discard_failed_expectations=False)

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validator.py:590, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    584         validation_result = ExpectationValidationResult(
    585             success=False,
    586             exception_info=exception_info,
    587             expectation_config=configuration,
    588         )
    589     else:
--> 590         raise err
    592 if self._include_rendered_content:
    593     validation_result.render()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validator.py:553, in Validator.validate_expectation.<locals>.inst_expectation(*args, **kwargs)
    549     validation_result = ExpectationValidationResult(
    550         expectation_config=copy.deepcopy(expectation.configuration)
    551     )
    552 else:
--> 553     validation_result = expectation.validate(
    554         validator=self,
    555         evaluation_parameters=self._expectation_suite.evaluation_parameters,
    556         data_context=self._data_context,
    557         runtime_configuration=basic_runtime_configuration,
    558     )
    560 # If validate has set active_validation to true, then we do not save the config to avoid
    561 # saving updating expectation configs to the same suite during validation runs
    562 if self._active_validation is True:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/expectations/expectation.py:1314, in Expectation.validate(self, validator, configuration, evaluation_parameters, interactive_evaluation, data_context, runtime_configuration)
   1305 self._warn_if_result_format_config_in_expectation_configuration(
   1306     configuration=configuration
   1307 )
   1309 configuration.process_evaluation_parameters(
   1310     evaluation_parameters, interactive_evaluation, data_context
   1311 )
   1312 expectation_validation_result_list: list[
   1313     ExpectationValidationResult
-> 1314 ] = validator.graph_validate(
   1315     configurations=[configuration],
   1316     runtime_configuration=runtime_configuration,
   1317 )
   1318 return expectation_validation_result_list[0]

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validator.py:1065, in Validator.graph_validate(self, configurations, runtime_configuration)
   1063         return evrs
   1064     else:
-> 1065         raise err
   1067 configuration: ExpectationConfiguration
   1068 result: ExpectationValidationResult

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validator.py:1044, in Validator.graph_validate(self, configurations, runtime_configuration)
   1037 resolved_metrics: _MetricsDict
   1039 try:
   1040     (
   1041         resolved_metrics,
   1042         evrs,
   1043         processed_configurations,
-> 1044     ) = self._resolve_suite_level_graph_and_process_metric_evaluation_errors(
   1045         graph=graph,
   1046         runtime_configuration=runtime_configuration,
   1047         expectation_validation_graphs=expectation_validation_graphs,
   1048         evrs=evrs,
   1049         processed_configurations=processed_configurations,
   1050         show_progress_bars=self._determine_progress_bars(),
   1051     )
   1052 except Exception as err:
   1053     # If a general Exception occurs during the execution of "ValidationGraph.resolve()", then
   1054     # all expectations in the suite are impacted, because it is impossible to attribute the failure to a metric.
   1055     if catch_exceptions:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validator.py:1200, in Validator._resolve_suite_level_graph_and_process_metric_evaluation_errors(self, graph, runtime_configuration, expectation_validation_graphs, evrs, processed_configurations, show_progress_bars)
   1195 resolved_metrics: _MetricsDict
   1196 aborted_metrics_info: _AbortedMetricsInfoDict
   1197 (
   1198     resolved_metrics,
   1199     aborted_metrics_info,
-> 1200 ) = self._metrics_calculator.resolve_validation_graph(
   1201     graph=graph,
   1202     runtime_configuration=runtime_configuration,
   1203     min_graph_edges_pbar_enable=0,
   1204 )
   1206 # Trace MetricResolutionError occurrences to expectations relying on corresponding malfunctioning metrics.
   1207 rejected_configurations: List[ExpectationConfiguration] = []

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/metrics_calculator.py:274, in MetricsCalculator.resolve_validation_graph(self, graph, runtime_configuration, min_graph_edges_pbar_enable)
    272 resolved_metrics: _MetricsDict
    273 aborted_metrics_info: _AbortedMetricsInfoDict
--> 274 resolved_metrics, aborted_metrics_info = graph.resolve(
    275     runtime_configuration=runtime_configuration,
    276     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    277     show_progress_bars=self._show_progress_bars,
    278 )
    279 return resolved_metrics, aborted_metrics_info

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:202, in ValidationGraph.resolve(self, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    199 resolved_metrics: Dict[_MetricKey, MetricValue] = {}
    201 # updates graph with aborted metrics
--> 202 aborted_metrics_info: _AbortedMetricsInfoDict = self._resolve(
    203     metrics=resolved_metrics,
    204     runtime_configuration=runtime_configuration,
    205     min_graph_edges_pbar_enable=min_graph_edges_pbar_enable,
    206     show_progress_bars=show_progress_bars,
    207 )
    209 return resolved_metrics, aborted_metrics_info

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:302, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    297                 failed_metric_info[failed_metric.id][
    298                     "exception_info"
    299                 ] = exception_info
    301     else:
--> 302         raise err
    303 except Exception as e:
    304     if catch_exceptions:

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/validator/validation_graph.py:269, in ValidationGraph._resolve(self, metrics, runtime_configuration, min_graph_edges_pbar_enable, show_progress_bars)
    264         computable_metrics.add(metric)
    266 try:
    267     # Access "ExecutionEngine.resolve_metrics()" method, to resolve missing "MetricConfiguration" objects.
    268     metrics.update(
--> 269         self._execution_engine.resolve_metrics(
    270             metrics_to_resolve=computable_metrics,  # type: ignore[arg-type]  # Metric typing needs further refinement.
    271             metrics=metrics,  # type: ignore[arg-type]  # Metric typing needs further refinement.
    272             runtime_configuration=runtime_configuration,
    273         )
    274     )
    275     progress_bar.update(len(computable_metrics))
    276     progress_bar.refresh()

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:283, in ExecutionEngine.resolve_metrics(self, metrics_to_resolve, metrics, runtime_configuration)
    274 metric_fn_bundle_configurations: List[MetricComputationConfiguration]
    275 (
    276     metric_fn_direct_configurations,
    277     metric_fn_bundle_configurations,
   (...)
    281     runtime_configuration=runtime_configuration,
    282 )
--> 283 return self._process_direct_and_bundled_metric_computation_configurations(
    284     metric_fn_direct_configurations=metric_fn_direct_configurations,
    285     metric_fn_bundle_configurations=metric_fn_bundle_configurations,
    286 )

File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/execution_engine/execution_engine.py:552, in ExecutionEngine._process_direct_and_bundled_metric_computation_configurations(self, metric_fn_direct_configurations, metric_fn_bundle_configurations)
    546         resolved_metrics[
    547             metric_computation_configuration.metric_configuration.id
    548         ] = metric_computation_configuration.metric_fn(  # type: ignore[misc] # F not callable
    549             **metric_computation_configuration.metric_provider_kwargs
    550         )
    551     except Exception as e:
--> 552         raise gx_exceptions.MetricResolutionError(
    553             message=str(e),
    554             failed_metrics=(
    555                 metric_computation_configuration.metric_configuration,
    556             ),
    557         ) from e
    559 try:
    560     # an engine-specific way of computing metrics together
    561     resolved_metric_bundle: Dict[
    562         Tuple[str, str, str], MetricValue
    563     ] = self.resolve_metric_bundle(
    564         metric_fn_bundle=metric_fn_bundle_configurations
    565     )

MetricResolutionError: 'NoneType' object is not iterable

Expected behavior The GX workflow should execute, and the validator should save the expectation suite after validating the given expectations.

Environment:

Octacon100 commented 1 month ago

I also get this when creating table or query assets with SQL Server data sources. validator.columns() appears to return None, but validator.head() returns a valid dataframe with a header, so I'm not sure what is happening there.
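
A quick way to narrow this down is to check whether SQLAlchemy itself can reflect the table's columns outside of GX, since the traceback shows the "table.column_types" metric coming back as None. This is only a minimal diagnostic sketch: it reuses connectionString and targetTableName from the reproduction code above, and the "dbo" schema is an assumption that may need adjusting.

import sqlalchemy as sa

# Reuses connectionString and targetTableName from the reproduction code above.
engine = sa.create_engine(connectionString)
inspector = sa.inspect(engine)

# List the schemas and tables SQLAlchemy can see on the Azure SQL database.
print(inspector.get_schema_names())
print(inspector.get_table_names(schema="dbo"))  # "dbo" is an assumption; try your table's schema

# If get_columns() returns an empty list here, GX's table.column_types metric
# has nothing to work with, which would explain the NoneType error above.
print(inspector.get_columns(targetTableName, schema="dbo"))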

DineshBaratam-5 commented 1 week ago

Hi team, can anyone please provide a workaround or solution for this other than an in-memory data asset? Our requirement is to access the data asset from other notebooks, which is not possible with an in-memory data asset. Is there any way to convert an in-memory data asset to a SQL data asset so that it can be accessed anywhere in the framework? We are currently using Azure SQL Server as the source and implementing the framework in Azure Databricks.
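
One avenue that may be worth checking before falling back to an in-memory asset: if the table lives in a non-default schema (for example SalesLT in the AdventureWorksLT sample database), passing schema_name explicitly when adding the table asset can allow the columns to be reflected. The sketch below is under that assumption and assumes your GX version's add_table_asset accepts a schema_name argument; the "SalesLT" value is an example, not a confirmed fix.

# Sketch of a possible workaround, assuming the table is in a non-default schema
# (e.g. "SalesLT" in the AdventureWorksLT sample) and that your GX version's
# add_table_asset accepts a schema_name argument.
datasource = context.sources.add_or_update_sql(
    name="TargetSQLDataSource",
    connection_string=connectionString,
    create_temp_table=True,
)
dataasset = datasource.add_table_asset(
    name="ProductAsset",
    table_name=targetTableName,
    schema_name="SalesLT",  # assumption: adjust to the schema that owns the table
)
batch_request = dataasset.build_batch_request()
validator = context.get_validator(
    batch_request=batch_request,
    expectation_suite_name=expectation_suite_name,
)
print(validator.columns())  # should list column names if reflection now works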