great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

Result of checkpoint not valid JSON format #7889

Closed tb102122 closed 1 year ago

tb102122 commented 1 year ago

Describe the bug: When running the sample code from the documentation (https://docs.greatexpectations.io/docs/guides/validation/checkpoints/how_to_pass_an_in_memory_dataframe_to_a_checkpoint/), the result is not valid JSON for the "run_results" element.

To Reproduce: Run the sample code from the documentation linked above, then:

checkpoint_result = checkpoint.run()
print(checkpoint_result["run_results"].keys())

dict_keys([ValidationResultIdentifier::my_expectation_suite/none/20230515T160904.028287Z/my_taxi_validator_checkpoint-taxi_frame])

Expected behavior: A list with all relevant child elements, e.g. name, validationResult, ...

tb102122 commented 1 year ago

I found that the issue is related to the following line of code, but I am not sure how to resolve it.

https://github.com/great-expectations/great_expectations/blob/a71721076376ae66c4b2b7fda253934675ff30be/great_expectations/validation_operators/validation_operators.py#L435

@austiezr are you able to help?

tjholsman commented 1 year ago

Hi @tb102122, the keys of "run_results" are of type ValidationResultIdentifier, not string. Here is one way to access the JSON from the checkpoint results:

validation_result_identifier = checkpoint_result.list_validation_result_identifiers()[0]
checkpoint_result['run_results'][validation_result_identifier]

For me, this yielded the following JSON object:

{'validation_result': {
   "statistics": {
     "evaluated_expectations": 1,
     "successful_expectations": 1,
     "unsuccessful_expectations": 0,
     "success_percent": 100.0
   },
   "meta": {
     "great_expectations_version": "0.16.6",
     "expectation_suite_name": "default",
     "run_id": {
       "run_time": "2023-05-23T16:05:12.812582-07:00",
       "run_name": null
     },
     "batch_spec": {
       "batch_data": "PandasDataFrame"
     },
     "batch_markers": {
       "ge_load_time": "20230523T230512.773533Z",
       "pandas_data_fingerprint": "c4f929e6d4fab001fedc9e075bf4b612"
     },
     "active_batch_definition": {
       "datasource_name": "taxi_datasource",
       "data_connector_name": "fluent",
       "data_asset_name": "taxi_frame",
       "batch_identifiers": {}
     },
     "validation_time": "20230523T230512.821212Z",
     "checkpoint_name": "my_taxi_validator_checkpoint",
     "validation_id": null,
     "checkpoint_id": null
   },
   "evaluation_parameters": {},
   "success": true,
   "results": [
     {
       "result": {
         "element_count": 10000,
         "unexpected_count": 0,
         "unexpected_percent": 0.0,
         "partial_unexpected_list": [],
         "partial_unexpected_index_list": [],
         "partial_unexpected_counts": []
       },
       "meta": {},
       "success": true,
       "expectation_config": {
         "expectation_type": "expect_column_values_to_not_be_null",
         "meta": {},
         "kwargs": {
           "column": "pickup_datetime",
           "batch_id": "taxi_datasource-taxi_frame"
         }
       },
       "exception_info": {
         "raised_exception": false,
         "exception_traceback": null,
         "exception_message": null
       }
     }
   ]
 },
 'actions_results': {'store_validation_result': {'class': 'StoreValidationResultAction'},
  'store_evaluation_params': {'class': 'StoreEvaluationParametersAction'},
  'update_data_docs': {'local_site': 'file:///Users/holsmantj/gx/great_expectations/uncommitted/data_docs/local_site/validations/default/__none__/20230523T230512.812582Z/taxi_datasource-taxi_frame.html',
   'class': 'UpdateDataDocsAction'}}}

Could you give this a try?
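
To illustrate why non-string keys break JSON serialization, here is a minimal sketch in plain Python. FakeIdentifier is a hypothetical stand-in for ValidationResultIdentifier and jsonable_run_results is a hypothetical helper; neither is part of Great Expectations. The sketch only assumes that the keys can be turned into strings with str().

```python
import json
from dataclasses import dataclass


# Hypothetical stand-in for ValidationResultIdentifier: a hashable
# object that is not a str, so json.dumps rejects it as a dict key.
@dataclass(frozen=True)
class FakeIdentifier:
    suite_name: str
    run_time: str

    def __str__(self):
        return f"FakeIdentifier::{self.suite_name}/{self.run_time}"


# Hand-written sample shaped like checkpoint_result["run_results"].
run_results = {
    FakeIdentifier("my_expectation_suite", "20230515T160904.028287Z"): {
        "validation_result": {"success": True},
    }
}


def jsonable_run_results(results):
    """Return a copy whose keys are strings, so json.dumps accepts it."""
    return {str(key): value for key, value in results.items()}


# Dumping the dict directly fails because of the object keys.
try:
    json.dumps(run_results)
    direct_dump_failed = False
except TypeError:
    direct_dump_failed = True

# After converting the keys, serialization works.
as_json = json.dumps(jsonable_run_results(run_results), indent=2)
print(as_json)
```

Looking up an entry by its identifier object, as shown earlier in this comment, avoids the conversion entirely; converting the keys only matters when the whole mapping has to be written out as JSON.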

tb102122 commented 1 year ago

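Hey @tjholsman, that works, but for some reason my results list is empty. I pass in the validations with the following code:

import great_expectations as gx
import pandas
from great_expectations.checkpoint import SimpleCheckpoint

gx_context = gx.get_context()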
gx_context.add_or_update_expectation_suite("my_expectation_suite")

df = pandas.read_csv(r".\gx_tutorials-main\data\yellow_tripdata_sample_2019-01.csv")

dataframe_asset = gx_context.sources.add_pandas("my_validator_checkpoint").add_dataframe_asset(
    name="my_frame", dataframe=df, batch_metadata={"year": "2019", "month": "01"}
)

batch_request = dataframe_asset.build_batch_request()

checkpoint = SimpleCheckpoint(
    name="my_validator_checkpoint",
    data_context=gx_context,
    batch_request=batch_request,
    expectation_suite_name="my_expectation_suite",
    validations=[
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {
                "column": "pickup_datetime",
            },
        }
    ],
)

checkpoint_result = checkpoint.run()

validation_result_identifier = checkpoint_result.list_validation_result_identifiers()
result = checkpoint_result["run_results"][validation_result_identifier[0]]
print(result)

tb102122 commented 1 year ago

@tjholsman Any idea what is happening?

tb102122 commented 1 year ago

I found the reason. I was not adding the Expectations to the suite, and the approach in the code above is no longer supported. The following code produced the expected result.


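from great_expectations.checkpoint import SimpleCheckpoint
from great_expectations.core.expectation_configuration import ExpectationConfiguration

# Define the expectation that the suite should contain.
expectation_configuration = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={
        "column": "pickup_datetime",
    },
)

# `context` is the data context from gx.get_context() (named gx_context in
# the earlier snippet); `dataframe_asset` is the asset created there.
context.add_or_update_expectation_suite("my_expectation_suite", expectations=[expectation_configuration])

batch_request = dataframe_asset.build_batch_request()

# The checkpoint only references the suite; the expectations live in the suite.
checkpoint = SimpleCheckpoint(
    name="my_taxi_dataframe_checkpoint",
    data_context=context,
    batch_request=batch_request,
    expectation_suite_name="my_expectation_suite",
)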
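
An empty results list like the one described above can be detected before anyone mistakes it for a passing run. The following is a rough sketch in plain Python that inspects a dict shaped like the "validation_result" output posted earlier in this thread. The function name diagnose_validation and both sample dicts are hypothetical and hand-written for illustration; they are not Great Expectations API or real run output.

```python
def diagnose_validation(validation_result):
    """Classify a validation result dict as 'empty', 'passed', or 'failed'.

    'empty' means no expectation was evaluated, which usually points to an
    expectation suite that has no expectations in it.
    """
    results = validation_result.get("results", [])
    if not results:
        return "empty"
    if all(entry.get("success") for entry in results):
        return "passed"
    return "failed"


# Hand-written samples that mimic the structure shown in this thread.
empty_run = {
    "statistics": {"evaluated_expectations": 0},
    "results": [],
}
populated_run = {
    "statistics": {"evaluated_expectations": 1},
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "pickup_datetime"},
            },
        }
    ],
}

print(diagnose_validation(empty_run))
print(diagnose_validation(populated_run))
```

With a real checkpoint result, the dict passed in would be the "validation_result" entry retrieved via the validation result identifier, assuming it has been converted to a plain dict first.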