great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

TypeError: Object of type date is not JSON serializable #8845

Open umawani opened 9 months ago

umawani commented 9 months ago

Bug Description When the result object contains a value that is not a string or number (here, a date), serializing the object to JSON fails.

To Reproduce Here is my great_expectations.yml:

datasources:
  test_db:
    module_name: great_expectations.datasource
    data_connectors:
      default_inferred_data_connector_name:
        module_name: great_expectations.datasource.data_connector
        class_name: InferredAssetSqlDataConnector
        name: default_inferred_data_connector_name
    execution_engine:
      module_name: great_expectations.execution_engine
      connection_string: ${connection_string_env}
      class_name: SqlAlchemyExecutionEngine
      create_temp_table: false
    class_name: Datasource 
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: ./
  evaluation_parameter_store:
    class_name: EvaluationParameterStore
  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: ./

My expectation suite:

{
  "expectation_suite_name": "test_suite",
  "expectations": [
    {
      "expectation_type": "expect_column_distinct_values_to_be_in_set",
      "kwargs": {
        "column": "STATE",
        "value_set": ["AK", "AL", "AR"],
        "expectationId": 1
      }
    },
    {
      "expectation_type": "expect_column_values_to_match_regex",
      "kwargs": {
        "column": "EMAIL",
        "regex": "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{3,}",
        "expectationId": 1
      }
    },
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": {
        "column": "FIRST_NAME",
        "expectationId": 1
      }
    },
    {
      "expectation_type": "expect_column_max_to_be_between",
      "kwargs": {
        "column": "POSTAL_CODE",
        "min_value": 99000,
        "max_value": 100000,
        "strict_min": true,
        "strict_max": true,
        "expectationId": 1
      }
    }
  ],
  "meta": {
  }
}

And finally my checkpoint file:

name: checkpoint-test
module_name: great_expectations.checkpoint
class_name: Checkpoint
action_list:
  - name: datahub_action
    action:
      module_name: datahub.integrations.great_expectations.action
      class_name: DataHubValidationAction
      server_url: ${dh_url_env}
      platform_instance_map:
        test_db: postgres-1
      token: ${dh_token_env}
runtime_configuration: {
  "result_format": {
    "result_format": "SUMMARY",
    "return_unexpected_index_query": True,
    "partial_unexpected_count": 3,
    "include_unexpected_rows": True
  }
}
validations:
  - batch_request:
      datasource_name: test_db
      data_asset_name: ${table_name_env}
      data_connector_name: default_inferred_data_connector_name
    expectation_suite_name: test_suite

The Python code used to execute this checkpoint is pretty straightforward:

import great_expectations as gx
from great_expectations.checkpoint.types.checkpoint_result import CheckpointResult

if __name__ == "__main__":
    ctx = gx.get_context(context_root_dir=GX_CONTEXT_PATH)

    result: CheckpointResult = ctx.run_checkpoint(
            checkpoint_name="checkpoint-test",
            batch_request=None,
            run_name=None
    )
    with open("result.json", "w") as outfile:
        outfile.write(result.__str__())
    print(result)

The Stacktrace of the error received:

Calculating Metrics: 100%|██████████| 22/22 [00:07<00:00,  2.92it/s]
Traceback (most recent call last):
  File "/home/data_quality/great_expectations_executor.py", line 13, in <module>
    outfile.write(result.__str__())
  File "/home/appuser/gx_venv/lib/python3.10/site-packages/great_expectations/checkpoint/types/checkpoint_result.py", line 418, in __str__
    return self.__repr__()
  File "/home/appuser/gx_venv/lib/python3.10/site-packages/great_expectations/checkpoint/types/checkpoint_result.py", line 408, in __repr__
    return json.dumps(serializable_dict, indent=2)
  File "/usr/local/lib/python3.10/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "/usr/local/lib/python3.10/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/usr/local/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/local/lib/python3.10/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/usr/local/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/local/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/local/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type date is not JSON serializable
null

Expected behavior After the checkpoint runs successfully, the checkpoint result object should be printed. Instead, the checkpoint result object cannot be serialized to JSON.
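
For reference, the same TypeError can be reproduced with the standard library alone, without Great Expectations. This is only a minimal sketch: the `payload` dict, the `SIGNUP_DATE` column name, and the date value are hypothetical stand-ins for an unexpected row in the result, not taken from the actual checkpoint output.

```python
import json
from datetime import date

# Hypothetical stand-in for one unexpected row in a validation result.
# The column name and value are invented for illustration only.
payload = {"unexpected_rows": [{"SIGNUP_DATE": date(2023, 10, 17)}]}

try:
    # json.dumps has no built-in encoding for datetime.date objects.
    json.dumps(payload, indent=2)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The message matches the last line of the traceback above, which suggests the failure comes from a date value somewhere inside the result dict rather than from the checkpoint run itself.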

Environment (please complete the following information):

Additional context I think I know the issue. We should pass a default function (for example, str) to json.dumps so that values it cannot serialize natively are converted to strings during serialization.

HaebichanGX commented 9 months ago

Hey @umawani thanks for sharing this issue. We'll put it into backlog for review. If you think you have a solution, please share with us here.

umawani commented 9 months ago

Hey @HaebichanGX, hope you're well. I think the solution is to fall back to a string representation for values that json.dumps cannot serialize, which avoids the non-serializable error. Specifically, the change would look something like this:

great_expectations/checkpoint/types/checkpoint_result.py (line:410)

return json.dumps(serializable_dict, indent=2, default=str)

Let me know what you think about this change. If it looks fine, I can go ahead and create a PR for it.
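
A rough standalone sketch of what the `default=str` change would do, next to a stricter alternative. The `payload` dict and the `json_default` helper are hypothetical and exist only for this illustration; neither is part of Great Expectations.

```python
import json
from datetime import date, datetime


def json_default(obj):
    """Hypothetical fallback for json.dumps: ISO 8601 for dates, error otherwise."""
    if isinstance(obj, (date, datetime)):
        return obj.isoformat()
    # Keep failing loudly for types we have not explicitly decided how to encode.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical stand-in for one unexpected row in a validation result.
payload = {"unexpected_rows": [{"SIGNUP_DATE": date(2023, 10, 17)}]}

# Option 1: the proposed change, stringify anything json cannot encode natively.
with_str = json.dumps(payload, default=str)

# Option 2: only handle date/datetime, raise for everything else.
with_iso = json.dumps(payload, default=json_default)

print(with_str)
print(with_iso)
```

For a plain date both options produce the same string. The trade-off is that `default=str` will also silently stringify any other unsupported object, while a narrower fallback still surfaces unexpected types as errors.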

Kilo59 commented 7 months ago

@umawani Have you tried updating to the latest gx version?

I would suggest updating to at least 0.18.2.

skate056 commented 5 months ago

I have a similar problem on gx version 0.18.7

umairMawaniSecuriti commented 5 months ago

Hey @HaebichanGX and @Kilo59

This issue still exists in 0.18.7. Please look into this.