Hi there, thank you for bringing this to our attention. I was able to replicate this error even after removing the checkpoint, so it seems something else is causing batch creation to fail. I will continue to look into this and have escalated it as well; I'll check back with you soon.
Okay, it looks like we will have to work on making the error message more helpful here.
I've made a few minimal edits to the provided file. The biggest one is adding this line:
validation_results = validation_definition.run(batch_parameters=batch_parameters)
A validation_definition.run() call needs to be present, and it needs to know which batch to run against. You specify that by passing batch_parameters to the validation_definition.run method.
This should solve your issue:
import great_expectations as gx # type: ignore
import pandas as pd # type: ignore
df = pd.read_csv(
"https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)
# File-backed Data Context: its configuration is persisted to disk
context = gx.get_context(mode="file")
print(context)
data_source = context.data_sources.add_pandas("pandas")
data_asset = data_source.add_dataframe_asset(name="pd dataframe asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe(
"batch-def"
)
# The same batch definition can also be retrieved later by data source, asset and name
batch_definition = (
context.data_sources.get("pandas").get_asset("pd dataframe asset")
.get_batch_definition("batch-def")
)
# The dataframe is not stored on the asset; it is supplied at run time via batch_parameters
batch_parameters = {"dataframe": df}
batch = batch_definition.get_batch(batch_parameters=batch_parameters)
suite = gx.ExpectationSuite(name="expectation_suite-4")
suite = context.suites.add(suite)
suite.add_expectation(
gx.expectations.ExpectColumnValuesToBeBetween(
column="passenger_count", min_value=1, max_value=6
)
)
suite.add_expectation(
gx.expectations.ExpectColumnValuesToBeBetween(column="fare_amount", min_value=0)
)
suite.add_expectation(
gx.expectations.ExpectColumnValuesToNotBeNull(column="pickup_datetime")
)
definition_name = "validation_definition-4"
validation_definition = gx.ValidationDefinition(
data=batch_definition, suite=suite, name=definition_name
)
# Pass the same batch_parameters so the run knows which dataframe to validate
validation_results = validation_definition.run(batch_parameters=batch_parameters)
print(validation_results)
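To make the failure mode concrete, here is a stdlib-only toy model of the pattern described in this thread. All of its classes (ToyWholeDataframeBatchDefinition, ToyValidationDefinition, ToyCheckpoint, MissingDataframeError) are hypothetical and are NOT the Great Expectations API. A list of dicts stands in for a pandas DataFrame. The sketch assumes, as the replies above explain, that a dataframe batch definition holds no data of its own. Under that assumption, a run without batch_parameters cannot build a batch, and a checkpoint simply forwards its batch_parameters to each validation definition.

```python
# Toy model (hypothetical classes, NOT the Great Expectations API).
# A dataframe batch definition stores no data, so the dataframe has to be
# passed in through batch_parameters at run time.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Rows = List[Dict[str, Any]]  # stand-in for a pandas DataFrame
BatchParameters = Optional[Dict[str, Any]]


class MissingDataframeError(ValueError):
    """Raised when no dataframe is passed via batch_parameters."""


@dataclass
class ToyWholeDataframeBatchDefinition:
    name: str

    def get_batch(self, batch_parameters: BatchParameters = None) -> Rows:
        # Nothing is stored on the asset, so building a batch needs this key.
        if not batch_parameters or "dataframe" not in batch_parameters:
            raise MissingDataframeError(
                f"batch definition {self.name!r} needs batch_parameters={{'dataframe': ...}}"
            )
        return batch_parameters["dataframe"]


@dataclass
class ToyValidationDefinition:
    batch_definition: ToyWholeDataframeBatchDefinition
    expectations: List[Callable[[Rows], bool]]

    def run(self, batch_parameters: BatchParameters = None) -> bool:
        # The batch is built from whatever batch_parameters the caller passes in.
        batch = self.batch_definition.get_batch(batch_parameters)
        return all(expectation(batch) for expectation in self.expectations)


@dataclass
class ToyCheckpoint:
    validation_definitions: List[ToyValidationDefinition]

    def run(self, batch_parameters: BatchParameters = None) -> bool:
        # Forward the same batch_parameters to every validation definition.
        results = [vd.run(batch_parameters=batch_parameters) for vd in self.validation_definitions]
        return all(results)


def passenger_count_between_1_and_6(rows: Rows) -> bool:
    return all(1 <= row["passenger_count"] <= 6 for row in rows)


rows = [{"passenger_count": 1}, {"passenger_count": 4}, {"passenger_count": 6}]
checkpoint = ToyCheckpoint(
    [
        ToyValidationDefinition(
            ToyWholeDataframeBatchDefinition("batch-def"),
            [passenger_count_between_1_and_6],
        )
    ]
)

print(checkpoint.run(batch_parameters={"dataframe": rows}))  # True
try:
    checkpoint.run()  # no batch_parameters: the batch cannot be built
except MissingDataframeError as err:
    print("error:", err)
```

In this toy, the checkpoint does no data handling of its own. It only forwards batch_parameters, which is why the fix in both the validation definition and the checkpoint case is the same: pass {"dataframe": df} at run time.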
Please update the documentation here: https://docs.greatexpectations.io/docs/core/run_validations/run_a_validation_definition
Updated ValidationDefinition API docs
CC @SiddhantSadangi
I have encountered the same issue before and found a solution. I hope this helps, cheers!
Solution
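# Add batch_parameters when calling checkpoint.run(...).
# During execution, checkpoint.run(...) calls validation_definition.run(...)
# internally (as described above) and passes it the same batch_parameters.
import pandas

df: pandas.DataFrame = ...  # the dataframe you want to validate
batch_parameters = {'dataframe': df}
# `checkpoint` is your existing Checkpoint that references the validation definition
checkpoint_result = checkpoint.run(batch_parameters=batch_parameters)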
Hey @chrishartono, @adeola-ak, thanks! I'll check the workarounds and let you know if they work 🤗
Sorry for the delay here, but I was finally able to test it. Works for me now ✅
Describe the bug: Cannot run a checkpoint on a validation suite when using a dataframe asset.
To Reproduce:
Code:
Stack trace:
Expected behavior: The checkpoint runs without any issues.
Environment (please complete the following information):