Closed damczyk closed 1 month ago
Before this error I used the Data Assistant to create an expectation suite for control_tower_ccu_datasources_godistributiondates.json.
During the profiling process the code exited with an error. Unfortunately I do not remember which error it was, but it seemed to be a resource problem.
Now it seems that somewhere GX still has a handle on the file /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations/del/control_tower_ccu_datasources_godistributiondates.json
When I put a file named control_tower_ccu_datasources_godistributiondates.json in the location /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations/del/, at least the other expectations in my other notebooks can be executed.
This works as a workaround.
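The workaround can be scripted. The following is a minimal sketch, assuming the file placed in the /del/ subdirectory is simply a copy of the real suite file. The helper name `mirror_suite_into_subdir` is hypothetical and not part of GX; it uses only the Python standard library.

```python
import shutil
from pathlib import Path


def mirror_suite_into_subdir(expectations_dir, suite_file, subdir="del"):
    """Copy an existing suite JSON into a subdirectory of the expectations store.

    Hypothetical helper (not a GX API). It only recreates the path GX is
    looking for; it does not remove the stale reference itself.
    """
    source = Path(expectations_dir) / suite_file
    target_dir = Path(expectations_dir) / subdir
    target_dir.mkdir(parents=True, exist_ok=True)  # create .../del/ if missing
    target = target_dir / suite_file
    shutil.copyfile(source, target)  # copy file contents only
    return target
```

In this thread the call would take /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations as `expectations_dir` and control_tower_ccu_datasources_godistributiondates.json as `suite_file`.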
So the problem is now reduced to the question: how do I get rid of the handle to /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations/del/control_tower_ccu_datasources_godistributiondates.json?
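One way to look for a persisted reference is to search the stored configuration files for the suite name. This is a minimal sketch, assuming the context keeps its configuration (checkpoints, suites, and so on) as JSON or YAML files under a root directory such as /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX. The helper `find_references` is hypothetical and not a GX API.

```python
import os


def find_references(root_dir, needle, suffixes=(".json", ".yml", ".yaml")):
    """Return (path, line_number, line) for every config line containing `needle`.

    Hypothetical diagnostic helper: it only reads files and reports matches.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if not filename.endswith(suffixes):
                continue  # skip files that are not JSON/YAML configuration
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for line_number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, line_number, line.strip()))
    return hits
```

Searching for `control_tower_ccu_datasources_godistributiondates` and inspecting the hits for a `del` prefix may show which stored object still points at the subdirectory.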
Terminating and restarting the cluster, also with different GX library versions (from great_expectations[azure_secrets]==0.18.10 to great_expectations[azure_secrets]==0.18.12), does not have any effect.
Code for executing the Data Assistant to create the expectation suite was:
try:
    # Create or load great expectations context
    context = GX_Context(context_root_dir, context_connection_string).get_gx_context()
    # Create batch request
    batch_request = (context
        .sources
        .add_or_update_spark(name=data_source_name)
        .add_dataframe_asset(name=data_asset_name, dataframe=df_cleansing)
        .build_batch_request()
    )
    # Profiler
    # Run the default onboarding profiler on the batch request
    onboarding_data_assistant_result = (context
        .assistants
        .onboarding
        .run(
            batch_request=batch_request,
            exclude_column_names=[],
            estimation="exact"
        )
    )
    # Get the expectation suite from the onboarding result
    onboarding_suite = (onboarding_data_assistant_result
        .get_expectation_suite(
            expectation_suite_name=onboarding_suite_name
        )
    )
    # Persist expectation suite with the specified suite name from above
    context.add_or_update_expectation_suite(expectation_suite=onboarding_suite)
    # Create and persist checkpoint to reuse for multiple batches
    context.add_or_update_checkpoint(
        name=onboarding_checkpoint_name,
        batch_request=batch_request,
        expectation_suite_name=onboarding_suite_name,
    )
    # Run Onboarding checkpoint
    control_tower_ccu_datasources_block_checkpoint_result = context.get_checkpoint(onboarding_checkpoint_name).run(run_name=onboarding_checkpoint_name)
    # Check the validation result
    if control_tower_ccu_datasources_block_checkpoint_result.success:
        print("The validation succeeded")
    else:
        dbutils.notebook.exit("The validation failed : " + control_tower_ccu_datasources_block_checkpoint_result["run_results"][list(control_tower_ccu_datasources_block_checkpoint_result["run_results"].keys())[0]]["actions_results"]["update_data_docs"]["az_site"])
except Exception as exception:
    handle_exception(exception, dbutils.notebook.entry_point.getDbutils().notebook().getContext())
    raise exception
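The failure branch above digs the data docs URL out of the first entry of run_results in a single long expression. Below is a sketch of the same lookup as a small helper, assuming the checkpoint result can be indexed like nested dicts, as the code above does. The name `first_data_docs_url` is hypothetical; the key names are taken from the code above.

```python
def first_data_docs_url(checkpoint_result, site_name="az_site"):
    """Return the data docs URL of the first run result, or None if absent.

    Hypothetical helper: assumes `checkpoint_result["run_results"]` is a
    dict whose values are themselves plain dicts.
    """
    run_results = checkpoint_result["run_results"]
    if not run_results:
        return None
    first_run = next(iter(run_results.values()))  # first validation run
    return (
        first_run.get("actions_results", {})
        .get("update_data_docs", {})
        .get(site_name)
    )
```

Because the helper returns None when a key is missing, the caller would need to handle that case before concatenating the URL into the exit message.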
Hello @damczyk. With the launch of Great Expectations Core (GX 1.0), we are closing old issues posted regarding previous versions. Moving forward, we will focus our resources on supporting and improving GX Core (version 1.0 and beyond). If you find that an issue you previously reported still exists in GX Core, we encourage you to resubmit it against the new version. With more resources dedicated to community support, we aim to tackle new issues swiftly. For specific details on what is GX-supported vs community-supported, you can reference our integration and support policy.
To get started on your transition to GX Core, check out the GX Core quickstart (click “Full example code” tab to see a code example).
You can also join our upcoming community meeting on August 28th at 9am PT (noon ET / 4pm UTC) for a comprehensive rundown of everything GX Core, plus Q&A as time permits. Go to https://greatexpectations.io/meetup and click “follow calendar” to follow the GX community calendar.
Thank you for being part of the GX community and thank you for submitting this issue. We're excited about this new chapter and look forward to your feedback on GX Core. 🤗
Describe the bug Since yesterday, running the GX code
context.get_checkpoint(onboarding_checkpoint_name).run(run_name=onboarding_checkpoint_name, batch_request=batch_request)
leads to an error. It worked yesterday until about 4 p.m. (CET); since then I get the error. GX now tries to read the JSON for the expectation suite from a subdirectory /del/, which does not exist: /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations/del/control_tower_ccu_datasources_godistributiondates.json
The JSON control_tower_ccu_datasources_godistributiondates.json is stored in the directory /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations (without /del/), as it always has been on the days before.
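To confirm which copies of the suite file actually exist, the expectations directory can be searched recursively. This is a minimal sketch using only the standard library; `locate_suite_files` is a hypothetical helper, not a GX API.

```python
from pathlib import Path


def locate_suite_files(expectations_dir, suite_file):
    """List every copy of `suite_file` below `expectations_dir`.

    Paths are returned relative to `expectations_dir`, so a copy in a
    subdirectory such as del/ is easy to tell apart from the top-level one.
    """
    root = Path(expectations_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(suite_file))
```

If only the top-level file is listed, the /del/ path exists only as a reference inside GX or its stored configuration, not on disk.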
To Reproduce I can only give you my context.json because I'm on Databricks using an Ephemeral Data Context. I started my Databricks cluster using the library great_expectations[azure_secrets]==0.18.12 (and also great_expectations[azure_secrets]==0.18.10).
I started a Databricks notebook with
import great_expectations as gx
besides other libraries, and then built my dataframe object with my data, called df_Staging.
In one cell of my notebook all GX code is executed like this:
Class GX_Context with method get_gx_context() looks like this:
Expected behavior Executing the checkpoint
context.get_checkpoint(onboarding_checkpoint_name).run(run_name=onboarding_checkpoint_name, batch_request=batch_request)
reads the expectation suite from /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations/control_tower_ccu_datasources_godistributiondates.json
as on the days before, and not from /dbfs/mnt/sdl/control-tower-ccu/DataQuality/GX/expectations/del/control_tower_ccu_datasources_godistributiondates.json,
and delivers a result instead of an error.
Environment (please complete the following information):