Closed: anthonyburdi closed this issue 3 years ago
Hello,
I'm getting the same error when running a checkpoint with a Spark dataframe and a `RuntimeDataConnector`. I hit this error on Databricks (as @anthonyburdi did), but I also get the same result on my local machine (in a Jupyter notebook).
Here is the notebook in case it helps: great_expectations_poc.zip
I believe the error persists. I installed 0.13.34 but I still get `TypeError: cannot pickle '_thread.RLock' object`
when passing a Spark dataframe as `batch_data`.
I concur with @jvetu. I am running code very similar to @wesleyfelipe's, but using a Spark dataframe, and I get the same error. I'm running version 0.13.35.
Would that be considered a separate issue?
Hi @wesleyfelipe @jvetu @davidmaddox-saic! This issue has been addressed by PR #3502 and will be included in release 0.13.38.
Hi @NathanFarmer, I'm using version 0.13.42 and still get this error when trying to run a checkpoint on a spark data frame.
```python
checkpoint_run_result = context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    batch_request={
        "runtime_parameters": {
            "batch_data": df,
        }
    },
    run_name="Hello",
)
```
Hi @alit8, using `RuntimeBatchRequest`s in `Checkpoint`s is under development in the current sprint. You should be able to get around this for now by passing the batch request into a `Checkpoint` object using `validations`. Open PR #3680 addresses passing the `RuntimeBatchRequest` into a `Checkpoint` object instead of into `context.run_checkpoint`.
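To make that workaround concrete, here is a hypothetical sketch of passing the runtime batch request through `validations` rather than the top-level `batch_request` argument. The datasource, connector, asset, and suite names are all assumptions, and the stand-in `df` keeps the sketch self-contained; in practice it is your actual Spark DataFrame.

```python
df = object()  # stand-in for your Spark DataFrame (assumption for illustration)

# Hypothetical names throughout; substitute your own datasource/suite names.
batch_request = {
    "datasource_name": "my_spark_datasource",
    "data_connector_name": "my_runtime_data_connector",
    "data_asset_name": "my_asset",
    "runtime_parameters": {"batch_data": df},
    "batch_identifiers": {"run_id": "Hello"},
}

# Pass the batch request via `validations` instead of `batch_request`:
validations = [
    {
        "batch_request": batch_request,
        "expectation_suite_name": "my_suite",
    }
]

# checkpoint_run_result = context.run_checkpoint(
#     checkpoint_name="my_checkpoint",
#     validations=validations,
# )
```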
Describe the bug Configuring and running a checkpoint from within a Databricks notebook causes an error. The error appears to be caused by pickle-based serialization during a deepcopy step inside the Checkpoint config.
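The pickling failure can be reproduced in plain Python, independent of Spark: `copy.deepcopy` falls back to pickling objects it does not know how to copy, and thread locks (which a SparkSession holds internally) cannot be pickled. The stub class below is purely illustrative, not Great Expectations code:

```python
import copy
import threading

class CheckpointConfigStub:
    """Illustrative stand-in for a checkpoint config that ends up holding a
    Spark DataFrame; the underlying SparkSession contains a _thread.RLock."""
    def __init__(self):
        self._lock = threading.RLock()  # the unpicklable attribute

try:
    copy.deepcopy(CheckpointConfigStub())
    err = None
except TypeError as exc:
    err = str(exc)

print(err)  # cannot pickle '_thread.RLock' object
```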
Here is part of a traceback from within a notebook which configures and then runs a checkpoint:
To Reproduce Steps to reproduce the behavior:
`checkpoint.run()` or `context.run_checkpoint()`
Expected behavior I expect the checkpoint to run with all configured validation actions.
Environment (please complete the following information):
Additional context Validation is run using a Spark dataframe on Databricks as the data source, with a RuntimeDataConnector to connect to the dataframe.
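For reference, a Spark datasource with a `RuntimeDataConnector` can be configured along these lines in the 0.13.x config schema; all names below are illustrative assumptions, not taken from the reporter's setup:

```python
# Illustrative 0.13.x-style datasource config (all names are assumptions).
datasource_config = {
    "name": "my_spark_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "SparkDFExecutionEngine"},
    "data_connectors": {
        "my_runtime_data_connector": {
            "class_name": "RuntimeDataConnector",
            "batch_identifiers": ["run_id"],
        }
    },
}

# context.add_datasource(**datasource_config)
```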