When using the pandera.pyspark module, validating a DataFrameSchema that uses Check.str_length() in a column-level check raises NotImplementedError.
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandera.
When using the pandera.pyspark module, Check.str_length() should be usable when validating a PySpark SQL DataFrame against a DataFrameSchema object.
Environment
Azure Synapse Notebook
Browser: Edge
Python 3.10
Apache Spark 3.3
Pandera 0.16.1
Additional context
Really excited about the ability to use Pandera to validate big data on the Spark platform. I'm working on a blog post describing how to leverage this package in Azure Synapse and Microsoft Fabric.
Description of issue
When using the pandera.pyspark module, validating a DataFrameSchema that uses Check.str_length() in a column-level check raises NotImplementedError.
Code Sample
This generates the following output:
Expected behaviour
When using the pandera.pyspark module, Check.str_length() should be usable when validating a PySpark SQL DataFrame against a DataFrameSchema object.