PySpark module - str_length check not implemented

Description of issue

When using the pandera.pyspark module, validating a DataFrame against a DataFrameSchema that uses Check.str_length() as a column-level check fails: the check is not executed, and a NotImplementedError is recorded in the validation errors.

[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandera.

Code Sample

from pandera.pyspark import DataFrameSchema, Column, Check
from pyspark.sql.types import StringType, IntegerType

dataframe_schema_string_length = DataFrameSchema(
    columns={
        "index": Column(
            dtype=IntegerType,
        ),
        "participant_id": Column(
            dtype=StringType,
            checks=[
                Check.str_length(32)
            ],
        ),
    },
    coerce=True,
    strict=False,
)

# `spark` is the SparkSession that the Synapse notebook provides by default.
df_to_validate = spark.createDataFrame(
    [
        (1, "ee584ba55112f89ec9d5a7cabd52f705"),
        (2, "fcfd946100c0147583b63b6789dc0252"),
        (3, "091631f6c8bdf72e7c55d4d91b874c43"),
        (4, "433cb57085b9b4f3f268e655108b637d"),
        (5, "60d6aad80845ff205936d1ff4b290f00"),
    ],
    ["index", "participant_id"],
)

dataframe_schema_string_length.validate(df_to_validate).pandera.errors

This generates the following output:

defaultdict(<function pandera.api.pyspark.error_handler.ErrorHandler.__init__.<locals>.<lambda>()>,
            {'DATA': defaultdict(list,
                         {'CHECK_ERROR': [{'schema': None,
                            'column': 'participant_id',
                            'check': 'str_length(32, None)',
                            'error': 'Error while executing check function: NotImplementedError ...'}]})})

Expected behaviour

With the pandera.pyspark module, Check.str_length() should be usable when validating a PySpark SQL DataFrame against a DataFrameSchema object, and should report length violations instead of a NotImplementedError.
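
To make the requested behaviour concrete, here is a plain-Python sketch of the semantics I would expect from the check. It is not pandera's implementation and does not touch Spark. The helper name `str_length_failures` is hypothetical, and reading the `str_length(32, None)` in the error output as `(min_value, max_value)` is my assumption, as is skipping null values.

```python
from typing import Iterable, List, Optional


def str_length_failures(
    values: Iterable[Optional[str]],
    min_value: Optional[int] = None,
    max_value: Optional[int] = None,
) -> List[str]:
    """Return the values whose length falls outside [min_value, max_value].

    Hypothetical helper for illustration only; both bounds are inclusive
    and a bound of None means "no limit on that side".
    """
    failures = []
    for value in values:
        if value is None:
            # Assumption: nulls are skipped; nullability is a separate concern.
            continue
        too_short = min_value is not None and len(value) < min_value
        too_long = max_value is not None and len(value) > max_value
        if too_short or too_long:
            failures.append(value)
    return failures


# The participant_id values from the code sample above.
participant_ids = [
    "ee584ba55112f89ec9d5a7cabd52f705",
    "fcfd946100c0147583b63b6789dc0252",
    "091631f6c8bdf72e7c55d4d91b874c43",
    "433cb57085b9b4f3f268e655108b637d",
    "60d6aad80845ff205936d1ff4b290f00",
]

# Mirrors Check.str_length(32), read as min_value=32 with no upper bound.
failures = str_length_failures(participant_ids, min_value=32)
```

Under this reading, every sample value is 32 characters long, so the schema in the code sample should validate with no CHECK_ERROR entries once the check is supported.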

Environment

Azure Synapse Notebook
Browser: Edge
Python 3.10
Apache Spark 3.3
Pandera 0.16.1

Additional context

I'm really excited about being able to use Pandera to validate big data on the Spark platform. I'm working on a blog post describing how to use this package in Azure Synapse and Microsoft Fabric.