Currently, our Parquet validation process does not support Avro schemas (AVSC files). To enhance our data validation capabilities, we need to implement support for validating Parquet files using Avro schemas.
AC:
Implement a Parquet validation process that accepts Avro schema files (AVSC) as input. There is a TODO for this in src/data_toolset/utils/parquet.py.
Ensure that the validation process can validate Parquet files against the provided Avro schema.
Ensure that the validation supports complex and nested data structures.
If a Parquet file doesn't conform to the Avro schema, the validation process should identify and report schema violations or data inconsistencies.
Integrate the Avro schema validation process into the tool.
It should work with the existing sample data files in the tests/data directory.
Write unit and integration tests to validate the correctness of the function.
Integrate the function into the tool's interface.
Ensure the function is easy to use with a clear and well-documented API.
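
One possible shape for the core check described in the criteria above, as a minimal sketch only. It is not the existing code in src/data_toolset/utils/parquet.py. The names find_schema_violations, unwrap_nullable, and AVRO_TO_ARROW, the nested {name: type} representation of the Parquet columns, and the sample schema are all hypothetical, and the Avro-to-Arrow type mapping is an assumption that would need to be confirmed against the Parquet reader in use. The sketch covers primitives, nullable unions, and nested records; arrays, maps, enums, and logical types are reported as unhandled rather than silently accepted.

```python
import json

# Assumed mapping from Avro primitive names to Arrow-style type names.
AVRO_TO_ARROW = {
    "boolean": "bool",
    "int": "int32",
    "long": "int64",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "string": "string",
}


def unwrap_nullable(avro_type):
    """Reduce a union such as ["null", "string"] to its non-null branch."""
    if isinstance(avro_type, list):
        branches = [t for t in avro_type if t != "null"]
        if len(branches) == 1:
            return branches[0]
    return avro_type


def find_schema_violations(avro_record, parquet_fields, path=""):
    """Compare an Avro record schema with a nested {name: type} mapping.

    parquet_fields is a hypothetical representation: primitive columns map
    to a type-name string, struct columns map to a nested dict.
    Returns a list of human-readable violation messages (empty if none).
    """
    violations = []
    for field in avro_record["fields"]:
        name = path + field["name"]
        avro_type = unwrap_nullable(field["type"])
        if field["name"] not in parquet_fields:
            violations.append(f"{name}: missing from Parquet file")
            continue
        actual = parquet_fields[field["name"]]
        if isinstance(avro_type, dict) and avro_type.get("type") == "record":
            # Nested record: recurse with a dotted path prefix.
            if isinstance(actual, dict):
                violations += find_schema_violations(avro_type, actual, name + ".")
            else:
                violations.append(f"{name}: expected a struct, found {actual}")
        elif isinstance(avro_type, str) and avro_type in AVRO_TO_ARROW:
            expected = AVRO_TO_ARROW[avro_type]
            if actual != expected:
                violations.append(f"{name}: expected {expected}, found {actual}")
        else:
            violations.append(f"{name}: Avro type not handled by this sketch")
    return violations


# Hypothetical AVSC content, inlined here instead of being read from a file.
AVSC = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "user", "type": {
      "type": "record",
      "name": "User",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["null", "int"]}
      ]
    }},
    {"name": "score", "type": "double"}
  ]
}
""")

# Hypothetical column types as they might be read from a Parquet file.
columns = {"id": "int64", "user": {"name": "string", "age": "string"}}

report = find_schema_violations(AVSC, columns)
for line in report:
    print(line)
```

In the tool itself, the column mapping would presumably be built from the Parquet file's own schema (for example via pyarrow.parquet.read_schema, if pyarrow is the reader in use) rather than written by hand. Returning a list of messages instead of raising on the first mismatch is one way to meet the requirement that all schema violations be reported.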