unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

How to load a schema from a PySpark struct or an Avro schema from a schema registry? #1599

Open pthalasta opened 3 weeks ago

pthalasta commented 3 weeks ago

Question about pandera

How do I create a DataFrameSchema from an Avro schema? What are our options? When I try, the resulting DataFrameSchema object has an empty columns field. Could this be added as a feature that pulls the schema from the most widely used schema registries?

cosmicBboy commented 2 weeks ago

Hi @pthalasta, looking at the Avro schema docs, it looks like we'll need to write a translation layer from Avro to pandera, similar to the frictionless integration: https://pandera.readthedocs.io/en/stable/frictionless.html?highlight=frictionless#frictionless-data-schema

Feel free to change the label of this issue to enhancement and re-write the title as a feature request.

Happy to review a PR contribution from you or someone in the community!
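
For anyone picking this up, here is a minimal sketch of what the Avro -> pandera translation layer mentioned above might look like. It is not an existing pandera API: the names `AVRO_TO_DTYPE`, `avro_fields_to_specs`, and `specs_to_pandera` are hypothetical, the type mapping is an assumption, and only flat record schemas with primitive types and `["null", T]` unions are handled. Fetching the schema from a registry is out of scope here; the sketch assumes you already have the Avro schema as a JSON string or dict.

```python
import json

# Assumed mapping from Avro primitive type names to pandas-style dtype strings.
AVRO_TO_DTYPE = {
    "string": "str",
    "int": "int32",
    "long": "int64",
    "float": "float32",
    "double": "float64",
    "boolean": "bool",
    "bytes": "object",
}


def avro_fields_to_specs(avro_schema):
    """Return {field name: (dtype string, nullable)} for a flat Avro record."""
    if isinstance(avro_schema, str):
        avro_schema = json.loads(avro_schema)
    if avro_schema.get("type") != "record":
        raise ValueError("only Avro record schemas are handled in this sketch")

    specs = {}
    for field in avro_schema["fields"]:
        avro_type = field["type"]
        nullable = False
        if isinstance(avro_type, list):
            # A union such as ["null", "string"] marks the field as nullable.
            non_null = [t for t in avro_type if t != "null"]
            nullable = len(non_null) < len(avro_type)
            if len(non_null) != 1:
                raise ValueError(f"unsupported union for field {field['name']!r}")
            avro_type = non_null[0]
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_DTYPE:
            # Nested records, arrays, maps, and logical types are not covered.
            raise ValueError(
                f"unsupported Avro type for field {field['name']!r}: {avro_type!r}"
            )
        specs[field["name"]] = (AVRO_TO_DTYPE[avro_type], nullable)
    return specs


def specs_to_pandera(specs):
    """Build a pandera DataFrameSchema from the specs produced above."""
    # Imported lazily so the translation step itself does not need pandera.
    import pandera as pa

    return pa.DataFrameSchema(
        {
            name: pa.Column(dtype, nullable=nullable)
            for name, (dtype, nullable) in specs.items()
        }
    )


example_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"]},
    ],
}
example_specs = avro_fields_to_specs(example_schema)
```

Calling `specs_to_pandera(example_specs)` should then give a schema with a non-nullable `id` column and a nullable `email` column. One caveat: a nullable Avro `int` or `long` would probably need a pandas nullable dtype such as `"Int64"`, since a plain NumPy integer column cannot hold missing values.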

pthalasta commented 3 hours ago

@cosmicBboy I'm not sure I can edit the issue's label, but I can certainly change the description. Please let me know if that helps.