moj-analytical-services / etl_manager

A Python package to create a database on the platform using our MoJ data warehousing framework

Add ability to auto-generate a TableMeta object from parquet metadata #123

Closed: RobinL closed this issue 4 years ago

RobinL commented 4 years ago

It would be useful to be able to do:

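from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("path_to_parquet")
pmeta_json = df.schema.json()  # Spark schema serialised as a JSON string
tab = tablemeta_from_parquet_meta(pmeta_json, name, location)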

or

from pyarrow.parquet import ParquetFile
md = ParquetFile("test_nest.parquet").metadata
# Key/value file metadata written by Spark; keys and values are bytes
pmeta_json = md.metadata[b"org.apache.spark.sql.parquet.row.metadata"]
tab = tablemeta_from_parquet_meta(pmeta_json, name, location)
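
Both snippets produce the same input: the Spark schema as JSON, a top-level `struct` with a list of `fields`. As a rough sketch of what the parsing half of `tablemeta_from_parquet_meta` might look like, the following turns that JSON into a list of plain column dicts. The helper name `columns_from_spark_schema`, the output dict shape, and the target type names in the mapping are all assumptions for illustration, not etl_manager's actual API. Nested types (struct, array, map) and decimals are deliberately left unhandled.

```python
import json

# Assumed mapping from Spark SQL primitive type names to meta column types.
# The target names on the right are an assumption, not taken from etl_manager.
SPARK_TO_META_TYPE = {
    "string": "character",
    "byte": "int",
    "short": "int",
    "integer": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "date": "date",
    "timestamp": "datetime",
}


def columns_from_spark_schema(pmeta_json):
    """Hypothetical helper: convert Spark schema JSON (str or bytes) to column dicts."""
    schema = json.loads(pmeta_json)
    if schema.get("type") != "struct":
        raise ValueError("expected a top-level struct schema")

    columns = []
    for field in schema["fields"]:
        spark_type = field["type"]
        # Nested types appear as JSON objects rather than strings.
        if not isinstance(spark_type, str):
            raise NotImplementedError(
                f"nested type in column {field['name']!r} is not handled"
            )
        meta_type = SPARK_TO_META_TYPE.get(spark_type)
        if meta_type is None:
            raise ValueError(f"no mapping for Spark type {spark_type!r}")
        columns.append({"name": field["name"], "type": meta_type, "description": ""})
    return columns


# Small hand-written schema in the shape produced by df.schema.json()
sample_schema = json.dumps(
    {
        "type": "struct",
        "fields": [
            {"name": "id", "type": "long", "nullable": True, "metadata": {}},
            {"name": "label", "type": "string", "nullable": True, "metadata": {}},
        ],
    }
)
sample_columns = columns_from_spark_schema(sample_schema)
```

Because `json.loads` accepts both `str` and `bytes`, the same helper works on the output of `df.schema.json()` and on the raw value read from the pyarrow file metadata. Building the actual TableMeta object from these columns, together with `name` and `location`, is left out here since that depends on the library's constructor.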