uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks

Use parquet.summary.metadata.level to control _summary file creation. #529

Closed: selitvin closed this 4 years ago

selitvin commented 4 years ago

Starting with Spark 2.4, "parquet.enable.summary-metadata" is deprecated, and "parquet.summary.metadata.level" should be used to control creation of the _summary file. Updated the code to set parquet.summary.metadata.level=ALL if petastorm's use_summary_metadata=True, and parquet.summary.metadata.level=NONE otherwise.

codecov[bot] commented 4 years ago

Codecov Report

Merging #529 into master will decrease coverage by 0.02%. The diff coverage is 71.42%.

@@            Coverage Diff             @@
##           master     #529      +/-   ##
==========================================
- Coverage   86.52%   86.50%   -0.03%     
==========================================
  Files          85       85              
  Lines        4699     4705       +6     
  Branches      740      741       +1     
==========================================
+ Hits         4066     4070       +4     
- Misses        515      516       +1     
- Partials      118      119       +1     

| Impacted Files | Coverage Δ | |
|---|---|---|
| petastorm/unischema.py | 95.79% <ø> (ø) | |
| petastorm/etl/dataset_metadata.py | 88.00% <71.42%> (-0.89%) | :arrow_down: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d82aa28...c2f6dac.