uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks

Fix a failure when reading data from a parquet file (and not a parquet directory) #687

Status: Closed (selitvin closed 3 years ago)

selitvin commented 3 years ago

The code was crashing because the partition_names attribute of pyarrow's ParquetDataset was None when the ParquetDataset instance was created from a single parquet file rather than a parquet directory.
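
A minimal sketch of the kind of guard the description implies, not the actual patch from this PR. The helper `safe_partition_names` and the `SimpleNamespace` stand-ins are hypothetical; they only mimic an object whose `partition_names` attribute may be None, so the snippet runs without pyarrow installed.

```python
from types import SimpleNamespace


def safe_partition_names(dataset):
    """Return partition names as a set, treating a missing or None value as empty.

    Hypothetical helper for illustration; not part of petastorm or pyarrow.
    """
    names = getattr(dataset, "partition_names", None)
    return set(names) if names is not None else set()


# Stand-ins (assumptions) for a dataset opened from a partitioned directory
# and for one opened from a single parquet file.
directory_dataset = SimpleNamespace(partition_names=["year", "month"])
single_file_dataset = SimpleNamespace(partition_names=None)

print(safe_partition_names(directory_dataset))
print(safe_partition_names(single_file_dataset))
```

With a guard like this, code that iterates over the partition names sees an empty collection for the single-file case instead of failing on None.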

codecov[bot] commented 3 years ago

Codecov Report

Merging #687 (edae9f9) into master (84f6558) will increase coverage by 0.04%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #687      +/-   ##
==========================================
+ Coverage   85.89%   85.93%   +0.04%     
==========================================
  Files          84       84              
  Lines        4956     4957       +1     
  Branches      788      788              
==========================================
+ Hits         4257     4260       +3     
+ Misses        560      558       -2     
  Partials      139      139              

Impacted Files                       Coverage Δ
petastorm/py_dict_reader_worker.py   95.27% <100.00%> (+0.03%)
petastorm/reader.py                  89.67% <0.00%> (+0.93%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 84f6558...edae9f9.