The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
Fix incorrect counting of number of row-groups per piece. #477
This bug manifests in either:
pyarrow.lib.ArrowIOError: The file only has <X> row groups, requested metadata for row group: <Y>
This issue occurred when:
make_batch_reader
The issue was introduced in petastorm 0.7.7
Resolves #447
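For readers unfamiliar with the failure mode, here is a minimal sketch of the condition behind the ArrowIOError quoted above: a piece asks for a row-group index that its file does not have. This is not petastorm's actual implementation. The (path, row-group index) piece representation, the function names, and the file names are all hypothetical, and the per-file counts are hard-coded here rather than read from Parquet metadata.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a piece: (file path, row-group index).
Piece = Tuple[str, int]


def split_into_pieces(row_groups_per_file: Dict[str, int]) -> List[Piece]:
    """Emit one piece per row group, indexed from 0 within each file."""
    pieces = []
    for path, count in sorted(row_groups_per_file.items()):
        for index in range(count):
            pieces.append((path, index))
    return pieces


def find_invalid_pieces(pieces: List[Piece],
                        row_groups_per_file: Dict[str, int]) -> List[Piece]:
    """Return the pieces whose row-group index does not exist in their file."""
    return [(path, index) for path, index in pieces
            if not 0 <= index < row_groups_per_file.get(path, 0)]


# Assumed row-group counts for two made-up files.
counts = {"part-0.parquet": 2, "part-1.parquet": 3}
pieces = split_into_pieces(counts)

# A miscounted piece: it requests row group 2 of a file that only has
# row groups 0 and 1, which is the situation the error message describes.
bad = find_invalid_pieces(pieces + [("part-0.parquet", 2)], counts)
print(len(pieces), bad)
```

A check like this could be run against real row-group counts (for example, those reported by pyarrow's ParquetFile.num_row_groups) to see whether a dataset is affected before a reader requests metadata for a row group that does not exist.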