uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks

Deprecating `pyarrow_serialize` argument of `petastorm.make_reader`. #617

Closed. selitvin closed this 3 years ago

selitvin commented 3 years ago

PyArrow serialization is no longer supported. It has no merit anyway, since Python 3's pickle is faster than pyarrow serialization with numpy arrays.
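For readers wondering what deprecating such an argument can look like in practice, here is a minimal sketch. It is not petastorm's actual code: `choose_serializer` and `PickleSerializer` are hypothetical names made up for illustration, and the sketch simply assumes the deprecated flag is accepted, warned about, and then ignored in favor of pickle.

```python
import pickle
import warnings


class PickleSerializer:
    """Hypothetical stand-in serializer that round-trips rows with pickle."""

    def serialize(self, rows):
        # Use the newest pickle protocol available to this interpreter.
        return pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)

    def deserialize(self, payload):
        return pickle.loads(payload)


def choose_serializer(pyarrow_serialize=False):
    """Hypothetical helper: accept the deprecated flag, warn, and ignore it."""
    if pyarrow_serialize:
        warnings.warn(
            "pyarrow_serialize is deprecated and ignored; pickle is used instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Assumption: pickle is the only serializer left after the deprecation.
    return PickleSerializer()


# Callers that still pass the old flag keep working and get a pickle serializer.
serializer = choose_serializer(pyarrow_serialize=True)
rows = [{"id": 1, "values": [0.5, 1.5]}]
restored = serializer.deserialize(serializer.serialize(rows))
```

Keeping the argument in the signature while warning means existing call sites do not break immediately; the flag can then be removed in a later release.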

codecov[bot] commented 3 years ago

Codecov Report

Merging #617 (ac9356e) into master (fecef0e) will decrease coverage by 0.04%. The diff coverage is 50.00%.


@@            Coverage Diff             @@
##           master     #617      +/-   ##
==========================================
- Coverage   85.35%   85.31%   -0.05%     
==========================================
  Files          86       85       -1     
  Lines        4951     4929      -22     
  Branches      785      783       -2     
==========================================
- Hits         4226     4205      -21     
+ Misses        585      584       -1     
  Partials      140      140              
| Impacted Files | Coverage Δ |
|---|---|
| petastorm/benchmark/cli.py | 0.00% <ø> (ø) |
| petastorm/workers_pool/process_pool.py | 92.70% <ø> (ø) |
| petastorm/benchmark/throughput.py | 80.37% <50.00%> (-0.19%) :arrow_down: |
| petastorm/reader.py | 89.32% <50.00%> (-1.02%) :arrow_down: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update fecef0e...ac9356e.