uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0

Pytorch: add inmemory batched dataloader #669

Closed. chongxiaoc closed this 3 years ago

chongxiaoc commented 3 years ago

Including two changes:

Related to #664

codecov[bot] commented 3 years ago

Codecov Report

Merging #669 (381af4f) into master (4b08422) will increase coverage by 0.17%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #669      +/-   ##
==========================================
+ Coverage   85.72%   85.89%   +0.17%     
==========================================
  Files          84       84              
  Lines        4931     4956      +25     
  Branches      779      788       +9     
==========================================
+ Hits         4227     4257      +30     
+ Misses        564      560       -4     
+ Partials      140      139       -1     
| Impacted Files | Coverage Δ |
| petastorm/reader_impl/pytorch_shuffling_buffer.py | 96.42% <ø> (+3.62%) ↑ |
| petastorm/pytorch.py | 93.57% <100.00%> (+1.35%) ↑ |
| petastorm/reader.py | 88.73% <0.00%> (-0.94%) ↓ |
| petastorm/spark/spark_dataset_converter.py | 91.24% <0.00%> (+0.72%) ↑ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4b08422...381af4f.

chongxiaoc commented 3 years ago

TODO: mention the new class in the main README and the release notes.

chongxiaoc commented 3 years ago

@selitvin ready to be reviewed again.

chongxiaoc commented 3 years ago

Addressed new comments. PTAL one more time. @selitvin