uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.8k stars, 284 forks

Added option for using PyTorch for throughput testing, issue 219 #444

Closed: gregw18 closed this 3 years ago

gregw18 commented 5 years ago

Added an option to use PyTorch for throughput testing, via petastorm.pytorch.DataLoader. I had to modify compat.py to return column.data.num_chunks regardless of the pyarrow version, because pyarrow 0.15 was crashing on column.num_chunks. I also added an option to pass min_after_retrieve to petastorm.pytorch.DataLoader, to give it functionality similar to TensorFlow's. Note that I am just learning Python, so any and all feedback is appreciated!
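For readers unfamiliar with the `min_after_retrieve` idea: a shuffling buffer only hands out a random item once it holds more than `min_after_retrieve` items, so every draw comes from a reasonably large pool (TensorFlow's queue-based `tf.train.shuffle_batch` exposes a similar `min_after_dequeue` argument). The following is a minimal plain-Python sketch of that behavior under stated assumptions; it is not petastorm's actual implementation, and the class name `MinAfterRetrieveShuffleBuffer` and the helper `shuffled` are hypothetical, chosen only for illustration.

```python
import random
from typing import Any, Iterable, Iterator, List, Optional


class MinAfterRetrieveShuffleBuffer:
    """Hypothetical shuffling buffer with a min_after_retrieve threshold.

    Not petastorm's class: an illustrative sketch of the semantics only.
    """

    def __init__(self, capacity: int, min_after_retrieve: int,
                 seed: Optional[int] = None) -> None:
        if not 0 <= min_after_retrieve < capacity:
            raise ValueError("need 0 <= min_after_retrieve < capacity")
        self._capacity = capacity
        self._min_after_retrieve = min_after_retrieve
        self._items: List[Any] = []
        self._finished = False
        self._rng = random.Random(seed)

    def can_add(self) -> bool:
        return not self._finished and len(self._items) < self._capacity

    def add(self, item: Any) -> None:
        if not self.can_add():
            raise RuntimeError("buffer is full or finished")
        self._items.append(item)

    def can_retrieve(self) -> bool:
        # Once the input is exhausted, drain whatever is left.
        if self._finished:
            return len(self._items) > 0
        # Otherwise only retrieve while more than min_after_retrieve items remain.
        return len(self._items) > self._min_after_retrieve

    def retrieve(self) -> Any:
        if not self.can_retrieve():
            raise RuntimeError("not enough items buffered to retrieve")
        # Pick a uniformly random slot, swap it with the last one, then pop (O(1)).
        index = self._rng.randrange(len(self._items))
        self._items[index], self._items[-1] = self._items[-1], self._items[index]
        return self._items.pop()

    def finish(self) -> None:
        self._finished = True


def shuffled(iterable: Iterable[Any], capacity: int, min_after_retrieve: int,
             seed: Optional[int] = None) -> Iterator[Any]:
    """Stream items from ``iterable`` through the buffer in shuffled order."""
    buf = MinAfterRetrieveShuffleBuffer(capacity, min_after_retrieve, seed)
    for item in iterable:
        buf.add(item)
        # Buffer size never exceeds min_after_retrieve + 1 <= capacity here.
        if buf.can_retrieve():
            yield buf.retrieve()
    buf.finish()
    while buf.can_retrieve():
        yield buf.retrieve()
```

For example, `list(shuffled(range(100), capacity=10, min_after_retrieve=5, seed=0))` yields all 100 values in a shuffled order. A larger `min_after_retrieve` gives better mixing at the cost of more memory and a longer warm-up before the first item is produced.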

CLAassistant commented 5 years ago

CLA assistant check
All committers have signed the CLA.

selitvin commented 4 years ago

Do we want to land this PR? If so, can you please rebase, since some of the changes in this PR have already landed.

gregw18 commented 4 years ago

Whoops, thanks for catching that! I seem to be having some problems with the build, and they don't appear to be related to my code. I'm going to trigger another build Saturday morning to see whether it succeeds then.