uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars
285 forks
issues
make_batch_reader + shuffling queue clarification
#705
brent-lemieux
closed
3 years ago
3
Use pyarrow.ipc.open_stream instead of legacy pyarrow.open_stream
#704
selitvin
closed
3 years ago
1
Fix deprecation warning of using '.data' attribute of ChunkedArray
#703
selitvin
closed
3 years ago
1
Remove very old pickle compatibility code modifying old atg package names
#702
selitvin
opened
3 years ago
2
Unbreak "C901 ... too complex" flake failures in CI
#701
selitvin
closed
3 years ago
1
Dataset API and pyarrow>=2.0
#700
v01dXYZ
closed
3 years ago
3
Access a specific row in the dataframe
#699
2006pmach
opened
3 years ago
4
Use pyarrow.fs.LocalFileSystem as per arrow 4.x
#698
JayjeetAtGithub
opened
3 years ago
2
Use py3.9 for all CI tests, except one that uses 3.7
#697
selitvin
closed
3 years ago
1
Python 3.9 isAlive -> is_alive compatibility fix
#696
selitvin
closed
3 years ago
1
Add python 3.9 to the dockerfile used in testing
#695
selitvin
closed
3 years ago
1
Extraction of storage_options from URL
#694
manjuransari-zz
closed
3 years ago
8
Added strip_protocol from fsspec to get the required URL.
#693
manjuransari-zz
closed
3 years ago
1
'WorkerThread' object has no attribute 'isAlive'
#692
jmpanfil
closed
3 years ago
1
Unable to extract storage_options from URL
#691
manjuransari-zz
closed
3 years ago
3
Support for parquet files with nested structures
#690
mossadhelali
opened
3 years ago
21
Predicting is slow and sometimes doesn't even work.
#689
diogoribeiro09
closed
3 years ago
12
Fix build badge on the README.rst page
#688
selitvin
closed
3 years ago
2
Fix a failure when reading data from a parquet file (and not a parquet directory)
#687
selitvin
closed
3 years ago
1
Allow opening parquet stores with unsupported types.
#686
selitvin
closed
3 years ago
1
Ignore unsupported fields in parquet dataset
#685
darkjh
closed
3 years ago
2
If there is no partition information, an error will be reported here
#684
blacksunshine
closed
3 years ago
6
CI: remove TravisCI
#683
chongxiaoc
closed
3 years ago
1
Update unittest.yml
#682
chongxiaoc
closed
3 years ago
1
Bump up to v0.11.0rc6
#681
chongxiaoc
closed
3 years ago
2
Check in Actions workflow
#680
chongxiaoc
closed
3 years ago
3
Fix some format
#679
chongxiaoc
closed
3 years ago
1
Actions: change target branch back to master
#678
chongxiaoc
closed
3 years ago
0
tf_dataset: enable repeat() with warnings
#677
chongxiaoc
closed
3 years ago
2
Make "Generate Dataset" example work with newer pyspark
#676
selitvin
closed
3 years ago
1
ValueError: Cell is empty
#675
leonardozcm
opened
3 years ago
4
tf_dataset: add unit test to verify repeat() works after cache()
#674
chongxiaoc
closed
3 years ago
1
[Tensorflow] Support tf.dataset.repeat() to avoid duplicating and dropping samples in one epoch with shuffle?
#673
chongxiaoc
opened
3 years ago
0
[ISSUE] Petastorm + TF Recommenders hangs forever
#672
renardeinside
opened
3 years ago
6
`parquet file size 0 bytes` when materializing dataset
#671
ckchow
opened
3 years ago
4
Facelift to Dockerfile
#670
selitvin
closed
3 years ago
2
Pytorch: add inmemory batched dataloader
#669
chongxiaoc
closed
3 years ago
4
Better support for Spark image type
#668
RobindeGrootNL
closed
3 years ago
6
Bump up to 0.10.0rc4
#667
chongxiaoc
closed
3 years ago
1
Test Travis CI with py3.6
#666
chongxiaoc
closed
3 years ago
1
Replaced s3 and gcs connectors with fsspec to support additional filesystems
#665
tgaddair
closed
3 years ago
3
Refactor inmem cache out of BatchedDataLoader, create an inmem dataloader instead?
#664
chongxiaoc
opened
3 years ago
2
fix get_dataset_path() in fs_utils.py
#663
dongpohezui
opened
3 years ago
4
Reader: shuffle row groups before sharding.
#662
chongxiaoc
closed
3 years ago
4
Fixing Dockerfile
#661
selitvin
closed
3 years ago
1
CI: upgrade pyarrow to 2.0.0
#660
selitvin
closed
3 years ago
1
Fixate numpy version used in build
#659
selitvin
closed
3 years ago
1
Fix mypy test: removing np.bool from tf_utils numpy-to-tf dtype conversion
#658
selitvin
closed
3 years ago
0
Fixing mypy errors: SyntheticDataset nametuple name
#657
selitvin
closed
3 years ago
0
Remove Unischema __getattr__ implementation
#656
v01dXYZ
opened
3 years ago
2