uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks
issues
training from different sources
#755
weidezhang
opened
2 years ago
6
Wrapper for Arrow Datasets & Dataset Pieces
#754
aperiodic
opened
2 years ago
2
Update README.rst
#753
FeU-aKlos
opened
2 years ago
1
Add Python3.10 to CI docker image
#752
selitvin
opened
2 years ago
2
Upgrade CI to use latest packages of tf,pyarrow,numpy in 'latest' CI configuration
#751
selitvin
closed
2 years ago
2
Fix type of a batch returned by make_batch_reader when TransformSpec's function returns column with all values being None
#750
selitvin
opened
2 years ago
1
Do not land: Benchmark size of a parquet file with png files
#749
selitvin
closed
2 years ago
0
Enable batch fetching in parallel
#748
jarandaf
opened
2 years ago
4
How to reduce parquet size
#747
journey-wang
opened
2 years ago
1
Import ABC from collections.abc for Python 3.10 compatibility
#746
tirkarthi
closed
2 years ago
2
Test using shared_seed with pytorch converter
#745
selitvin
closed
2 years ago
1
Use of transform_spec in make_batch_reader leads to tensorflow error when column is missing values
#744
oby1
opened
2 years ago
3
tensorflow pyspark
#743
malinphy
closed
2 years ago
4
make_batch_reader called by make_torch_loader "got an unexpected keyword argument 'shard_seed'"
#742
quocdat32461997
closed
1 year ago
2
`RestrictedUnpickler` is Bypassable
#741
splitline
opened
2 years ago
0
On BatchedDataLoader performance
#740
jarandaf
closed
2 years ago
8
Speeding up loading data from spark
#739
jmpanfil
opened
2 years ago
3
Ambiguous workflow while using Spark
#738
smartFunX
opened
2 years ago
3
Use highest available pickle protocol when serializing
#737
rbetz
closed
2 years ago
9
Parquet column/modular encryption support for Petastorm
#736
RobindeGrootNL
opened
2 years ago
8
reuse dataset materialized by SparkDatasetConverter
#735
Riser01
closed
2 years ago
1
How to use a single dataset to train a multiple-input model in TensorFlow Keras using Petastorm
#734
Riser01
closed
2 years ago
0
TensorFlow Petastorm, training stuck
#733
Riser01
closed
2 years ago
6
Get rid of RuntimeWarning when using process pool
#732
selitvin
closed
2 years ago
1
Support passing multiple url files to make_reader function.
#731
selitvin
closed
2 years ago
3
Allow more than two namenodes in hdfs configuration file.
#730
selitvin
closed
2 years ago
1
Varying number of examples passed by DataLoader to Pytorch Lightning network
#729
trelium
opened
2 years ago
2
PyDictReaderWorker does not support multiple paths datset_paths
#728
zhangzhenyu13
closed
2 years ago
2
Large metadata file: Can't load dataset after using Petastorm row_group_indexer
#727
marjanalbooyeh
opened
2 years ago
1
How to stop petastorm dataloaders at end of epoch
#726
jiwidi
opened
2 years ago
3
Error when using make_spark_converter
#725
jiwidi
closed
2 years ago
0
got error AssertionError: Must supply a list of namenodes, but HDFS only supports up to 2 namenode URLs when calling the materialize_dataset() in example
#724
Ereebay
closed
2 years ago
3
Use assertEqual instead of assertEquals for Python 3.11 compatibility.
#723
tirkarthi
closed
2 years ago
2
fix typo "suffling" -> "shuffling"
#722
noxthot
closed
2 years ago
5
not able to disable shuffling using : make_torch_dataloader
#721
Warra07
opened
3 years ago
2
make_reader() is taking forever
#720
GraceHLiu
opened
3 years ago
14
Any update on shard imbalance issue for parquet dataset?
#719
PHILO-HE
closed
2 years ago
6
Use make_batch_reader for petastorm parquet dataset
#718
PHILO-HE
closed
3 years ago
2
Added fsspec support for _default_delete_dir_handler
#717
manjuransari-zz
closed
3 years ago
1
_default_delete_dir_handler throws error when using default handler
#716
manjuransari-zz
closed
3 years ago
8
Fix package versions in Dockerfile
#715
chongxiaoc
closed
3 years ago
1
No option to pass storage_options in materialize_dataset()
#714
manjuransari-zz
opened
3 years ago
0
Petastorm C++ API
#713
kydonian
closed
3 years ago
4
ModuleNotFoundError: No module named 'petastorm.codecs'; 'petastorm' is not a package
#712
aseembits93
closed
3 years ago
4
Use spark_test_ctx fixture instead of constructing spark manually
#711
selitvin
opened
3 years ago
1
How to get length of dataset when using pytorch dataloader API?
#710
aseembits93
closed
3 years ago
3
Improve a docstring of petastorm.pytorch.DataLoader class.
#709
selitvin
closed
3 years ago
1
Do not attempt to override pyarrow's dataset.partitions attribute
#708
selitvin
closed
3 years ago
1
pyarrow v.5.0.0 breaks Petastorm Reader
#707
hgalbraith-bb
closed
3 years ago
3
Add draft_release Actions
#706
chongxiaoc
closed
3 years ago
2