uber petastorm issues - Githubissues

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

Apache License 2.0

1.78k stars, 285 forks

issues

training from different sources

#755 weidezhang opened 2 years ago
6
Wrapper for Arrow Datasets & Dataset Pieces

#754 aperiodic opened 2 years ago
2
Update README.rst

#753 FeU-aKlos opened 2 years ago
1
Add Python3.10 to CI docker image

#752 selitvin opened 2 years ago
2
Upgrade CI to use latest packages of tf,pyarrow,numpy in 'latest' CI configuration

#751 selitvin closed 2 years ago
2
Fix type of a batch returned by make_batch_reader when TransformSpec's function returns column with all values being None

#750 selitvin opened 2 years ago
1
Do not land: Benchmark size of a parquet file with png files

#749 selitvin closed 2 years ago
0
Enable batch fetching in parallel

#748 jarandaf opened 2 years ago
4
How to reduce parquet size

#747 journey-wang opened 2 years ago
1
Import ABC from collections.abc for Python 3.10 compatibility

#746 tirkarthi closed 2 years ago
2
Test using shared_seed with pytorch converter

#745 selitvin closed 2 years ago
1
Use of transform_spec in make_batch_reader leads to tensorflow error when column is missing values

#744 oby1 opened 2 years ago
3
tensorflow pyspark

#743 malinphy closed 2 years ago
4
make_batch_reader called by make_torch_loader "got an unexpected keyword argument 'shard_seed'"

#742 quocdat32461997 closed 1 year ago
2
`RestrictedUnpickler` is Bypassable

#741 splitline opened 2 years ago
0
On BatchedDataLoader performance

#740 jarandaf closed 2 years ago
8
Speeding up loading data from spark

#739 jmpanfil opened 2 years ago
3
Ambiguous workflow while using Spark

#738 smartFunX opened 2 years ago
3
Use highest available pickle protocol when serializing

#737 rbetz closed 2 years ago
9
Parquet column/modular encryption support for Petastorm

#736 RobindeGrootNL opened 2 years ago
8
reuse dataset materialized by SparkDatasetConverter

#735 Riser01 closed 2 years ago
1
How to use a single dataset to train a multiple-input model in TensorFlow Keras using Petastorm

#734 Riser01 closed 2 years ago
0
TensorFlow Petastorm, training stuck

#733 Riser01 closed 2 years ago
6
Get rid of RuntimeWarning when using process pool

#732 selitvin closed 2 years ago
1
Support passing multiple url files to make_reader function.

#731 selitvin closed 2 years ago
3
Allow more than two namenodes in hdfs configuration file.

#730 selitvin closed 2 years ago
1
Varying number of examples passed by DataLoader to Pytorch Lightning network

#729 trelium opened 2 years ago
2
PyDictReaderWorker does not support multiple paths datset_paths

#728 zhangzhenyu13 closed 2 years ago
2
Large metadata file: Can't load dataset after using Petastorm row_group_indexer

#727 marjanalbooyeh opened 2 years ago
1
How to stop petastorm dataloaders at end of epoch

#726 jiwidi opened 2 years ago
3
Error when using make_spark_converter

#725 jiwidi closed 2 years ago
0
got error AssertionError: Must supply a list of namenodes, but HDFS only supports up to 2 namenode URLs when calling the materialize_dataset() in example

#724 Ereebay closed 2 years ago
3
Use assertEqual instead of assertEquals for Python 3.11 compatibility.

#723 tirkarthi closed 2 years ago
2
fix typo "suffling" -> "shuffling"

#722 noxthot closed 2 years ago
5
not able to disable shuffling using : make_torch_dataloader

#721 Warra07 opened 3 years ago
2
make_reader() is taking forever

#720 GraceHLiu opened 3 years ago
14
Any update on shard imbalance issue for parquet dataset?

#719 PHILO-HE closed 2 years ago
6
Use make_batch_reader for petastorm parquet dataset

#718 PHILO-HE closed 3 years ago
2
Added fsspec support for _default_delete_dir_handler

#717 manjuransari-zz closed 3 years ago
1
_default_delete_dir_handler throws error when using default handler

#716 manjuransari-zz closed 3 years ago
8
Fix package versions in Dockerfile

#715 chongxiaoc closed 3 years ago
1
No option to pass storage_options in materialize_dataset()

#714 manjuransari-zz opened 3 years ago
0
Petastorm C++ API

#713 kydonian closed 3 years ago
4
ModuleNotFoundError: No module named 'petastorm.codecs'; 'petastorm' is not a package

#712 aseembits93 closed 3 years ago
4
Use spark_test_ctx fixture instead of constructing spark manually

#711 selitvin opened 3 years ago
1
How to get length of dataset when using pytorch dataloader API?

#710 aseembits93 closed 3 years ago
3
Improve a docstring of petastorm.pytorch.DataLoader class.

#709 selitvin closed 3 years ago
1
Do not attempt to override pyarrow's dataset.partitions attribute

#708 selitvin closed 3 years ago
1
pyarrow v.5.0.0 breaks Petastorm Reader

#707 hgalbraith-bb closed 3 years ago
3
Add draft_release Actions

#706 chongxiaoc closed 3 years ago
2
