uber petastorm issues - Githubissues

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

Apache License 2.0

1.78k stars, 285 forks

issues

Petastorm not working due to PyArrow version hell

#806 kiranzo opened 1 month ago
2
Petastorm breaks with pyarrow 13.0 or newer. Stable version of pyarrow is at 16.0 now.

#805 LauritsDixen opened 5 months ago
2
Petastorm hangs forever in DataBricks

#804 juzzmac opened 7 months ago
1
ParquetDataset has an invalid parameter validate_schema

#803 ayushkarnawat opened 8 months ago
1
chore: Update badge pipeline

#802 Juandavi1 closed 10 months ago
1
make_reader fails for example

#801 phK3 closed 11 months ago
1
FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version.

#800 ton11111 opened 11 months ago
1
make_torch_dataloader using TransformSpec applies transformation on entire dataframe (not lazy loading)

#799 davegabe closed 1 year ago
2
Bug in ConcurrentVentilator._ventilate() when randomize_item_order=True and random seed is fixed

#798 JonasRauch opened 1 year ago
0
Issue with loading nested array type from spark DF to torch

#797 sardinois opened 1 year ago
0
Add a ThreadPool which respects the order of Parquet dataset pieces.

#796 wbeardall opened 1 year ago
3
String as input in petastorm dataloaders

#795 freud14-tm opened 1 year ago
3
Seeing worse model performance from using petastorm vs normal pytorch dataloader

#793 AKhazane opened 1 year ago
1
Add missing field_name in ValueError

#792 chasleslr opened 1 year ago
3
[Test] Run CI against pyspark 3.4

#791 WeichenXu123 opened 1 year ago
3
TypeError: __init__() missing 2 required positional arguments: 'instance' and 'token'

#790 devVipin01 opened 1 year ago
0
AttributeError: 'bool' object has no attribute 'map' while using Predicate

#789 littlehomelessman opened 1 year ago
0
How to transform the string data to numerical when using make_batch_reader?

#788 littlehomelessman opened 1 year ago
0
Make `make_spark_converter` supports creating converter from a saved dataframe path

#787 WeichenXu123 closed 1 year ago
7
make_batch_reader Documentation out of date? seed?

#786 Data-drone opened 1 year ago
0
Petastorm sharding and setting batch sizes

#785 Data-drone opened 1 year ago
0
Prediction issue using Keras and TransformSpec with PySpark

#784 sdaza closed 1 year ago
0
Support results_queue_size parameter in make_batch_reader api

#783 s-udhaya closed 1 year ago
8
When the hdfs-site.xml file has an xi:include tag, the function can't get hadoop_configuration info

#782 lytk01 opened 1 year ago
0
How to pass pin_memory argument when using make_torch_dataloader

#781 s-udhaya closed 1 year ago
2
Customized dataset

#780 JiajianLu closed 1 year ago
1
Random seed doesn't seem to work well

#779 kisel4363 opened 2 years ago
2
Update CI to use latest versions of pyarrow and numpy. Drop pyarrow 4 test config.

#778 selitvin opened 2 years ago
2
Remove ``LocalDiskArrowTableCache`` and use latest pickle protocol for local caching

#777 selitvin closed 2 years ago
3
using SHAP with petastorm dataset

#776 sdaza opened 2 years ago
1
Future Warning importing SparkDatasetConverter.

#775 kisel4363 closed 2 years ago
2
Dynamic shape of labels.

#774 ohindialign opened 2 years ago
3
in_set predicate raises error unhashable type: 'Series'

#773 Joachim-Sh opened 2 years ago
0
Add a collate_lists_fn

#772 selitvin opened 2 years ago
1
Update pytorch mnist example with up-to-date make_reader() interface

#771 chongxiaoc closed 2 years ago
1
weighted_sampling_reader

#770 weidezhang opened 2 years ago
3
make_spark_converter RuntimeError: Vector columns are only supported in pyspark>=3.0

#769 Alxe1 opened 2 years ago
4
null cache

#768 weidezhang opened 2 years ago
4
Reader: enable shuffling inside every row group

#767 chongxiaoc closed 2 years ago
2
upgrade readthedocs to use Py3.7

#766 chongxiaoc closed 2 years ago
1
make_batch_reader loses dtype with list-of-strings columns, causing Tensorflow error when lists contain a None value

#765 arhan-gunel opened 2 years ago
0
Will petastorm Dataloader support prefetch like PyTorch Multiprocessing Dataloader?

#764 MARD1NO closed 2 years ago
1
PyTorch Batched Non-shuffle Buffer Large Memory Consumption

#763 chongxiaoc closed 2 years ago
1
PyTorch: improve memory-efficiency in batched non-shuffle buffer

#762 chongxiaoc closed 2 years ago
3
dynamic padding via `collate_fn`

#761 Jomonsugi opened 2 years ago
11
Newer pyarrow versions?

#760 winding-lines closed 1 year ago
1
Can we input a custom collate function as an input variable when creating the dataloader ?

#759 shamanez opened 2 years ago
0
Validate_schema keyword not supported yet

#758 kisel4363 opened 2 years ago
7
Replace process_iter by pid_exists

#757 MostafaFarahani closed 2 years ago
3
Performance on large amounts of data

#756 jaycunningham-8451 opened 2 years ago
1