uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Apache License 2.0
1.78k stars
285 forks
issues
Petastorm not working due to PyArrow version hell
#806
kiranzo
opened
1 month ago
2
Petastorm breaks with pyarrow 13.0 or newer. The stable version of pyarrow is at 16.0 now.
#805
LauritsDixen
opened
5 months ago
2
Petastorm hangs forever in Databricks
#804
juzzmac
opened
7 months ago
1
ParquetDataset has an invalid parameter validate_schema
#803
ayushkarnawat
opened
8 months ago
1
chore: Update badge pipeline
#802
Juandavi1
closed
10 months ago
1
make_reader fails for example
#801
phK3
closed
11 months ago
1
FutureWarning: 'ParquetDataset.partitions' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version.
#800
ton11111
opened
11 months ago
1
make_torch_dataloader using TransformSpec applies transformation on entire dataframe (not lazy loading)
#799
davegabe
closed
1 year ago
2
Bug in ConcurrentVentilator._ventilate() when randomize_item_order=True and random seed is fixed
#798
JonasRauch
opened
1 year ago
0
Issue with loading nested array type from spark DF to torch
#797
sardinois
opened
1 year ago
0
Add a ThreadPool which respects the order of Parquet dataset pieces.
#796
wbeardall
opened
1 year ago
3
String as input in petastorm dataloaders
#795
freud14-tm
opened
1 year ago
3
Seeing worse model performance from using petastorm vs normal pytorch dataloader
#793
AKhazane
opened
1 year ago
1
Add missing field_name in ValueError
#792
chasleslr
opened
1 year ago
3
[Test] Run CI against pyspark 3.4
#791
WeichenXu123
opened
1 year ago
3
TypeError: __init__() missing 2 required positional arguments: 'instance' and 'token'
#790
devVipin01
opened
1 year ago
0
AttributeError: 'bool' object has no attribute 'map' while using Predicate
#789
littlehomelessman
opened
1 year ago
0
How to transform the string data to numerical when using make_batch_reader?
#788
littlehomelessman
opened
1 year ago
0
Make `make_spark_converter` support creating a converter from a saved dataframe path
#787
WeichenXu123
closed
1 year ago
7
make_batch_reader Documentation out of date? seed?
#786
Data-drone
opened
1 year ago
0
Petastorm sharding and setting batch sizes
#785
Data-drone
opened
1 year ago
0
Prediction issue using Keras and TransformSpec with PySpark
#784
sdaza
closed
1 year ago
0
Support results_queue_size parameter in make_batch_reader api
#783
s-udhaya
closed
1 year ago
8
When the hdfs-site.xml file has an xi:include tag, the function can't get hadoop_configuration info
#782
lytk01
opened
1 year ago
0
How to pass pin_memory argument when using make_torch_dataloader
#781
s-udhaya
closed
1 year ago
2
Customized dataset
#780
JiajianLu
closed
1 year ago
1
Random seed doesn't seem to work well
#779
kisel4363
opened
2 years ago
2
Update CI to use latest versions of pyarrow and numpy. Drop pyarrow 4 test config.
#778
selitvin
opened
2 years ago
2
Remove ``LocalDiskArrowTableCache`` and use latest pickle protocol for local caching
#777
selitvin
closed
2 years ago
3
using SHAP with petastorm dataset
#776
sdaza
opened
2 years ago
1
FutureWarning when importing SparkDatasetConverter.
#775
kisel4363
closed
2 years ago
2
Dynamic shape of labels.
#774
ohindialign
opened
2 years ago
3
in_set predicate raises error unhashable type: 'Series'
#773
Joachim-Sh
opened
2 years ago
0
Add a collate_lists_fn
#772
selitvin
opened
2 years ago
1
Update pytorch mnist example with up-to-date make_reader() interface
#771
chongxiaoc
closed
2 years ago
1
weighted_sampling_reader
#770
weidezhang
opened
2 years ago
3
make_spark_converter RuntimeError: Vector columns are only supported in pyspark>=3.0
#769
Alxe1
opened
2 years ago
4
null cache
#768
weidezhang
opened
2 years ago
4
Reader: enable shuffling inside every row group
#767
chongxiaoc
closed
2 years ago
2
upgrade readthedocs to use Py3.7
#766
chongxiaoc
closed
2 years ago
1
make_batch_reader loses dtype with list-of-strings columns, causing TensorFlow error when lists contain a None value
#765
arhan-gunel
opened
2 years ago
0
Will petastorm Dataloader support prefetch like PyTorch Multiprocessing Dataloader?
#764
MARD1NO
closed
2 years ago
1
PyTorch Batched Non-shuffle Buffer Large Memory Consumption
#763
chongxiaoc
closed
2 years ago
1
PyTorch: improve memory-efficiency in batched non-shuffle buffer
#762
chongxiaoc
closed
2 years ago
3
dynamic padding via `collate_fn`
#761
Jomonsugi
opened
2 years ago
11
Newer pyarrow versions?
#760
winding-lines
closed
1 year ago
1
Can we pass a custom collate function as an argument when creating the dataloader?
#759
shamanez
opened
2 years ago
0
Validate_schema keyword not supported yet
#758
kisel4363
opened
2 years ago
7
Replace process_iter by pid_exists
#757
MostafaFarahani
closed
2 years ago
3
Performance on large amounts of data
#756
jaycunningham-8451
opened
2 years ago
1