uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.8k stars, 284 forks
issues
Make petastorm reader support dataset url list.
#503
WeichenXu123
closed
4 years ago
4
Crash (segmentation fault) when storing data to HDFS
#502
filipski
closed
4 years ago
3
Cannot resolve error "Cannot auto-create unischema due to unsupported column type"
#501
muntasirraihan
opened
4 years ago
18
Suspected memory leak with `reader_pool_type='process'`
#500
megaserg
opened
4 years ago
1
parquet.enable.summary-metadata and add_metadata are deprecated
#499
filipski
opened
4 years ago
2
Verify that scalars and not arrays are passed to a ScalarCodec instance
#498
selitvin
closed
4 years ago
1
Best way to load folder of images into petastorm data set?
#497
filipski
closed
4 years ago
10
Simplify data conversion from Spark to TensorFlow: Spark converter basic implementation.
#496
WeichenXu123
closed
4 years ago
5
Fix spurious legacy-regex rule warning
#495
selitvin
closed
4 years ago
1
Segmentation fault importing pytorch Dataloader
#494
sonNeturo
closed
4 years ago
2
Petastorm slow performance when parsing a large column dataset
#493
jerrygb
opened
4 years ago
4
New PyTorch DataLoader implementation using batched operations
#492
fps7806
closed
4 years ago
1
Allow predicates to work with more than one partition
#491
gregw18
closed
3 years ago
1
assign transform result in _load_rows_with_predicate
#490
jgblight
closed
4 years ago
4
Implement detailed logging for hdfs nameservice/namenode resolution
#489
selitvin
opened
4 years ago
0
Add make_reader support for parquet partitioned on more than one key
#488
jamesprinc3
opened
4 years ago
2
Petastorm "float division by zero" when applying filter predicate on a dataset which is partitioned on more than one column
#487
jamesprinc3
opened
4 years ago
3
New PR adding tests for requesting invalid columns from parquet reader
#486
gregw18
closed
4 years ago
1
Fixing sources of various deprecation warnings
#485
selitvin
closed
3 years ago
2
Resurrect codecov integration
#484
selitvin
closed
4 years ago
1
ImportError: cannot import objects
#483
Vaibhav47Sharma
opened
4 years ago
1
A monitor thread added to workers
#482
ingolfured
closed
4 years ago
0
Fixing pandas 1.0 compatibility issue (as_matrix)
#481
selitvin
closed
4 years ago
0
Update to Python 3.7 in CI Dockerfile
#480
selitvin
closed
4 years ago
0
Pandas 1.0.0 compatibility
#479
abditag2
closed
4 years ago
0
Fixing MemoryError when building docker image on docker.io
#478
selitvin
closed
4 years ago
0
Fix incorrect counting of number of row-groups per piece.
#477
selitvin
closed
4 years ago
1
Minor fixup to materialize_dataset docstring
#476
selitvin
closed
4 years ago
0
Problem with HelloWorld Example on Front Page of Repo
#475
andrewredd
closed
4 years ago
13
Sending FINISHED message to workers when main process dies
#474
ingolfured
closed
4 years ago
4
Verify that list-of-lists can be supported by Petastorm
#473
selitvin
opened
4 years ago
7
Support GCSFS
#472
megaserg
closed
4 years ago
4
Adding image_codec accessor to CompressedImageCodec
#471
selitvin
closed
4 years ago
0
Test for Travis log issue
#470
gregw18
closed
3 years ago
0
Fix failure due to partition fields values returned by pyarrow read.
#469
selitvin
closed
4 years ago
0
Is there support for gs datasets?
#468
skeller88
opened
4 years ago
2
dataset.make_one_shot_iterator() raises AttributeError: 'MapDataset' object has no attribute 'make_one_shot_iterator'
#467
skeller88
opened
4 years ago
1
Flaky test test_with_batch_reader[2] ?
#466
selitvin
opened
4 years ago
7
Make data transform logic a bit easier for non-TensorFlow users.
#465
gregw18
closed
3 years ago
6
What is the difference between petastorm and horovod?
#464
dclong
closed
4 years ago
1
Is there an easy way to transform native Parquet file to Petastorm datasets
#463
LiuxyEric
closed
4 years ago
2
ArrowIOError: Corrupted file, smaller than file footer
#462
balajib5497
opened
4 years ago
1
field added by transformation is omitted if NGram is defined
#461
GregAru
opened
4 years ago
0
Fix missing hostname attribute in ParseResult
#460
selitvin
closed
4 years ago
0
update materialize_dataset documentation for filesystem factory arg
#459
jakelarkn
closed
4 years ago
1
InvalidArgumentError: parquet array column not being transformed to dataset
#458
DASpringate
opened
4 years ago
5
Clarify that `results_queue_size` affects number of row-groups stored in the queue and not rows.
#457
selitvin
closed
4 years ago
0
results_queue_size defines max number of prefetched row groups?
#456
GregAru
closed
4 years ago
1
"Error in sys.excepthook" on exit when using make_reader with hdfs_driver libhdfs in python 3.x
#455
bhuntley
opened
4 years ago
0
Do not use zero-memory-copy feature of zmq to prevent large memory footprint swings.
#454
selitvin
closed
4 years ago
0