uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

Apache License 2.0

1.8k stars 284 forks

issues

Make petastorm reader support dataset url list.

#503 WeichenXu123 closed 4 years ago
4
Crash (segmentation fault) when storing data to HDFS

#502 filipski closed 4 years ago
3
Cannot resolve error "Cannot auto-create unischema due to unsupported column type"

#501 muntasirraihan opened 4 years ago
18
Suspected memory leak with `reader_pool_type='process'`

#500 megaserg opened 4 years ago
1
parquet.enable.summary-metadata and add_metadata are deprecated

#499 filipski opened 4 years ago
2
Verify that scalars and not arrays are passed to a ScalarCodec instance

#498 selitvin closed 4 years ago
1
Best way to load folder of images into petastorm data set?

#497 filipski closed 4 years ago
10
Simplify data conversion from Spark to TensorFlow: Spark converter basic implementation.

#496 WeichenXu123 closed 4 years ago
5
Fix spurious legacy-regex rule warning

#495 selitvin closed 4 years ago
1
Segmentation fault importing pytorch Dataloader

#494 sonNeturo closed 4 years ago
2
Petastorm slow performance when parsing a large column dataset

#493 jerrygb opened 4 years ago
4
New PyTorch DataLoader implementation using batched operations

#492 fps7806 closed 4 years ago
1
Allow predicates to work with more than one partition

#491 gregw18 closed 3 years ago
1
assign transform result in _load_rows_with_predicate

#490 jgblight closed 4 years ago
4
Implement detailed logging for hdfs nameservice/namenode resolution

#489 selitvin opened 4 years ago
0
Add make_reader support for parquet partitioned on more than one key

#488 jamesprinc3 opened 4 years ago
2
Petastorm "float division by zero" when applying filter predicate on a dataset which is partitioned on more than one column

#487 jamesprinc3 opened 4 years ago
3
New PR adding tests for requesting invalid columns from parquet reader

#486 gregw18 closed 4 years ago
1
Fixing sources of various deprecation warnings

#485 selitvin closed 3 years ago
2
Resurrect codecov integration

#484 selitvin closed 4 years ago
1
ImportError: cannot import objects

#483 Vaibhav47Sharma opened 4 years ago
1
A monitor thread added to workers

#482 ingolfured closed 4 years ago
0
Fixing pandas 1.0 compatibility issue (as_matrix)

#481 selitvin closed 4 years ago
0
Update to Python 3.7 in CI Dockerfile

#480 selitvin closed 4 years ago
0
Pandas 1.0.0 compatibility

#479 abditag2 closed 4 years ago
0
Fixing MemoryError when building docker image on docker.io

#478 selitvin closed 4 years ago
0
Fix incorrect counting of number of row-groups per piece.

#477 selitvin closed 4 years ago
1
Minor fixup to materialize_dataset docstring

#476 selitvin closed 4 years ago
0
Problem with HelloWorld Example on Front Page of Repo

#475 andrewredd closed 4 years ago
13
Sending FINISHED message to workers when main process dies

#474 ingolfured closed 4 years ago
4
Verify that list-of-lists can be supported by Petastorm

#473 selitvin opened 4 years ago
7
Support GCSFS

#472 megaserg closed 4 years ago
4
Adding image_codec accessor to CompressedImageCodec

#471 selitvin closed 4 years ago
0
Test for Travis log issue

#470 gregw18 closed 3 years ago
0
Fix failure due to partition fields values returned by pyarrow read.

#469 selitvin closed 4 years ago
0
Is there support for gs datasets?

#468 skeller88 opened 4 years ago
2
dataset.make_one_shot_iterator() raises AttributeError: 'MapDataset' object has no attribute 'make_one_shot_iterator'

#467 skeller88 opened 4 years ago
1
Flaky test test_with_batch_reader[2] ?

#466 selitvin opened 4 years ago
7
Make data transform logic a bit easier for non-TensorFlow users.

#465 gregw18 closed 3 years ago
6
What is the difference between petastorm and horovod?

#464 dclong closed 4 years ago
1
Is there an easy way to transform native Parquet file to Petastorm datasets

#463 LiuxyEric closed 4 years ago
2
ArrowIOError: Corrupted file, smaller than file footer

#462 balajib5497 opened 4 years ago
1
field added by transformation is omitted if NGram is defined

#461 GregAru opened 4 years ago
0
Fix missing hostname attribute in ParseResult

#460 selitvin closed 4 years ago
0
update materialize_dataset documentation for filesystem factory arg

#459 jakelarkn closed 4 years ago
1
InvalidArgumentError: parquet array column not being transformed to dataset

#458 DASpringate opened 4 years ago
5
Clarify that `results_queue_size` affects number of row-groups stored in the queue and not rows.

#457 selitvin closed 4 years ago
0
results_queue_size defines max number of prefetched row groups?

#456 GregAru closed 4 years ago
1
"Error in sys.excepthook" on exit when using make_reader with hdfs_driver libhdfs in python 3.x

#455 bhuntley opened 4 years ago
0
Do not use zero-memory-copy feature of zmq to prevent large memory footprint swings.

#454 selitvin closed 4 years ago
0
