uber petastorm issues - Githubissues

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

Apache License 2.0

1.8k stars 284 forks

issues

Fix compat warnings all pyarrow versions [DONOTLAND]

#453 selitvin closed 4 years ago
0
Upgrade petastorm CI image to 2019-11-27_13-25-17. Uses ubuntu 18.04.

#452 selitvin closed 4 years ago
0
issue with Unischema to spark schema conversion

#451 msaisumanth opened 4 years ago
7
Use ubuntu 18.04 as CI image

#450 selitvin closed 4 years ago
0
Remove some warnings regarding .data attribute being deprecated.

#449 selitvin closed 4 years ago
0
Fixing compatibility issues with versions 0.13 and 0.14. Adding full version compatibility matrix to CI.

#448 selitvin closed 4 years ago
0
Error reading parquet files made by AWS Athena

#447 RoelantStegmann closed 4 years ago
14
Regex specified in `make_reader`'s `schema_fields` argument now must match the entire field name.

#446 selitvin closed 4 years ago
3
Added tests for test_parquet_reader.py

#445 gregw18 closed 3 years ago
15
Added option for using PyTorch for throughput testing, issue 219

#444 gregw18 closed 3 years ago
3
Increasing overhead until initial data fetch using make_reader

#443 GregAru opened 5 years ago
6
make_schema_view regex error

#442 working-estimate opened 5 years ago
3
Python running out of RAM

#441 Mmiglio closed 5 years ago
4
tensorflow 2 support

#440 ingolfured closed 5 years ago
1
Adding np.string to str conversion in scalar codec's encode.

#439 selitvin closed 5 years ago
0
Support pyarrow 0.15 API

#438 selitvin closed 5 years ago
1
incompatible with pyarrow==0.15.0

#437 abditag2 closed 5 years ago
2
Fixups that simplify working with transforms and WeightedSamplingReader

#436 selitvin closed 5 years ago
0
Enforce schema equivalence based on field names.

#435 selitvin closed 5 years ago
0
Fixing WeightedSamplingReader that got stale

#434 selitvin closed 5 years ago
0
Raise an error if a user tries to read from a reader after it was stopped

#433 selitvin closed 5 years ago
0
TF Dataset Runs Out Of Data Despite num_epochs=None

#432 cupdike closed 5 years ago
7
parameter is diff from doc description

#431 GitHub-HongweiZhang opened 5 years ago
0
Can ngram be used with external dataset along with make_batch_reader?

#430 priyankaexp opened 5 years ago
2
Adding a unit test checking that make_petastorm_dataset properly uses `num_epochs` argument.

#429 selitvin closed 5 years ago
0
allow NdarrayCodec.decode to pass through ndarrays

#428 stedn closed 5 years ago
3
Fetch a specific batch size

#427 priyankaexp closed 5 years ago
2
Error while creating a dataset (ArrowIOError: Invalid parquet file. Corrupt footer)

#426 sgvarsh closed 4 years ago
3
Support reading SparseVectors and Vectors

#425 abditag2 opened 5 years ago
3
Use UnischemaField object persisted in the dataset to guarantee correct decoding

#424 selitvin closed 5 years ago
0
Upgrade pyarrow 0.14.1 on travis.ci

#423 selitvin closed 5 years ago
0
Port test_unischema to pytest

#422 selitvin closed 5 years ago
0
Make 'codec' and 'nullable' optional UnischemaField arguments

#421 selitvin closed 5 years ago
0
Adding legacy datasets for testing: version 0.7.0 and 0.7.6

#420 selitvin closed 5 years ago
0
Support for GCS

#419 dexterfichuk closed 4 years ago
7
pyarrow.lib.ArrowIOError: Prior attempt to load libhdfs3 failed

#418 okedoki closed 5 years ago
6
Automatically delete columns when TransformSpec(..., removed_fields=.…

#417 selitvin closed 5 years ago
0
removed_fields not acknowledged in arrow_reader_worker

#416 praateekmahajan closed 5 years ago
4
Throw the correct error in Transform.py

#415 praateekmahajan closed 5 years ago
0
Use deque in NoopShufflingBuffer.

#414 selitvin closed 5 years ago
1
Nd-array Shape in UnischemaField

#413 seranotannason closed 5 years ago
3
Fix pyspark hello world link

#412 mtn closed 5 years ago
1
Read proper batches when using petastorm.pytorch.DataLoader with make…

#411 selitvin closed 5 years ago
4
Segmentation fault with materialize_dataset and torch

#410 mattiasarro closed 5 years ago
2
Included `TransformSpec` documentation in the API documentation page.

#409 selitvin closed 5 years ago
0
Upgrade CI to test against TF 1.14 and pyarrow 0.14

#408 selitvin closed 5 years ago
0
Cast int to string in Runtime error (ArrowReaderWorker)

#407 praateekmahajan closed 5 years ago
2
Error while throwing runtime error in ArrowReaderWorker

#406 praateekmahajan closed 5 years ago
2
Loading a batch size i.e not equal to Row Group Size for PyTorch

#405 praateekmahajan closed 5 years ago
1
OSError: Unable to get namenodes for default service

#404 quiescentsam opened 5 years ago
8
