uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.8k stars, 284 forks
issues
Fix compat warnings all pyarrow versions [DONOTLAND]
#453
selitvin
closed
4 years ago
0
Upgrade petastorm CI image to 2019-11-27_13-25-17. Uses ubuntu 18.04.
#452
selitvin
closed
4 years ago
0
issue with Unischema to spark schema conversion
#451
msaisumanth
opened
4 years ago
7
Use ubuntu 18.04 as CI image
#450
selitvin
closed
4 years ago
0
Remove some warnings regarding .data attribute being deprecated.
#449
selitvin
closed
4 years ago
0
Fixing compatibility issues with versions 0.13 and 0.14. Adding full version compatibility matrix to CI.
#448
selitvin
closed
4 years ago
0
Error reading parquet files made by AWS Athena
#447
RoelantStegmann
closed
4 years ago
14
Regex specified in `make_reader`'s `schema_fields` argument now must match the entire field name.
#446
selitvin
closed
4 years ago
3
Added tests for test_parquet_reader.py
#445
gregw18
closed
3 years ago
15
Added option for using PyTorch for throughput testing, issue 219
#444
gregw18
closed
3 years ago
3
Increasing overhead until initial data fetch using make_reader
#443
GregAru
opened
5 years ago
6
make_schema_view regex error
#442
working-estimate
opened
5 years ago
3
Python running out of RAM
#441
Mmiglio
closed
5 years ago
4
tensorflow 2 support
#440
ingolfured
closed
5 years ago
1
Adding np.string to str conversion in scalar codec's encode.
#439
selitvin
closed
5 years ago
0
Support pyarrow 0.15 API
#438
selitvin
closed
5 years ago
1
incompatible with pyarrow==0.15.0
#437
abditag2
closed
5 years ago
2
Fixups that simplify working with transforms and WeightedSamplingReader
#436
selitvin
closed
5 years ago
0
Enforce schema equivalence based on field names.
#435
selitvin
closed
5 years ago
0
Fixing WeightedSamplingReader that got stale
#434
selitvin
closed
5 years ago
0
Raise an error if a user tries to read from a reader after it was stopped
#433
selitvin
closed
5 years ago
0
TF Dataset Runs Out Of Data Despite num_epochs=None
#432
cupdike
closed
5 years ago
7
parameter is diff from doc description
#431
GitHub-HongweiZhang
opened
5 years ago
0
Can ngram be used with external dataset along with make_batch_reader?
#430
priyankaexp
opened
5 years ago
2
Adding a unit test checking that make_petastorm_dataset properly uses `num_epochs` argument.
#429
selitvin
closed
5 years ago
0
allow NdarrayCodec.decode to pass through ndarrays
#428
stedn
closed
5 years ago
3
Fetch a specific batch size
#427
priyankaexp
closed
5 years ago
2
Error while creating a dataset (ArrowIOError: Invalid parquet file. Corrupt footer)
#426
sgvarsh
closed
4 years ago
3
Support reading SparseVectors and Vectors
#425
abditag2
opened
5 years ago
3
Use UnischemaField object persisted in the dataset to guarantee correct decoding
#424
selitvin
closed
5 years ago
0
Upgrade pyarrow 0.14.1 on travis.ci
#423
selitvin
closed
5 years ago
0
Port test_unischema to pytest
#422
selitvin
closed
5 years ago
0
Make 'codec' and 'nullable' optional UnischemaField arguments
#421
selitvin
closed
5 years ago
0
Adding legacy datasets for testing: version 0.7.0 and 0.7.6
#420
selitvin
closed
5 years ago
0
Support for GCS
#419
dexterfichuk
closed
4 years ago
7
pyarrow.lib.ArrowIOError: Prior attempt to load libhdfs3 failed
#418
okedoki
closed
5 years ago
6
Automatically delete columns when TransformSpec(..., removed_fields=.…
#417
selitvin
closed
5 years ago
0
removed_fields not acknowledged in arrow_reader_worker
#416
praateekmahajan
closed
5 years ago
4
Throw the correct error in Transform.py
#415
praateekmahajan
closed
5 years ago
0
Use deque in NoopShufflingBuffer.
#414
selitvin
closed
5 years ago
1
Nd-array Shape in UnischemaField
#413
seranotannason
closed
5 years ago
3
Fix pyspark hello world link
#412
mtn
closed
5 years ago
1
Read proper batches when using petastorm.pytorch.DataLoader with make…
#411
selitvin
closed
5 years ago
4
Segmentation fault with materialize_dataset and torch
#410
mattiasarro
closed
5 years ago
2
Included `TransformSpec` documentation in the API documentation page.
#409
selitvin
closed
5 years ago
0
Upgrade CI to test against TF 1.14 and pyarrow 0.14
#408
selitvin
closed
5 years ago
0
Cast int to string in Runtime error (ArrowReaderWorker)
#407
praateekmahajan
closed
5 years ago
2
Error while throwing runtime error in ArrowReaderWorker
#406
praateekmahajan
closed
5 years ago
2
Loading a batch size i.e not equal to Row Group Size for PyTorch
#405
praateekmahajan
closed
5 years ago
1
OSError: Unable to get namenodes for default service
#404
quiescentsam
opened
5 years ago
8