uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.8k stars
284 forks
issues
#403 Remove unused unischema instance in non-petastorm dataset hello world… (selitvin, closed 5 years ago, 0 comments)
#402 Values returned by a transform function are not validated against the schema (selitvin, opened 5 years ago, 5 comments)
#401 transform_spec does not work with predicate (GregAru, closed 5 years ago, 2 comments)
#400 Inferring Unischema from Spark DataFrame/Schema (seranotannason, opened 5 years ago, 3 comments)
#399 `make_batch_reader` returning different numbers of features, labels in the same read (srowen, closed 5 years ago, 4 comments)
#398 Reduce the size of a rowgroup for mnist example. (selitvin, closed 5 years ago, 0 comments)
#397 CompressedImageCodec Dimension Options (seranotannason, opened 5 years ago, 6 comments)
#396 iterator from make_reader hangs after 10 epochs even if num_epochs=None (sdegryze, closed 5 years ago, 3 comments)
#395 Configure Petastorm to point to HDFS and SPARK (prakashmstpt, closed 5 years ago, 1 comment)
#394 Apply transform function after predicate was evaluated (with make_reader) (selitvin, closed 5 years ago, 0 comments)
#393 Make sure in_pseudorandom_split works also with non-string fields. (selitvin, closed 5 years ago, 0 comments)
#392 Keras Model does not converge (stavshem, opened 5 years ago, 5 comments)
#391 Train-Test Dataset Split (seranotannason, closed 5 years ago, 13 comments)
#390 Decode with make_batch_reader (ThrowMeForALoop, opened 5 years ago, 4 comments)
#389 A more generalised example in docs (praateekmahajan, opened 5 years ago, 2 comments)
#388 0.7.5 (Ivan-Dimitrov, closed 5 years ago, 1 comment)
#387 Regarding performance of make_petastorm_dataset (panfengfeng, opened 5 years ago, 10 comments)
#386 Adding the user parameter when pyarrow.hdfs.connect and using spark user when possible (Ivan-Dimitrov, closed 5 years ago, 0 comments)
#385 Trouble running the Tensorflow example (marwan116, opened 5 years ago, 2 comments)
#384 Adding the user parameter when pyarrow.hdfs.connect is called. (Ivan-Dimitrov, closed 5 years ago, 2 comments)
#383 Support reading from a partitioned dataset. Interpret types of the partition-by scalars properly. Also, remove dependency on pyspark while reading using make_batch_reader. (selitvin, closed 5 years ago, 0 comments)
#382 Support inter-row-group shuffling queue when reading from pytorch (selitvin, closed 5 years ago, 0 comments)
#381 Removing ReaderV2 implementation. (selitvin, closed 5 years ago, 0 comments)
#380 Generating meta data for existing datasets (w1nk, closed 5 years ago, 1 comment)
#379 Regarding performance of different read dataset methods (panfengfeng, closed 5 years ago, 1 comment)
#378 Support to get sample by index (un-knight, closed 5 years ago, 1 comment)
#377 Sharding for distributed (un-knight, closed 5 years ago, 7 comments)
#376 Reading Parquet files stored on s3 using petastorm generates connection warnings (keurcien, closed 5 years ago, 2 comments)
#375 dfsclient warning (bluesummers1129, closed 5 years ago, 1 comment)
#374 Cover unischema's `create_schema_view` call when a field is both spec… (selitvin, closed 5 years ago, 0 comments)
#373 Cleanup at the end of travis build and deploy only for one build matrix configuration (selitvin, closed 5 years ago, 0 comments)
#372 Segmentation fault when using make_batch_reader (janhaviag, closed 5 years ago, 2 comments)
#371 [WIP] Issue #318: Add libhdfs / libhdfs3 to development container (jsgoller1, closed 4 years ago, 2 comments)
#370 Guarantee filesystem_factory returned from FilesystemResolver is serializable (selitvin, closed 5 years ago, 0 comments)
#369 Multiple index selectors (GregAru, closed 5 years ago, 1 comment)
#368 Fix `LocalDiskArrowTableCache` that was crashing due to arrow bug (selitvin, closed 5 years ago, 0 comments)
#367 No need to specify `removed` argument in `TransformSpec` (selitvin, opened 5 years ago, 7 comments)
#366 LocalDiskArrowCache performance & validity (panfengfeng, opened 5 years ago, 4 comments)
#365 adding regex expressions for ngrams (Ivan-Dimitrov, closed 5 years ago, 2 comments)
#364 Support reading uint32 types into TF (selitvin, closed 5 years ago, 0 comments)
#363 [WIP] #317 - Allow passing of a username parameter to FileSystemResolver (jsgoller1, closed 5 years ago, 4 comments)
#362 using predicate func in make_batch_reader (panfengfeng, opened 5 years ago, 2 comments)
#361 Adding missing `future` dependency to `setup.py` (selitvin, closed 5 years ago, 0 comments)
#360 Petastorm import causes performance degradation for dask distributed multiprocessing jobs (mike-grayhat, opened 5 years ago, 4 comments)
#359 the size of batch_reader result & tf.dataset batch_size (panfengfeng, opened 5 years ago, 2 comments)
#358 Performance benchmarks against HDF5 (georghildebrand, opened 5 years ago, 3 comments)
#357 Allow native Parquet types to flow freely to the user. (selitvin, closed 4 years ago, 0 comments)
#356 arrow_reader_worker.py: to_pandas() of read_next() move to ArrowReaderWorker.process(*) (panfengfeng, closed 5 years ago, 0 comments)
#355 Make it simpler for PyTorch users to use TransformSpec (selitvin, opened 5 years ago, 0 comments)
#354 make_batch_reader with nullable field, TypeError: an integer is required (working-estimate, closed 5 years ago, 7 comments)