
uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Apache License 2.0

1.8k stars 284 forks

issues

Remove unused unischema instance in non-petastorm dataset hello world…

#403 selitvin closed 5 years ago
0
Values returned by a transform function are not validated against the schema

#402 selitvin opened 5 years ago
5
transform_spec does not work with predicate

#401 GregAru closed 5 years ago
2
Inferring Unischema from Spark DataFrame/Schema

#400 seranotannason opened 5 years ago
3
`make_batch_reader` returning different numbers of features, labels in the same read

#399 srowen closed 5 years ago
4
Reduce the size of a rowgroup for mnist example.

#398 selitvin closed 5 years ago
0
CompressedImageCodec Dimension Options

#397 seranotannason opened 5 years ago
6
iterator from make_reader hangs after 10 epochs even if num_epochs=None

#396 sdegryze closed 5 years ago
3
Configure Petastorm to point to HDFS and Spark

#395 prakashmstpt closed 5 years ago
1
Apply transform function after predicate was evaluated (with make_reader)

#394 selitvin closed 5 years ago
0
Make sure in_pseudorandom_split works also with non-string fields.

#393 selitvin closed 5 years ago
0
Keras Model does not converge

#392 stavshem opened 5 years ago
5
Train-Test Dataset Split

#391 seranotannason closed 5 years ago
13
Decode with make_batch_reader

#390 ThrowMeForALoop opened 5 years ago
4
A more generalised example in docs

#389 praateekmahajan opened 5 years ago
2
0.7.5

#388 Ivan-Dimitrov closed 5 years ago
1
Regarding performance of make_petastorm_dataset

#387 panfengfeng opened 5 years ago
10
Adding the user parameter when pyarrow.hdfs.connect and using spark user when possible

#386 Ivan-Dimitrov closed 5 years ago
0
Trouble running the Tensorflow example

#385 marwan116 opened 5 years ago
2
Adding the user parameter when pyarrow.hdfs.connect is called.

#384 Ivan-Dimitrov closed 5 years ago
2
Support reading from a partitioned dataset. Interpret types of the partition-by scalars properly. Also, remove dependency on pyspark while reading using make_batch_reader.

#383 selitvin closed 5 years ago
0
Support inter-row-group shuffling queue when reading from pytorch

#382 selitvin closed 5 years ago
0
Removing ReaderV2 implementation.

#381 selitvin closed 5 years ago
0
Generating meta data for existing datasets

#380 w1nk closed 5 years ago
1
Regarding performance of different read dataset methods

#379 panfengfeng closed 5 years ago
1
Support to get sample by index

#378 un-knight closed 5 years ago
1
Sharding for distributed

#377 un-knight closed 5 years ago
7
Reading Parquet files stored on s3 using petastorm generates connection warnings

#376 keurcien closed 5 years ago
2
dfsclient warning

#375 bluesummers1129 closed 5 years ago
1
Cover unischema's `create_schema_view` call when a field is both spec…

#374 selitvin closed 5 years ago
0
Cleanup at the end of travis build and deploy only for one build matrix configuration

#373 selitvin closed 5 years ago
0
Segmentation fault when using make_batch_reader

#372 janhaviag closed 5 years ago
2
[WIP] Issue #318: Add libhdfs / libhdfs3 to development container

#371 jsgoller1 closed 4 years ago
2
Guarantee filesystem_factory returned from FilesystemResolver is serializable

#370 selitvin closed 5 years ago
0
Multiple index selectors

#369 GregAru closed 5 years ago
1
Fix `LocalDiskArrowTableCache` that was crashing due to arrow bug

#368 selitvin closed 5 years ago
0
No need to specify `removed` argument in `TransformSpec`

#367 selitvin opened 5 years ago
7
LocalDiskArrowCache performance & validity

#366 panfengfeng opened 5 years ago
4
adding regex expressions for ngrams

#365 Ivan-Dimitrov closed 5 years ago
2
Support reading uint32 types into TF

#364 selitvin closed 5 years ago
0
[WIP] #317 - Allow passing of a username parameter to FileSystemResolver

#363 jsgoller1 closed 5 years ago
4
using predicate func in make_batch_reader

#362 panfengfeng opened 5 years ago
2
Adding missing `future` dependency to `setup.py`

#361 selitvin closed 5 years ago
0
Petastorm import causes performance degradation for dask distributed multiprocessing jobs

#360 mike-grayhat opened 5 years ago
4
the size of batch_reader result & tf.dataset batch_size

#359 panfengfeng opened 5 years ago
2
Performance benchmarks against HDF5

#358 georghildebrand opened 5 years ago
3
Allow native Parquet types to flow freely to the user.

#357 selitvin closed 4 years ago
0
arrow_reader_worker.py: to_pandas() of read_next() move to ArrowReaderWorker.process(*)

#356 panfengfeng closed 5 years ago
0
Make it simpler for PyTorch users to use TransformSpec

#355 selitvin opened 5 years ago
0
make_batch_reader with nullable field, TypeError: an integer is required

#354 working-estimate closed 5 years ago
7
