uber petastorm issues - Githubissues

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Apache License 2.0

1.78k stars, 285 forks

Issues

TransformSpec using Pandas causes incompatibilities with other libraries for make_batch_reader
#603 KamWithK opened 4 years ago, 15 comments

Reproducing benchmark in issue #548
#602 selitvin closed 3 years ago, 0 comments

Allow users to use s3, s3a and s3n protocols when saving / reading datasets
#601 selitvin closed 4 years ago, 11 comments

Adding instructions on patching pyspark installation with s3 protocol supporting jars
#600 selitvin closed 4 years ago, 1 comment

Feature/acmore/support custom filesystem
#599 acmore closed 4 years ago, 0 comments

Use path with bucket name if it's an s3 path and a custom filesystem.
#598 acmore closed 4 years ago, 3 comments

can we use s3 path here instead of hdfs?
#597 p9anand opened 4 years ago, 9 comments

Add a flag to factory methods to allow zmq copy buffers to be disabled
#596 dmcguire81 closed 4 years ago, 9 comments

Expose the flag to disable Ømq copy buffers
#595 dmcguire81 closed 4 years ago, 13 comments

Parameterize factory methods with s3 configs
#594 dmcguire81 closed 4 years ago, 4 comments

The boto config used by s3fs is not parameterizeable when instantiated by Petastorm
#593 dmcguire81 closed 4 years ago, 2 comments

Bugfix: multithreaded metadata deadlock
#592 dmcguire81 closed 4 years ago, 9 comments

Schema inference does not apply filters to Metadata Discovery
#591 dmcguire81 closed 3 years ago, 1 comment

Deadlock in multithreaded Parquet metadata discovery
#590 dmcguire81 closed 4 years ago, 5 comments

Implement the __str__ method for codecs
#589 dmcguire81 closed 4 years ago, 2 comments

Ignore an invalid piece created for a subdirectory when a dataset is stored in an s3 bucket subdirectory
#588 selitvin closed 4 years ago, 1 comment

petastorm.make_reader from s3 bucket path fails
#587 xb478 opened 4 years ago, 5 comments

Move gcsfs library to testing dependencies
#586 selitvin closed 4 years ago, 0 comments

RuntimeWarning when using pure Python reader with process workers
#585 filipski closed 2 years ago, 6 comments

Performance benchmarks - issues with tf.data.Dataset API reader and question about the pure Python one
#584 filipski opened 4 years ago, 4 comments

Error running the generate_petastorm_dataset example
#583 ghost closed 4 years ago, 1 comment

make_spark_converter returns Numpy in binary serialized format.
#582 apatsekin opened 4 years ago, 0 comments

Adding py3.8 to the CI image
#581 selitvin closed 3 years ago, 0 comments

Adding python 3.6 build to travis.ci config
#580 selitvin closed 3 years ago, 1 comment

Release 0.9.4rc0
#579 selitvin closed 4 years ago, 1 comment

Add Python 3.6 to travis CI docker image
#578 selitvin closed 4 years ago, 2 comments

Change definition of UnischemaField to be PY3.6 compatible.
#577 selitvin closed 4 years ago, 1 comment

v0.9.3 release
#576 abditag2 closed 4 years ago, 1 comment

0.9.3rc1
#575 abditag2 closed 4 years ago, 0 comments

Adding release procedure documentation
#574 selitvin closed 4 years ago, 1 comment

Set unittest timeout to 360
#573 selitvin closed 4 years ago, 1 comment

Adding missing legal header to gcsfs_wrapper.py
#572 selitvin closed 4 years ago, 1 comment

Support for Azure Blob Storage and Azure Data Lake
#571 upendrarv opened 4 years ago, 5 comments

Guidance on How to Tune BatchedDataLoader
#570 andrewredd closed 4 years ago, 6 comments

Upgrade CI docker image to ci-2020-07-01-00
#569 selitvin closed 4 years ago, 1 comment

Added additional kwargs for Spark Dataset Converter
#568 tgaddair closed 4 years ago, 10 comments

Add imports to README example.
#567 rb-determined-ai closed 4 years ago, 3 comments

bake mnist data into docker image
#566 abditag2 closed 4 years ago, 2 comments

Remove python 2.7 support from petastorm docker image
#565 selitvin closed 4 years ago, 0 comments

exposed pyarrow filters in the make_reader and make_batch_reader api
#564 abditag2 closed 4 years ago, 2 comments

Use mypy in our CI script
#563 selitvin closed 4 years ago, 1 comment

Retire support for Python 2.
#562 selitvin closed 4 years ago, 1 comment

Fix GCSFS walk() method
#561 megaserg opened 4 years ago, 6 comments

Some errors happen with the code rows_rdd = rows_rdd.map(lambda x:dict_to_spark_row(schema,x))
#560 cmh14 opened 4 years ago, 6 comments

NdarrayCodec does not implement __str__
#559 dmcguire81 closed 4 years ago, 0 comments

walk method in GCSFSWrapper returns empty string as one of filenames
#558 alekswithakayy opened 4 years ago, 2 comments

Upgrade pyarrow to 0.17.1 in travis build
#557 selitvin closed 3 years ago, 1 comment

Remove driver param for hdfs.connect when using pyarrow 0.17 and above
#556 tgaddair closed 4 years ago, 1 comment

In-memory cache
#555 abditag2 closed 3 years ago, 3 comments

Added last_row_consumed property to WeightedSamplingReader
#554 selitvin closed 4 years ago, 1 comment

Previous Next