uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars, 285 forks
issues
#603 TransformSpec using Pandas causes incompatibilities with other libraries for make_batch_reader (KamWithK, opened 4 years ago, 15 comments)
#602 Reproducing benchmark in issue #548 (selitvin, closed 3 years ago, 0 comments)
#601 Allow users to use s3, s3a and s3n protocols when saving / reading datasets (selitvin, closed 4 years ago, 11 comments)
#600 Adding instructions on patching pyspark installation with s3 protocol supporting jars (selitvin, closed 4 years ago, 1 comment)
#599 Feature/acmore/support custom filesystem (acmore, closed 4 years ago, 0 comments)
#598 Use path with bucket name if it's an s3 path and a custom filesystem. (acmore, closed 4 years ago, 3 comments)
#597 can we use s3 path here instead of hdfs? (p9anand, opened 4 years ago, 9 comments)
#596 Add a flag to factory methods to allow zmq copy buffers to be disabled (dmcguire81, closed 4 years ago, 9 comments)
#595 Expose the flag to disable Ømq copy buffers (dmcguire81, closed 4 years ago, 13 comments)
#594 Parameterize factory methods with s3 configs (dmcguire81, closed 4 years ago, 4 comments)
#593 The boto config used by s3fs is not parameterizeable when instantiated by Petastorm (dmcguire81, closed 4 years ago, 2 comments)
#592 Bugfix: multithreaded metadata deadlock (dmcguire81, closed 4 years ago, 9 comments)
#591 Schema inference does not apply filters to Metadata Discovery (dmcguire81, closed 3 years ago, 1 comment)
#590 Deadlock in multithreaded Parquet metadata discovery (dmcguire81, closed 4 years ago, 5 comments)
#589 Implement the __str__ method for codecs (dmcguire81, closed 4 years ago, 2 comments)
#588 Ignore an invalid piece created for a subdirectory when a dataset is stored in an s3 bucket subdirectory (selitvin, closed 4 years ago, 1 comment)
#587 petastorm.make_reader from s3 bucket path fails (xb478, opened 4 years ago, 5 comments)
#586 Move gcsfs library to testing dependencies (selitvin, closed 4 years ago, 0 comments)
#585 RuntimeWarning when using pure Python reader with process workers (filipski, closed 2 years ago, 6 comments)
#584 Performance benchmarks - issues with tf.data.Dataset API reader and question about the pure Python one (filipski, opened 4 years ago, 4 comments)
#583 Error running the generate_petastorm_dataset example (ghost, closed 4 years ago, 1 comment)
#582 make_spark_converter returns Numpy in binary serialized format. (apatsekin, opened 4 years ago, 0 comments)
#581 Adding py3.8 to the CI image (selitvin, closed 3 years ago, 0 comments)
#580 Adding python 3.6 build to travis.ci config (selitvin, closed 3 years ago, 1 comment)
#579 Release 0.9.4rc0 (selitvin, closed 4 years ago, 1 comment)
#578 Add Python 3.6 to travis CI docker image (selitvin, closed 4 years ago, 2 comments)
#577 Change definition of UnischemaField to be PY3.6 compatible. (selitvin, closed 4 years ago, 1 comment)
#576 v0.9.3 release (abditag2, closed 4 years ago, 1 comment)
#575 0.9.3rc1 (abditag2, closed 4 years ago, 0 comments)
#574 Adding release procedure documentation (selitvin, closed 4 years ago, 1 comment)
#573 Set unittest timeout to 360 (selitvin, closed 4 years ago, 1 comment)
#572 Adding missing legal header to gcsfs_wrapper.py (selitvin, closed 4 years ago, 1 comment)
#571 Support for Azure Blob Storage and Azure Data Lake (upendrarv, opened 4 years ago, 5 comments)
#570 Guidance on How to Tune BatchedDataLoader (andrewredd, closed 4 years ago, 6 comments)
#569 Upgrade CI docker image to ci-2020-07-01-00 (selitvin, closed 4 years ago, 1 comment)
#568 Added additional kwargs for Spark Dataset Converter (tgaddair, closed 4 years ago, 10 comments)
#567 Add imports to README example. (rb-determined-ai, closed 4 years ago, 3 comments)
#566 bake mnist data into docker image (abditag2, closed 4 years ago, 2 comments)
#565 Remove python 2.7 support from petastorm docker image (selitvin, closed 4 years ago, 0 comments)
#564 exposed pyarrow filters in the make_reader and make_batch_reader api (abditag2, closed 4 years ago, 2 comments)
#563 Use mypy in our CI script (selitvin, closed 4 years ago, 1 comment)
#562 Retire support for Python 2. (selitvin, closed 4 years ago, 1 comment)
#561 Fix GCSFS walk() method (megaserg, opened 4 years ago, 6 comments)
#560 Some errors happen with the code rows_rdd = rows_rdd.map(lambda x:dict_to_spark_row(schema,x)) (cmh14, opened 4 years ago, 6 comments)
#559 NdarrayCodec does not implement __str__ (dmcguire81, closed 4 years ago, 0 comments)
#558 walk method in GCSFSWrapper returns empty string as one of filenames (alekswithakayy, opened 4 years ago, 2 comments)
#557 Upgrade pyarrow to 0.17.1 in travis build (selitvin, closed 3 years ago, 1 comment)
#556 Remove driver param for hdfs.connect when using pyarrow 0.17 and above (tgaddair, closed 4 years ago, 1 comment)
#555 In-memory cache (abditag2, closed 3 years ago, 3 comments)
#554 Added last_row_consumed property to WeightedSamplingReader (selitvin, closed 4 years ago, 1 comment)