uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k stars
285 forks
issues
Remove implementation of __getattr__ for UniSchema
#655
v01dXYZ
opened
3 years ago
1
question about data frame partition
#654
weidezhang
closed
3 years ago
2
BatchedDataLoader with shuffling_queue_capacity=0 is very slow
#653
selitvin
opened
3 years ago
0
Memory usage is bigger and bigger with epoch, How can I solve this problem?
#652
Byronnar
opened
3 years ago
3
Training stuck in Garbage Collector after first epoch Tensorflow
#651
anisayari
opened
3 years ago
2
Using threads taking longer than dummy
#650
Jomonsugi
closed
3 years ago
5
Any method to encode a spark's SparseVector and pass it to petastorm
#649
kapilkd13
opened
3 years ago
1
Add pandas dataframe API for the reader
#648
arita37
opened
3 years ago
9
How can I refactor this NLP PyTorch code to work with pyspark+petastorm (custom collate needed...)?
#647
davidefiocco
opened
3 years ago
2
Suggestion for converting a spark array column to a format allowed by make_reader?
#646
Jomonsugi
closed
3 years ago
6
OSError: Passed non-file path when passing an s3 path to materialize_dateset
#645
Jomonsugi
closed
3 years ago
1
Performance comparison of make_reader() & make_petastorm_dataset() vs make_spark_converter() & make_tf_dataset()
#644
lndkcg
opened
3 years ago
8
Tests: fix the unit test of batched dataloader with in-memory cache
#643
chongxiaoc
closed
3 years ago
4
Failure in "test_pytorch_dataloader.py::test_batched_data_loader_with_in_memory_cache"
#642
chongxiaoc
closed
3 years ago
2
Pytorch: add AsyncBatchedDataloader
#641
chongxiaoc
closed
2 years ago
7
Security fix for arbitrary code execution.
#640
selitvin
closed
3 years ago
1
Add comments in development.rst
#639
chongxiaoc
closed
3 years ago
2
Implementing Asynchronous Data Shuffling Part
#638
chongxiaoc
opened
3 years ago
1
Security Fix for Arbitrary Code Execution - huntr.dev
#637
huntr-helper
closed
2 years ago
4
Adding a link to qcon.ai presentation to README.
#636
selitvin
closed
3 years ago
1
Adding video to README
#635
crizCraig
closed
3 years ago
3
Raise an explicit error when TransformSpec is given a shape with a variable dimension
#634
selitvin
closed
3 years ago
1
Unischema variable shape error
#633
berendjansen
opened
3 years ago
3
Add unit tests for compress in random shuffling buffer
#630
chongxiaoc
closed
3 years ago
13
Is this the right way to partition dataset for Pytorch DDP?
#629
meprem
opened
3 years ago
4
Fixed docstring in NGram class: timestamp_field argument illustration…
#628
ritwikbera
closed
3 years ago
2
bug with weighted_sampling_reader and pytorch
#627
gueguenster
closed
3 years ago
1
Bug/weighted sampling reader
#626
gueguenster
closed
3 years ago
6
Support on-premise s3-compatible storage.
#625
acmore
closed
3 years ago
7
Help translating mnist/pytorch_example to tabular data!
#624
afogarty85
closed
3 years ago
10
Integration with Apache Hudi
#623
LuisMoralesAlonso
closed
3 years ago
1
Don't exclude fields not supported by pyarrow 0.15 from test data.
#622
selitvin
closed
3 years ago
1
Sort During URL Normalization
#621
voganrc
closed
2 years ago
5
Pytorch DataLoader with array of structs
#620
ramondalmau
opened
3 years ago
3
Use new pyarrow.fs filesystem objects
#619
selitvin
opened
3 years ago
1
Deprecate compat library since we no longer support pre pyarrow 0.17
#618
selitvin
closed
3 years ago
1
Deprecating `pyarrow_serialize` argument of `petastorm.make_reader`.
#617
selitvin
closed
3 years ago
1
Added support for np.uint8
#616
tgaddair
closed
3 years ago
2
Raising an error if an empty list of columns is selected by a user
#615
selitvin
closed
3 years ago
1
Bump-up minimal supported pyarrow version to 0.17.1
#614
selitvin
closed
3 years ago
1
Make use of the new pyarrow.dataset functionality instead of ParquetDataset
#613
jorisvandenbossche
opened
3 years ago
22
Upcoming changes in pyarrow 2.0
#612
jorisvandenbossche
closed
2 years ago
3
Bugfix: S3FSWrapper is deprecated at s3fs 0.5.0
#611
dmcguire81
closed
3 years ago
1
"Currently do not support resetting a reader while in the middle of iteration."
#610
tadas-subonis
opened
4 years ago
12
S3FSWrapper is deprecated as of s3fs 0.5.0
#609
dmcguire81
closed
3 years ago
1
Can it work with RDDs instead of DataFrames?
#608
tadas-subonis
opened
4 years ago
4
Petastorm requires hadoop on client?
#607
ychnh
closed
4 years ago
2
Fix docker file
#606
selitvin
closed
4 years ago
1
Reading lists of numpy arrays
#605
selitvin
opened
4 years ago
1
Multithreaded metadata discovery in ParquetDataset may cause deadlock
#604
dmcguire81
opened
4 years ago
0