uber petastorm issues - Githubissues

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.

Apache License 2.0

1.78k stars 285 forks

issues

Remove implementation of __getattr__ for UniSchema

#655 v01dXYZ opened 3 years ago
1
question about data frame partition

#654 weidezhang closed 3 years ago
2
BatchedDataLoader with shuffling_queue_capacity=0 is very slow

#653 selitvin opened 3 years ago
0
Memory usage is bigger and bigger with epoch, How can I solve this problem?

#652 Byronnar opened 3 years ago
3
Training stuck in Garbage Collector after first epoch Tensorflow

#651 anisayari opened 3 years ago
2
Using threads taking longer than dummy

#650 Jomonsugi closed 3 years ago
5
Any method to encode a spark's SparseVector and pass it to petastorm

#649 kapilkd13 opened 3 years ago
1
Add pandas dataframe API for the reader

#648 arita37 opened 3 years ago
9
How can I refactor this NLP PyTorch code to work with pyspark+petastorm (custom collate needed...)?

#647 davidefiocco opened 3 years ago
2
Suggestion for converting a spark array column to a format allowed by make_reader?

#646 Jomonsugi closed 3 years ago
6
OSError: Passed non-file path when passing an s3 path to materialize_dataset

#645 Jomonsugi closed 3 years ago
1
Performance comparison of make_reader() & make_petastorm_dataset() vs make_spark_converter() & make_tf_dataset()

#644 lndkcg opened 3 years ago
8
Tests: fix the unit test of batched dataloader with in-memory cache

#643 chongxiaoc closed 3 years ago
4
Failure in "test_pytorch_dataloader.py::test_batched_data_loader_with_in_memory_cache"

#642 chongxiaoc closed 3 years ago
2
Pytorch: add AsyncBatchedDataloader

#641 chongxiaoc closed 2 years ago
7
Security fix for arbitrary code execution.

#640 selitvin closed 3 years ago
1
Add comments in development.rst

#639 chongxiaoc closed 3 years ago
2
Implementing Asynchronous Data Shuffling Part

#638 chongxiaoc opened 3 years ago
1
Security Fix for Arbitrary Code Execution - huntr.dev

#637 huntr-helper closed 2 years ago
4
Adding a link to qcon.ai presentation to README.

#636 selitvin closed 3 years ago
1
Adding video to README

#635 crizCraig closed 3 years ago
3
Raise an explicit error when TransformSpec is given a shape with a variable dimension

#634 selitvin closed 3 years ago
1
Unischema variable shape error

#633 berendjansen opened 3 years ago
3
Add unit tests for compress in random shuffling buffer

#630 chongxiaoc closed 3 years ago
13
Is this the right way to partition dataset for Pytorch DDP?

#629 meprem opened 3 years ago
4
Fixed docstring in NGram class: timestamp_field argument illustration…

#628 ritwikbera closed 3 years ago
2
bug with weighted_sampling_reader and pytorch

#627 gueguenster closed 3 years ago
1
Bug/weighted sampling reader

#626 gueguenster closed 3 years ago
6
Support on-premise s3-compatible storage.

#625 acmore closed 3 years ago
7
Help translating mnist/pytorch_example to tabular data!

#624 afogarty85 closed 3 years ago
10
Integration with Apache Hudi

#623 LuisMoralesAlonso closed 3 years ago
1
Don't exclude fields not supported by pyarrow 0.15 from test data.

#622 selitvin closed 3 years ago
1
Sort During URL Normalization

#621 voganrc closed 2 years ago
5
Pytorch DataLoader with array of structs

#620 ramondalmau opened 3 years ago
3
Use new pyarrow.fs filesystem objects

#619 selitvin opened 3 years ago
1
Deprecate compat library since we no longer support pre pyarrow 0.17

#618 selitvin closed 3 years ago
1
Deprecating `pyarrow_serialize` argument of `petastorm.make_reader`.

#617 selitvin closed 3 years ago
1
Added support for np.uint8

#616 tgaddair closed 3 years ago
2
Raising an error if an empty list of columns is selected by a user

#615 selitvin closed 3 years ago
1
Bump-up minimal supported pyarrow version to 0.17.1

#614 selitvin closed 3 years ago
1
Make use of the new pyarrow.dataset functionality instead of ParquetDataset

#613 jorisvandenbossche opened 3 years ago
22
Upcoming changes in pyarrow 2.0

#612 jorisvandenbossche closed 2 years ago
3
Bugfix: S3FSWrapper is deprecated at s3fs 0.5.0

#611 dmcguire81 closed 3 years ago
1
"Currently do not support resetting a reader while in the middle of iteration."

#610 tadas-subonis opened 4 years ago
12
S3FSWrapper is deprecated as of s3fs 0.5.0

#609 dmcguire81 closed 3 years ago
1
Can it work with RDDs instead of DataFrames?

#608 tadas-subonis opened 4 years ago
4
Petastorm requires hadoop on client?

#607 ychnh closed 4 years ago
2
Fix docker file

#606 selitvin closed 4 years ago
1
Reading lists of numpy arrays

#605 selitvin opened 4 years ago
1
Multithreaded metadata discovery in ParquetDataset may cause deadlock

#604 dmcguire81 opened 4 years ago
0
