uber petastorm issues - Githubissues

uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.

Apache License 2.0

1.78k stars, 285 forks

issues

Spark Dataset Converter reset reader position does not work as expected

#553 liangz1 opened 4 years ago
2
Adding new SparkDatasetConverter documentation to the API section of autogenerated docs.

#552 selitvin closed 4 years ago
2
Incorrect order of row groups when reading

#551 hig-dev closed 4 years ago
2
Importing ABC directly from collections was deprecated and will be removed in Python 3.10. Use collections.abc

#550 tirkarthi opened 4 years ago
0
Import petastorm.spark in init

#549 praateekmahajan closed 4 years ago
4
Deprecate python2

#548 WeichenXu123 closed 4 years ago
1
Non deterministic fail during model training

#547 sonNeturo closed 4 years ago
5
Parallelize encoding of a single row

#546 selitvin opened 4 years ago
2
[DO NOT MERGE] Test1

#545 WeichenXu123 closed 3 years ago
1
Fix DataLoader iter(dataloader) cannot be called more than once

#544 WeichenXu123 closed 4 years ago
1
[WIP] Fix bug: cannot call enumerate(dataloader) more than once

#543 liangz1 closed 4 years ago
1
Address several autograph failed issues for TF2

#542 WeichenXu123 closed 4 years ago
2
Fix Fix: The 'median size too small' warning is too frequent #538

#541 liangz1 closed 4 years ago
1
New PyTorch BatchedDataLoader implementation using batched operations

#540 fps7806 closed 4 years ago
3
Add TF 2.0 to CI to see what failures we have

#539 liangz1 closed 4 years ago
2
Fix: The "median size too small" warning is too frequent

#538 liangz1 closed 4 years ago
1
Fix bug: respect dynamically changed parent cache dir conf

#537 liangz1 closed 4 years ago
2
CI with Tensorflow 2.1

#536 WeichenXu123 closed 4 years ago
2
handle names like [path]/part/

#535 xiaohanhuang closed 4 years ago
7
[ML-10366] Fix bug: cleanup metadata in converter.delete()

#534 WeichenXu123 closed 4 years ago
1
Add ngrams support to make_petastorm_dataset function.

#533 selitvin closed 4 years ago
1
[WIP] Remove __init__.py from examples

#532 selitvin closed 4 years ago
1
Fix issue: spark session created on spark executor

#531 WeichenXu123 closed 4 years ago
1
Add spark dataset converter mnist example scripts

#530 liangz1 closed 4 years ago
2
Use parquet.summary.metadata.level to control _summary file creation.

#529 selitvin closed 4 years ago
1
Replace deprecated Schema.add_metadata with Schema.with_metadata call.

#528 selitvin closed 4 years ago
1
IndexError: list index out of range

#527 danielhaviv closed 4 years ago
1
Add a helper method `edit_field(name, numpy_dtype, shape, nullable=False)`

#526 WeichenXu123 closed 4 years ago
1
[ML-10160] Refine Spark dataset converter warning logs.

#525 WeichenXu123 closed 4 years ago
1
global context not imported in transform_spec function with reader_pool_type="process"

#524 kaiwenw opened 4 years ago
5
Error while using pytorch dataloader with petastorm

#523 nikitamehrotra12 closed 4 years ago
6
Simplify data conversion from Spark: support vector type and precision cast

#522 liangz1 closed 4 years ago
1
Add a build in CI for pyspark 3.0

#521 liangz1 closed 4 years ago
3
Added is_petastorm_compatible to make_reader

#520 abditag2 closed 4 years ago
3
Petastorm PyTorch dataloader slower than JSON

#519 kaiwenw opened 4 years ago
3
Update Spark Converter API section in README.rst

#518 liangz1 closed 4 years ago
3
[ML-10156] Fix array type field inferred shape

#517 WeichenXu123 closed 4 years ago
4
[WIP] Consolidating test runs

#516 selitvin closed 4 years ago
0
_common_metadata file gets corrupted

#515 filipski closed 4 years ago
9
[ML-9743] Address S3 eventually consistency issue on S3-like filesystem

#514 WeichenXu123 closed 4 years ago
1
[ML-10118] Preserve spark dataframe schema order when create petastorm dataset/dataloader

#513 WeichenXu123 closed 4 years ago
1
[WIP] Auto infer schema (including fields shape) from the first row

#512 WeichenXu123 opened 4 years ago
2
[WIP][ML-10118] Keep petastorm dataset/dataloader schema fields order the same with spark dataframe

#511 WeichenXu123 closed 4 years ago
1
[WIP] Use pyarrow serialization with `make_reader` by default

#510 selitvin closed 2 years ago
0
Remove duplicate reader type tested in test_end_to_end.py

#509 selitvin closed 4 years ago
1
Petastorm sharding + Distributed PyTorch

#508 megaserg opened 4 years ago
14
ast Syntax error when parsing non-petastorm dataset

#507 working-estimate opened 4 years ago
4
Simplify data conversion from Spark to TensorFlow: support tensorflow dataset advance arguments

#506 WeichenXu123 closed 4 years ago
2
Simplify data conversion from Spark to PyTorch DataLoader

#505 liangz1 closed 4 years ago
5
Make `make_batch_reader` TransformSpec support output multi-dimensional array type.

#504 WeichenXu123 closed 4 years ago
3
