uber/petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark, and can be used from pure Python code.
Apache License 2.0
1.78k
stars
285
forks
issues
Spark Dataset Converter reset reader position does not work as expected
#553
liangz1
opened
4 years ago
2
Adding new SparkDatasetConverter documentation to the API section of autogenerated docs.
#552
selitvin
closed
4 years ago
2
Incorrect order of row groups when reading
#551
hig-dev
closed
4 years ago
2
Importing ABC directly from collections was deprecated and will be removed in Python 3.10. Use collections.abc
#550
tirkarthi
opened
4 years ago
0
Import petastorm.spark in init
#549
praateekmahajan
closed
4 years ago
4
Deprecate python2
#548
WeichenXu123
closed
4 years ago
1
Non deterministic fail during model training
#547
sonNeturo
closed
4 years ago
5
Parallelize encoding of a single row
#546
selitvin
opened
4 years ago
2
[DO NOT MERGE] Test1
#545
WeichenXu123
closed
3 years ago
1
Fix DataLoader iter(dataloader) cannot be called more than once
#544
WeichenXu123
closed
4 years ago
1
[WIP] Fix bug: cannot call enumerate(dataloader) more than once
#543
liangz1
closed
4 years ago
1
Address several autograph failed issues for TF2
#542
WeichenXu123
closed
4 years ago
2
Fix Fix: The 'median size too small' warning is too frequent #538
#541
liangz1
closed
4 years ago
1
New PyTorch BatchedDataLoader implementation using batched operations
#540
fps7806
closed
4 years ago
3
Add TF 2.0 to CI to see what failures we have
#539
liangz1
closed
4 years ago
2
Fix: The "median size too small" warning is too frequent
#538
liangz1
closed
4 years ago
1
Fix bug: respect dynamically changed parent cache dir conf
#537
liangz1
closed
4 years ago
2
CI with Tensorflow 2.1
#536
WeichenXu123
closed
4 years ago
2
handle names like [path]/part/
#535
xiaohanhuang
closed
4 years ago
7
[ML-10366] Fix bug: cleanup metadata in converter.delete()
#534
WeichenXu123
closed
4 years ago
1
Add ngrams support to make_petastorm_dataset function.
#533
selitvin
closed
4 years ago
1
[WIP] Remove __init__.py from examples
#532
selitvin
closed
4 years ago
1
Fix issue: spark session created on spark executor
#531
WeichenXu123
closed
4 years ago
1
Add spark dataset converter mnist example scripts
#530
liangz1
closed
4 years ago
2
Use parquet.summary.metadata.level to control _summary file creation.
#529
selitvin
closed
4 years ago
1
Replace deprecated Schema.add_metadata with Schema.with_metadata call.
#528
selitvin
closed
4 years ago
1
IndexError: list index out of range
#527
danielhaviv
closed
4 years ago
1
Add a helper method `edit_field(name, numpy_dtype, shape, nullable=False)`
#526
WeichenXu123
closed
4 years ago
1
[ML-10160] Refine Spark dataset converter warning logs.
#525
WeichenXu123
closed
4 years ago
1
global context not imported in transform_spec function with reader_pool_type="process"
#524
kaiwenw
opened
4 years ago
5
Error while using pytorch dataloader with petastorm
#523
nikitamehrotra12
closed
4 years ago
6
Simplify data conversion from Spark: support vector type and precision cast
#522
liangz1
closed
4 years ago
1
Add a build in CI for pyspark 3.0
#521
liangz1
closed
4 years ago
3
Added is_petastorm_compatible to make_reader
#520
abditag2
closed
4 years ago
3
Petastorm PyTorch dataloader slower than JSON
#519
kaiwenw
opened
4 years ago
3
Update Spark Converter API section in README.rst
#518
liangz1
closed
4 years ago
3
[ML-10156] Fix array type field inferred shape
#517
WeichenXu123
closed
4 years ago
4
[WIP] Consolidating test runs
#516
selitvin
closed
4 years ago
0
_common_metadata file gets corrupted
#515
filipski
closed
4 years ago
9
[ML-9743] Address S3 eventually consistency issue on S3-like filesystem
#514
WeichenXu123
closed
4 years ago
1
[ML-10118] Preserve spark dataframe schema order when create petastorm dataset/dataloader
#513
WeichenXu123
closed
4 years ago
1
[WIP] Auto infer schema (including fields shape) from the first row
#512
WeichenXu123
opened
4 years ago
2
[WIP][ML-10118] Keep petastorm dataset/dataloader schema fields order the same with spark dataframe
#511
WeichenXu123
closed
4 years ago
1
[WIP] Use pyarrow serialization with `make_reader` by default
#510
selitvin
closed
2 years ago
0
Remove duplicate reader type tested in test_end_to_end.py
#509
selitvin
closed
4 years ago
1
Petastorm sharding + Distributed PyTorch
#508
megaserg
opened
4 years ago
14
ast Syntax error when parsing non-petastorm dataset
#507
working-estimate
opened
4 years ago
4
Simplify data conversion from Spark to TensorFlow: support tensorflow dataset advance arguments
#506
WeichenXu123
closed
4 years ago
2
Simplify data conversion from Spark to PyTorch DataLoader
#505
liangz1
closed
4 years ago
5
Make `make_batch_reader` TransformSpec support output multi-dimensional array type.
#504
WeichenXu123
closed
4 years ago
3