tgaddair commented 4 years ago

Adds support for:

Shuffling
Custom data loaders

tgaddair commented 4 years ago

@selitvin any idea what's causing these test failures? They seem to be unrelated.

selitvin commented 4 years ago

The failure is in the test_tf_make_reader_fn: ValueError: directory url must be a string - are you sure it's unrelated to your change.

selitvin commented 4 years ago

Adding @mengxr for viz.

tgaddair commented 4 years ago

@WeichenXu123, can you explain why support for make_reader was removed in https://github.com/uber/petastorm/commit/6c49ed5e6f6a193faf0701db46e886b9a5be5d5c#diff-b0f255caa777716ecea31adea665c124 ?

tgaddair commented 4 years ago

@mengxr can you comment on the reason for 6c49ed5? We want to be able to use make_reader with Spark Dataset Converter.

mengxr commented 4 years ago

https://github.com/uber/petastorm/pull/503#issue-386154352

We removed make_reader because we want to support an input URL list. @WeichenXu123 's PR only added that support to make_batch_reader.

codecov[bot] commented 4 years ago

Codecov Report

Merging #568 into master will increase coverage by 0.52%. The diff coverage is 77.77%.

@@            Coverage Diff             @@
##           master     #568      +/-   ##
==========================================
+ Coverage   85.15%   85.67%   +0.52%     
==========================================
  Files          87       87              
  Lines        4970     4976       +6     
  Branches      793      794       +1     
==========================================
+ Hits         4232     4263      +31     
+ Misses        591      577      -14     
+ Partials      147      136      -11

Impacted Files	Coverage Δ
petastorm/spark/spark_dataset_converter.py	`91.24% <77.77%> (+0.19%)`	:arrow_up:
petastorm/py_dict_reader_worker.py	`95.23% <0.00%> (+0.79%)`	:arrow_up:
petastorm/arrow_reader_worker.py	`92.00% <0.00%> (+2.00%)`	:arrow_up:
petastorm/tf_utils.py	`88.65% <0.00%> (+3.54%)`	:arrow_up:
petastorm/compat.py	`100.00% <0.00%> (+39.02%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e5d35e7...1e053e4. Read the comment docs.

tgaddair commented 4 years ago

@selitvin removed the make_reader support for now. I think tests should be passing now, though there seems to be a failure due a timeout. Is there an easy way to rerun the tests?

nightflight-dk commented 4 years ago

greetings from MSFT, early perf tests of this branch show promising results, we look forward to seeing this in master

tgaddair commented 4 years ago

Thanks for trying it out, @nightflight-dk. This has now been merged into master.

uber / petastorm

Added additional kwargs for Spark Dataset Converter #568

Codecov Report