uber / petastorm

The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Apache License 2.0
1.8k stars 284 forks

assign transform result in _load_rows_with_predicate #490

Closed jgblight closed 4 years ago

jgblight commented 4 years ago

I noticed that TransformSpecs were not being applied when a row predicate was also specified. This PR updates the _load_rows_with_predicate function to assign the transform result, matching https://github.com/uber/petastorm/blob/master/petastorm/py_dict_reader_worker.py#L184
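
For readers following along, here is a minimal sketch of the kind of bug described above: a transform is computed on the predicate-filtered rows, but its return value is never assigned. All helper names (`apply_transform`, `load_rows_with_predicate_buggy`, `load_rows_with_predicate_fixed`, `double_value`, `has_even_id`) are hypothetical stand-ins written for illustration; this is not petastorm's actual code, and it assumes the transform returns new rows rather than mutating them in place.

```python
# Illustrative sketch only: these helpers are hypothetical stand-ins,
# not petastorm's actual implementation.


def apply_transform(rows, transform_func):
    """Return a new list with transform_func applied to a copy of each row."""
    return [transform_func(dict(row)) for row in rows]


def load_rows_with_predicate_buggy(rows, predicate, transform_func):
    matching = [row for row in rows if predicate(row)]
    # Bug pattern: the transformed rows are computed, but the result is discarded.
    apply_transform(matching, transform_func)
    return matching


def load_rows_with_predicate_fixed(rows, predicate, transform_func):
    matching = [row for row in rows if predicate(row)]
    # Fix pattern: assign the transform result so callers get transformed rows.
    matching = apply_transform(matching, transform_func)
    return matching


def double_value(row):
    """Example transform: double the 'value' field of a row."""
    row['value'] = row['value'] * 2
    return row


def has_even_id(row):
    """Example predicate: keep rows whose 'id' is even."""
    return row['id'] % 2 == 0


rows = [{'id': 0, 'value': 1}, {'id': 1, 'value': 2}, {'id': 2, 'value': 3}]

buggy = load_rows_with_predicate_buggy(rows, has_even_id, double_value)
fixed = load_rows_with_predicate_fixed(rows, has_even_id, double_value)

print(buggy)  # rows are filtered but not transformed
print(fixed)  # rows are filtered and transformed
```

In this sketch, only the fixed variant returns rows with doubled values; the buggy variant silently returns the filtered but untransformed rows, which is why the problem is easy to miss without a test that combines a predicate with a transform.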

claassistantio commented 4 years ago

CLA assistant check
All committers have signed the CLA.

codecov[bot] commented 4 years ago

Codecov Report

Merging #490 into master will not change coverage. The diff coverage is 100%.

@@           Coverage Diff           @@
##           master     #490   +/-   ##
=======================================
  Coverage   85.77%   85.77%           
=======================================
  Files          79       79           
  Lines        4190     4190           
  Branches      665      665           
=======================================
  Hits         3594     3594           
  Misses        494      494           
  Partials      102      102

| Impacted Files | Coverage Δ |
|---|---|
| petastorm/py_dict_reader_worker.py | 95.23% <100%> (ø) :arrow_up: |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update b3120b1...0f58943.

selitvin commented 4 years ago

Thank you for the contribution, it looks good! Please let me know if you would be able to modify the unit tests to make sure the code does not regress in the future. If not, I can help with that.

jgblight commented 4 years ago

@selitvin Does this test look ok? Thanks for reviewing!