uber / petastorm

Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Apache License 2.0
1.78k stars 285 forks source link

Add unit tests for compress in random shuffling buffer #630

Closed chongxiaoc closed 3 years ago

chongxiaoc commented 3 years ago

~Compress remaining shuffling buffer should use remained size, that is, self._size.~

self.size actually is a property decorator defined afterwards. I just change it to keep consistent with other places of code to improve readability.

Also, I added some unit tests to check compress results.

CLAassistant commented 3 years ago

CLA assistant check
All committers have signed the CLA.

chongxiaoc commented 3 years ago

@selitvin I cannot request reviewer on the right side panel, can you help me review or assign it? Thanks.

codecov[bot] commented 3 years ago

Codecov Report

Merging #630 (0ca35d2) into master (6c64a37) will not change coverage. The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #630   +/-   ##
=======================================
  Coverage   85.35%   85.35%           
=======================================
  Files          85       85           
  Lines        4978     4978           
  Branches      790      790           
=======================================
  Hits         4249     4249           
  Misses        589      589           
  Partials      140      140           
Impacted Files Coverage Δ
petastorm/reader_impl/pytorch_shuffling_buffer.py 92.80% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6c64a37...0ca35d2. Read the comment docs.

selitvin commented 3 years ago

Thanks for the fix. Really surprised we didn't catch that with a unit-test. Can you see if a test case is missing? If not, please let me know I'll take a stab at it.

chongxiaoc commented 3 years ago

Thanks for the fix. Really surprised we didn't catch that with a unit-test. Can you see if a test case is missing? If not, please let me know I'll take a stab at it.

@selitvin It seems that in test_shuffling_buffer.py, only result length, can_add, can_retrieve are tested. There is no test to verify the output of add_many with remaining shuffle buffer should be valid. Maybe I missed some points, I'm new to petastorm. :)

selitvin commented 3 years ago

Appears to be a miss then! Would you be able to extend the test to cover this line with the typo or you'd need a hand with that?

chongxiaoc commented 3 years ago

Appears to be a miss then! Would you be able to extend the test to cover this line with the typo or you'd need a hand with that?

Sure. I will start with that and reach out to you if need help.

selitvin commented 3 years ago

Appreciate it! Would be glad to help, but I think it should be pretty straight forward. Don't know how did we miss that...

chongxiaoc commented 3 years ago

@selitvin Actually this is not a typo in original code. self.size is defined as a property decorator afterwards. But I just recommend to change it to be consistent with all other places in the code using self._size

Please feel free to drop the first commit.

I also added necessary unit tests to check compress results in second commit, please take a look and let me know anything to change.

chongxiaoc commented 3 years ago

@selitvin How to re-trigger the test? I don't quite understand why build can fail just for a single character change.

selitvin commented 3 years ago

Appears to be a linting issue:

petastorm/tests/test_shuffling_buffer.py:262: [C0325(superfluous-parens), ] Unnecessary parens after 'not' keyword

https://travis-ci.com/github/uber/petastorm/jobs/464041086

You should be able to repro locally:

pylint --rcfile=.pylintrc petastorm examples -f parseable -r n
chongxiaoc commented 3 years ago

Appears to be a linting issue:

petastorm/tests/test_shuffling_buffer.py:262: [C0325(superfluous-parens), ] Unnecessary parens after 'not' keyword

https://travis-ci.com/github/uber/petastorm/jobs/464041086

You should be able to repro locally:

pylint --rcfile=.pylintrc petastorm examples -f parseable -r n

Got it!

chongxiaoc commented 3 years ago

@selitvin Fixed.