pyjanitor-devs / pyjanitor

Clean APIs for data cleaning. Python implementation of R package Janitor
https://pyjanitor-devs.github.io/pyjanitor
MIT License
1.37k stars 170 forks source link

[TST] Test coverage of functions (and lines) needs to improve #333

Closed hectormz closed 5 years ago

hectormz commented 5 years ago

While updating function argument names, I came across a function without a test, and was curious what other functions didn't have tests. I ran the pytest coverage plugin and got the following results.

----------- coverage: platform win32, python 3.6.7-final-0 -----------
Name                                  Stmts   Miss  Cover   Missing
-------------------------------------------------------------------
janitor\__init__.py                       2      0   100%
janitor\biology.py                       13      2    85%   12-13
janitor\chemistry.py                     69     48    30%   15, 64, 106-124, 164-186, 229-275, 307-318
janitor\errors.py                         2      0   100%
janitor\finance.py                       41     12    71%   53, 76-89, 101, 196
janitor\functions.py                    520     66    87%   606-614, 1113, 1139, 1143, 1192, 1198, 1468, 1817, 2061-2066, 2218-2224, 2235-2263, 2267-2274, 2278-2279, 2463-2499, 2590, 2609, 2895
janitor\io.py                            16      0   100%
janitor\testing_utils\__init__.py         0      0   100%
janitor\testing_utils\date_data.py        1      0   100%
janitor\testing_utils\strategies.py      12      1    92%   11
janitor\utils.py                         24      0   100%
-------------------------------------------------------------------
TOTAL                                   700    129    82%

This tells us which lines are not covered, but more importantly I wanted to know which functions have no coverage at all. Does anyone know how to get that answer? For the most part, the large blocks of code are functions that have no coverage:

chemistry
    smiles2mol
    morgan_fingerprint
    molecular_descriptors
    maccs_keys_fingerprint

functions
    _clean_accounting_column
    _currency_column_to_numeric
    _replace_empty_string_with_none
    currency_column_to_numeric

Overall, we have a test for at least some part of most functions, but chemistry doesn't have any, and there are a few in the main functions that are lacking.

This issue serves to identify functions that folks can write tests for, and discuss coverage.

ericmjl commented 5 years ago

This is a very useful issue that you've raised, @HectorM14!

I'm looking into how to get codecov working properly here, so that we can ensure that there's high test coverage in the codebase.

ericmjl commented 5 years ago

@HectorM14 just wanted to let you know, I just enabled codecov for this repo!

hectormz commented 5 years ago

Cool, what are the practicalities @ericmjl? It generates a report on increase/decrease of coverage, and a maintainer decides, or will it fail the build if there's a decrease in coverage?

ericmjl commented 5 years ago

It's definitely "maintainer decides". Lowered test coverage is a good flag that something was not caught, if a new feature was added.

hectormz commented 5 years ago

Got it, it was a quick addition!

ericmjl commented 5 years ago

Yeah! As it turns out, the first time I set it up and failed, I just... didn't read the docs properly :smile:.

dendrondal commented 5 years ago

I looked into the chemistry tests, and it appears that the issue is the "skipif" decorator that runs the rdkit tests only during CI. I commented these out, and the coverage jumped up to 85%. The only coverage that is missing is now several Raise statements and verifying the tqdm progress bar on smiles2mol. The tests only took 0.68 seconds with rdkit operations run on test data, so it may be worth either removing the "skipif" decorator, or only triggering coverage testing on continuous integration. Let me know your thoughts on this.

ericmjl commented 5 years ago

Thanks for the input, @dendrondal! Here are my responses inline.

...so it may be worth either removing the "skipif" decorator, or only triggering coverage testing on continuous integration.

Perhaps a bit of background may help explain why the skipif decorator exists. It was added because some sprinters didn't want to install a full cheminformatics + bioinformatics suite of tools just to hack on the core of pyjanitor. As such, we added the skipif decorator to let the tests pass there. I think the more relevant thing would be to explain why the skipif decorator exists in the code itself.

The only coverage that is missing is now several Raise statements and verifying the tqdm progress bar on smiles2mol

Definitely, some tests that trigger the raise statements would be great. As for triggering tqdm progress bar, if it's too difficult to test, we can worry about it later. It's a nice UI/UX thing, but definitely not core.

dendrondal commented 5 years ago

That makes a lot of sense, actually, since a pretty small subset of users will need to use RDKit. It sounds like so long as RDKit is used as part of the CI pipeline, it won't affect your coverage.

I'm currently working on coverage improvement for chemistry, finance, and functions. I've been able to get finance's coverage up to 89% and chemistry to 86%, and everything missed appears to be either intermediate variables within functions, completely outside of functions, or tqdm-related. Still working on functions, given the massive size, but it's currently showing 93%.

ericmjl commented 5 years ago

Cool stuff. Thanks, @dendrondal! Looking forward to seeing the PR.