streetslab / dimelo

python package for analysis of dimelo-seq & nanopore modified base data

New unit tests #27

Closed · thekugelmeister closed this 2 years ago

thekugelmeister commented 2 years ago

PR Contents

- Included tests
- Possible improvements
- Specific review requests

amaslan commented 2 years ago

Comments on dimelo_test.py

  1. Getting all .html & .bed files and asserting their number to ensure they were produced (lines 212, 223 or 248, 259): will this cause issues if another browser test runs earlier in the suite, or are files generated temporarily just for single tests? If it would, a quick fix could be to give the output directory a subdirectory so that only files from this single function call land there for the file-count check (see the sketch after this list). Also, what is meant by "It's difficult to get the output file name"? Is it because the region has to be parsed into its string representation for files named like f"{outDir}/{f}_{sampleName}_{Region(region).string}_A.bed"?
  2. @reetm09 - do you think a database hash check for qc_report would be good to add?
  3. Thanks for checking; all of those legacy commented-out tests can be removed!
  4. These tests all pass locally for me. This is great, and I think you really hit all the important test cases. I think just testing that the required files are generated is fine for now, because parse_bam does the underlying heavy lifting and you have more meaningful tests for it.
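
A minimal sketch of the count-based assertion pattern referenced in item 1, assuming pytest-style tests; the plot_browser arguments and the expected counts here are placeholders, not the real signature or numbers:

```python
import glob
import os
import tempfile

import dimelo as dm  # assumed import name for the package


def test_plot_browser_file_counts():
    # A fresh directory per test means the globs below only ever see
    # files produced by this one function call.
    with tempfile.TemporaryDirectory() as out_dir:
        # Hypothetical call: the argument names and order are
        # placeholders, not the actual plot_browser signature.
        dm.plot_browser("sample.bam", "sample", "chr1:1-1000", "A", out_dir)

        html_files = glob.glob(os.path.join(out_dir, "*.html"))
        bed_files = glob.glob(os.path.join(out_dir, "*.bed"))

        # Counting sidesteps reconstructing exact names like
        # f"{outDir}/{f}_{sampleName}_{Region(region).string}_A.bed"
        assert len(html_files) == 1
        assert len(bed_files) == 2
```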

Comments on plot_enrichment_profile.py

I see you changed the file naming from C to CG. That is a good idea and is correct. I realized this should also be changed in plot_browser.py; the relevant lines are 268-271. I think the only other place where I incorrectly referred to C instead of CG is in the output .bed file naming. I'll make a note and can update these, the output print statements, and the corresponding tests once your changes are merged.

Thanks so much for adding the figure closing and warning messages!

thekugelmeister commented 2 years ago

> Getting all .html & .bed files and asserting their number to ensure they were produced (lines 212, 223 or 248, 259): will this cause issues if another browser test runs earlier in the suite, or are files generated temporarily just for single tests?

The files are all generated temporarily for each test, in different temporary directories, so this should not be a problem for now. Eventually those tests should be improved, but I don't think it's worth it yet. If you would prefer that I address it now, I'm happy to! I was just taking the path of least resistance.
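
For reference, the standard unittest pattern for getting a fresh directory per test looks like the sketch below; this is an illustration of the idea, not necessarily how DiMeLoTestCase is actually implemented:

```python
import shutil
import tempfile
import unittest


class DiMeLoTestCase(unittest.TestCase):
    def setUp(self):
        # Every test method gets its own scratch directory...
        self.outDir = tempfile.mkdtemp()

    def tearDown(self):
        # ...which is deleted afterwards, so no test ever sees
        # files produced by another.
        shutil.rmtree(self.outDir)
```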

> Thanks for checking; all of those legacy commented-out tests can be removed!

Done!

> I think just testing that the required files are generated is fine for now, because parse_bam does the underlying heavy lifting and you have more meaningful tests for it.

Sounds good to me!

amaslan commented 2 years ago

> The files are all generated temporarily for each test, in different temporary directories, so this should not be a problem for now. Eventually those tests should be improved, but I don't think it's worth it yet. If you would prefer that I address it now, I'm happy to! I was just taking the path of least resistance.

I think ok as is! Thanks Jeremy!

reetm09 commented 2 years ago

Also, I just realized I'm getting the same FileNotFoundError on the "dimelo/test/data/mod_mappings_subset.bam" file for the qc tests; I'm not sure why.

thekugelmeister commented 2 years ago

> For qc.py, I agree with Annie; I think a hash function to check the content would be good to have.

Easy to add!
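
A sketch of what that check could look like; the database path and the reference digest below are placeholders to be filled in from a known-good run:

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder reference value, recorded once from a verified run.
EXPECTED_DB_SHA256 = "replace-with-reference-digest"
assert file_sha256("outDir/sampleName.db") == EXPECTED_DB_SHA256
```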

> Additionally, it would be worth a test comparing the hashes of two different file types, i.e., a BAM file from the Guppy basecaller will have 4 plots, but a BAM file from the Megalodon basecaller will have only 2 plots (it excludes the basecall quality and alignment quality plots). I'm not sure if this is strictly necessary, but I think it would be good to test if there's time, just to make sure the two different types of reports are being generated!

This is a great point and also doable, but I'm not convinced it will be functional long term. My hope is that the database format stays consistent enough for a hash check to work for now. Any small changes to the plots would mean recalculating their hashes, and I imagine the plots are more likely to change. I also haven't tested hashing of PDF / PNG / etc. output to see whether it is actually consistent. Do you think it is a necessary addition now, or are you okay with me putting in a TODO item and leaving it at that for now?
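
One concrete source of that inconsistency: matplotlib embeds metadata such as a creation timestamp in PDF output by default, so byte-level hashes of visually identical figures can differ between runs. If plot hashing is ever added, something like the following (using matplotlib's documented metadata option) would likely be needed:

```python
import matplotlib

matplotlib.use("Agg")  # render without a display, as in CI
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])

# Setting CreationDate to None tells the PDF backend to omit the
# timestamp, which is what makes repeated saves byte-identical
# and therefore safe to hash.
fig.savefig("plot.pdf", metadata={"CreationDate": None})
```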

> I'm not sure what's happening on my end, but for some reason when I run all the tests for parse_bam locally, in the TestParseBam(DiMeLoTestCase) class, 2 pass and 2 fail (test_parse_bam_bedFile and test_parse_bam_region).

On the surface this sounds like a pathing issue? As can be seen here, the builds are passing when run automatically. The tests are runnable from the top-level dimelo directory, either using the pytest command or by running the test file directly with python as follows:

python dimelo/test/dimelo_test.py

reetm09 commented 2 years ago

> This is a great point and also doable, but I'm not convinced it will be functional long term. My hope is that the database format stays consistent enough for a hash check to work for now. Any small changes to the plots would mean recalculating their hashes, and I imagine the plots are more likely to change. I also haven't tested hashing of PDF / PNG / etc. output to see whether it is actually consistent. Do you think it is a necessary addition now, or are you okay with me putting in a TODO item and leaving it at that for now?

I agree, I think for now this can be a TODO item! And thanks for adding the hash check; I think that's a good idea, since qc generates the "reads" table.

> On the surface this sounds like a pathing issue? As can be seen here, the builds are passing when run automatically. The tests are runnable from the top-level dimelo directory, either using the pytest command or by running the test file directly with python as follows:
>
> python dimelo/test/dimelo_test.py

Yes, you are right, running this on the command line worked great: all the tests passed! I think adding a TODO to figure out how to make this work in the PyCharm editor might also be a good move. But overall the new tests look great!
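
For that TODO, a likely cause (an assumption, not a confirmed diagnosis) is that the IDE launches tests from a different working directory, so relative paths like "dimelo/test/data/mod_mappings_subset.bam" no longer resolve. One common fix is to anchor data paths to the test module itself:

```python
from pathlib import Path

# Resolve test data relative to this file rather than the current
# working directory, so the tests pass no matter where PyCharm
# (or any other runner) launches them from.
DATA_DIR = Path(__file__).resolve().parent / "data"
BAM_PATH = str(DATA_DIR / "mod_mappings_subset.bam")
```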