streetslab / dimelo

python package for analysis of dimelo-seq & nanopore modified base data

New unit tests #27

Closed · thekugelmeister closed this 2 years ago

thekugelmeister commented 2 years ago

PR Contents

- Included tests
- Possible improvements
- Specific review requests

amaslan commented 2 years ago

Comments on dimelo_test.py

  1. Getting all .html & .bed files and asserting their number to ensure they were produced (lines 212, 223 or 248, 259): will this cause issues if another browser test runs earlier in the suite, or are files generated temporarily just for single tests? If it would, a quick fix could be to give the output directory a subdirectory so that only files from this single function call land there for the file-count check (see the sketch after this list). Also, what is meant by "It's difficult to get the output file name"? Is it because the region has to be parsed into its string representation for files named like f"{outDir}/{f}_{sampleName}_{Region(region).string}_A.bed"?
  2. @reetm09 - do you think a database hash check for qc_report would be good to add?
  3. Thanks for checking; all of those legacy commented-out tests can be removed!
  4. These tests all pass locally for me. This is great, and I think you really hit all the important test cases. I think just testing that the required files are generated is fine for now, because parse_bam does the underlying heavy lifting and you have more meaningful tests for it.
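
A minimal sketch of the count-based assertion pattern referenced in item 1, assuming pytest-style tests; the plot_browser arguments and the expected counts here are placeholders, not the real signature or numbers:

```python
import glob
import os
import tempfile

import dimelo as dm  # assumed import name for the package


def test_plot_browser_file_counts():
    # A fresh directory per test means the globs below only ever see
    # files produced by this one function call.
    with tempfile.TemporaryDirectory() as out_dir:
        # Hypothetical call: the argument names and order are
        # placeholders, not the actual plot_browser signature.
        dm.plot_browser("sample.bam", "sample", "chr1:1-1000", "A", out_dir)

        html_files = glob.glob(os.path.join(out_dir, "*.html"))
        bed_files = glob.glob(os.path.join(out_dir, "*.bed"))

        # Counting sidesteps reconstructing exact names like
        # f"{outDir}/{f}_{sampleName}_{Region(region).string}_A.bed"
        assert len(html_files) == 1
        assert len(bed_files) == 2
```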

Comments on plot_enrichment_profile.py

I see you changed the file naming from C to CG. That is a good idea and is correct. I realized this should also be changed in plot_browser.py; the relevant lines are 268-271. I think the only other place where I incorrectly referred to C instead of CG is in the output .bed file naming. I'll make a note and can update these, the output print statements, and the corresponding tests once your changes are merged.

Thanks so much for adding the figure closing and warning messages!

thekugelmeister commented 2 years ago

> Getting all .html & .bed files and asserting their number to ensure they were produced (lines 212, 223 or 248, 259): will this cause issues if another browser test runs earlier in the suite, or are files generated temporarily just for single tests?

The files are all generated temporarily for each test, in different temporary directories, so this should not be a problem for now. Eventually those tests should be improved, but I don't think it's worth it yet. If you would prefer that I address it now, I'm happy to! I was just taking the path of least resistance.
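
For reference, the standard unittest pattern for getting a fresh directory per test looks like the sketch below; this is an illustration of the idea, not necessarily how DiMeLoTestCase is actually implemented:

```python
import shutil
import tempfile
import unittest


class DiMeLoTestCase(unittest.TestCase):
    def setUp(self):
        # Every test method gets its own scratch directory...
        self.outDir = tempfile.mkdtemp()

    def tearDown(self):
        # ...which is deleted afterwards, so no test ever sees
        # files produced by another.
        shutil.rmtree(self.outDir)
```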

> Thanks for checking; all of those legacy commented-out tests can be removed!

Done!

> I think just testing that the required files are generated is fine for now, because parse_bam does the underlying heavy lifting and you have more meaningful tests for it.

Sounds good to me!

amaslan commented 2 years ago

> The files are all generated temporarily for each test, in different temporary directories, so this should not be a problem for now. Eventually those tests should be improved, but I don't think it's worth it yet. If you would prefer that I address it now, I'm happy to! I was just taking the path of least resistance.

I think ok as is! Thanks Jeremy!

reetm09 commented 2 years ago

Also, I just realized I'm getting the same FileNotFoundError on the "dimelo/test/data/mod_mappings_subset.bam" file for the qc tests; I'm not sure why.

thekugelmeister commented 2 years ago

> For qc.py, I agree with Annie; I think a hash function to check the content would be good to have.

Easy to add!
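
A sketch of what that check could look like; the database path and the reference digest below are placeholders to be filled in from a known-good run:

```python
import hashlib


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder reference value, recorded once from a verified run.
EXPECTED_DB_SHA256 = "replace-with-reference-digest"
assert file_sha256("outDir/sampleName.db") == EXPECTED_DB_SHA256
```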

> Additionally, it would be worth a test comparing the hashes of two different file types, i.e., a BAM file from the Guppy basecaller will have 4 plots, but a BAM file from the Megalodon basecaller will have only 2 plots (it excludes the basecall quality and alignment quality plots). I'm not sure if this is strictly necessary, but I think it would be good to test if there's time, just to make sure the two different types of reports are being generated!

This is a great point and also doable, but I'm not convinced it will be functional long term. My hope is that the database format stays consistent enough for a hash check to work for now. Any small changes to the plots would mean recalculating their hashes, and I imagine the plots are more likely to change. I also haven't tested hashing of PDF / PNG / etc. output to see whether it is actually consistent. Do you think it is a necessary addition now, or are you okay with me putting in a TODO item and leaving it at that for now?
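
One concrete source of that inconsistency: matplotlib embeds metadata such as a creation timestamp in PDF output by default, so byte-level hashes of visually identical figures can differ between runs. If plot hashing is ever added, something like the following (using matplotlib's documented metadata option) would likely be needed:

```python
import matplotlib

matplotlib.use("Agg")  # render without a display, as in CI
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])

# Setting CreationDate to None tells the PDF backend to omit the
# timestamp, which is what makes repeated saves byte-identical
# and therefore safe to hash.
fig.savefig("plot.pdf", metadata={"CreationDate": None})
```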

> I'm not sure what's happening on my end, but for some reason when I run all the tests for parse_bam locally, in the TestParseBam(DiMeLoTestCase) class, 2 pass and 2 fail (test_parse_bam_bedFile and test_parse_bam_region).

On the surface this sounds like a pathing issue? As can be seen here, the builds are passing when run automatically. The tests are runnable from the top-level dimelo directory, either using the pytest command or by running the test file directly with python as follows:

python dimelo/test/dimelo_test.py

reetm09 commented 2 years ago

> This is a great point and also doable, but I'm not convinced it will be functional long term. My hope is that the database format stays consistent enough for a hash check to work for now. Any small changes to the plots would mean recalculating their hashes, and I imagine the plots are more likely to change. I also haven't tested hashing of PDF / PNG / etc. output to see whether it is actually consistent. Do you think it is a necessary addition now, or are you okay with me putting in a TODO item and leaving it at that for now?

I agree, I think for now this can be a TODO item! And thanks for adding the hash check; I think that's a good idea, since qc generates the "reads" table.

> On the surface this sounds like a pathing issue? As can be seen here, the builds are passing when run automatically. The tests are runnable from the top-level dimelo directory, either using the pytest command or by running the test file directly with python as follows:
>
> python dimelo/test/dimelo_test.py

Yes, you are right, running this on the command line worked great: all the tests passed! I think adding a TODO to figure out how to make this work in the PyCharm editor might also be a good move. But overall the new tests look great!
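
For that TODO, a likely cause (an assumption, not a confirmed diagnosis) is that the IDE launches tests from a different working directory, so relative paths like "dimelo/test/data/mod_mappings_subset.bam" no longer resolve. One common fix is to anchor data paths to the test module itself:

```python
from pathlib import Path

# Resolve test data relative to this file rather than the current
# working directory, so the tests pass no matter where PyCharm
# (or any other runner) launches them from.
DATA_DIR = Path(__file__).resolve().parent / "data"
BAM_PATH = str(DATA_DIR / "mod_mappings_subset.bam")
```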