Closed by thekugelmeister 2 years ago
Comments on dimelo_test.py
f"{outDir}/{f}_{sampleName}_{Region(region).string}_A.bed"
? qc_report
would be good to add?parse_bam
is doing the underlying heavy lifting and you have more meaningful tests for that one.Comments on plot_enrichment_profile.py
I see you changed file naming from C to CG. That is a good idea and is correct. I realized this should also be changed in plot_browser.py
The relevant lines are 268-271. I think the only other place where I incorrectly referred to C instead of CG is in the output bed file naming. I'll make a note and can update these, the output print statements, and the corresponding tests once your changes are merged.
Thanks so much for adding the figure closing and warning messages!
Getting all .html and .bed files and asserting the number to ensure they were produced (lines 212, 223 or 248, 259): will this cause issues if another instance of a browser test is run before this one, or are files generated temporarily just for single tests?
The files are all generated temporarily for each test, in different temporary directories, so this should not be a problem for now. Eventually those tests should be improved, but I don't think it's worth it yet. If you would prefer that I address it now I'm happy to! I was just taking the path of least resistance.
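The per-test temporary-directory pattern described above can be sketched as follows. This is a minimal illustration using pytest's tmp_path fixture, not the actual dimelo test code; make_outputs is a hypothetical stand-in for whichever plotting call produces the .html and .bed files.

```python
import pathlib


def make_outputs(out_dir: pathlib.Path) -> None:
    """Hypothetical stand-in for a plotting call; writes one .html and one .bed file."""
    (out_dir / "sample_browser.html").write_text("<html></html>")
    (out_dir / "sample_sites.bed").write_text("chr1\t0\t100\n")


def test_browser_outputs(tmp_path):
    # tmp_path is a fresh directory created for this test alone, so file
    # counts from any other test run in the same session cannot leak in.
    make_outputs(tmp_path)
    assert len(list(tmp_path.glob("*.html"))) == 1
    assert len(list(tmp_path.glob("*.bed"))) == 1
```

Because each test gets its own directory, the glob-and-count assertions stay isolated even when many browser tests run in one session.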
Thanks for checking, all of those legacy commented out tests can be removed!
Done!
I think just testing required files are generated for now is great because parse_bam is doing the underlying heavy lifting and you have more meaningful tests for that one.
Sounds good to me!
I think ok as is! Thanks Jeremy!
Also, I just realized I'm getting the same FileNotFoundError on the "dimelo/test/data/mod_mappings_subset.bam" file for the qc tests; I'm not sure why.
For qc.py, I agree with Annie: I think a hash function to check the content would be good to have.
Easy to add!
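A content check like the one discussed above could look something like this. This is a sketch, not the dimelo implementation; it assumes SHA-256 over the raw bytes of an output file, and the expected digest in the comment is a placeholder.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# In a test, compare against a digest recorded from a known-good run:
# assert file_sha256(out_bed) == "ab12..."  # placeholder digest
```

The caveat raised later in this thread applies: this only works while the output format stays byte-for-byte stable, so it suits deterministic outputs like bed files or database tables better than rendered plots.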
Additionally, it would be useful to have a test comparing the hashes of two different file types, i.e. a BAM file from the Guppy basecaller will have 4 plots, but a BAM file from the Megalodon basecaller will have only 2 plots (it will exclude the basecall quality and alignment quality plots). I'm not sure if this is necessary, but I think it would be good to test if there's time, just to make sure the two different types of reports are being generated!
This is a great point and also doable, but I'm not convinced it will be functional long term. My hope is that the database format stays consistent enough for a hash check to work for now. Any small changes made to the plots would result in needing to recalculate a hash for those plots, and I imagine that they might be more likely to change. I also haven't tested hashing of PDF / png / etc. output to see if it is actually consistent. Do you think it is a necessary add now, or are you okay with me putting in a TODO item and leaving it at that for now?
I'm not sure what's happening on my end, but for some reason when I run all the tests locally for parse_bam in the TestParseBam(DiMeLoTestCase) class, 2 tests pass and 2 fail (test_parse_bam_bedFile and test_parse_bam_region).
On the surface this sounds like a pathing issue? As can be seen here, the builds are passing when run automatically. The tests are runnable from the top-level dimelo directory, either using the command pytest or by directly running the test file with python as follows:
python dimelo/test/dimelo_test.py
I agree, I think for now this can be a TODO item! But thanks for adding the hash; I think that is a good idea, since qc generates the "reads" table.
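Since the plots may change over time, one option consistent with the discussion above is to hash the database table that qc builds rather than the rendered images. This sketch assumes the output is a SQLite database containing a "reads" table; both the storage format and the table name are assumptions, and the function is illustrative, not the dimelo implementation.

```python
import hashlib
import sqlite3


def table_sha256(db_path: str, table: str) -> str:
    """Hash all rows of a table, read in a deterministic order."""
    conn = sqlite3.connect(db_path)
    try:
        # Validate the table name against the schema first, since
        # identifiers cannot be passed as query parameters.
        names = {row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")}
        if table not in names:
            raise ValueError(f"no such table: {table}")
        digest = hashlib.sha256()
        # ORDER BY the first column so the hash does not depend on
        # whatever row order the database happens to return.
        for row in conn.execute(f"SELECT * FROM {table} ORDER BY 1"):
            digest.update(repr(row).encode())
        return digest.hexdigest()
    finally:
        conn.close()
```

A test could then assert that the digest of the generated "reads" table matches one recorded from a known-good run, sidestepping the instability of PDF/PNG hashing.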
Yes, you are right, running this on the command line worked great -- all the tests passed! I think adding a TODO to see how to make this work in the PyCharm editor might also be a good move. But overall the new tests look great!
PR Contents
Included tests
Possible improvements
parse_bam
tests perform any sort of hash check to verify the contents of those outputs. Is it important to include more robust checks for the other methods? Investigating the feasibility of this would be fairly easy.
Specific review requests
dimelo/test/dimelo_test.py
for any methods you are familiar with to see if there are any important ones missing? In particular, all well-defined use cases for each method should be covered by a successful test, and all known expected failures should be covered by a failure test.