hammerlab / cycledash

Variant Caller Analysis Dashboard and Data Management System
Other
35 stars 2 forks source link

Fix impossible stats & regression test #690

Closed ihodes closed 9 years ago

ihodes commented 9 years ago

fixes #588 fixes #516

Review on Reviewable

danvk commented 9 years ago

Thanks for fixing this and writing the tests!

This might be a good time to put down in writing exactly how the precision/recall calculations are done, particularly in the presence of multi-sample VCFs.


Review status: 0 of 3 files reviewed, 5 unresolved discussions, all commit checks successful.


cycledash/genotypes.py, line 456 [r1] (raw file): Does this mean we're requiring that the validation VCF have the same sample names as the run VCF?


tests/python/test_genotypes_stats_api.py, line 8 [r1] (raw file): get is pretty generic to import as a top-level symbol. How about just from cycledash import genotypes and then use genotypes.get to make the context a bit clearer?


tests/python/test_genotypes_stats_api.py, line 9 [r1] (raw file): pick is unused. Why isn't pylint picking this up?


tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file): Are there fields in stats that you don't want to check? This would read more clearly if you compared stats with a golden dict.


tests/python/test_genotypes_stats_api.py, line 46 [r1] (raw file): Is this really 8/12? It feels like less of a magic number if you write it out that way.


Comments from the review on Reviewable.io

ihodes commented 9 years ago

Way/a few minutes ahead of you :) https://github.com/hammerlab/cycledash/issues/691

Thanks for the review!


Review status: 0 of 3 files reviewed, 3 unresolved discussions, all commit checks successful.


cycledash/genotypes.py, line 456 [r1] (raw file): No—just that we're allowing the sample_name field on the validation/comparison VCF to be filtered as well. I filed https://github.com/hammerlab/cycledash/issues/689 because this can result in confusing behavior.


tests/python/test_genotypes_stats_api.py, line 9 [r1] (raw file): Removed & I don't know!


tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file): Unfortunately there's not a great way to compare against a dict with almost_equals on float in it. Could make it, but I'm not sure it buys us much; this is very clear, and the error message is clear when it's not equal.


tests/python/test_genotypes_stats_api.py, line 46 [r1] (raw file): Done.


Comments from the review on Reviewable.io

danvk commented 9 years ago

Looks good!


Review status: 0 of 4 files reviewed, 5 unresolved discussions, some commit checks pending.


tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file): That's fine. You may not need the _almost_equals bit—floating points shouldn't have uncertainty for simple divisions like these.


tests/python/test_genotypes_stats_api.py, line 41 [r2] (raw file): I don't think these names really make it any more clear what's going on—feel free to take them or leave them as you like.


Comments from the review on Reviewable.io

ihodes commented 9 years ago

Review status: 0 of 4 files reviewed, 5 unresolved discussions, some commit checks pending.


tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file): That's what I'd thought! It turns out they do, though.


Comments from the review on Reviewable.io