Fix impossible stats & regression test

ihodes commented 9 years ago

fixes #588 fixes #516

danvk commented 9 years ago

Thanks for fixing this and writing the tests!

This might be a good time to put down in writing exactly how the precision/recall calculations are done, particularly in the presence of multi-sample VCFs.

Review status: 0 of 3 files reviewed, 5 unresolved discussions, all commit checks successful.

^{cycledash/genotypes.py, line 456 [r1] (raw file):} Does this mean we're requiring that the validation VCF have the same sample names as the run VCF?

^{tests/python/test_genotypes_stats_api.py, line 8 [r1] (raw file):} get is pretty generic to import as a top-level symbol. How about just from cycledash import genotypes and then use genotypes.get to make the context a bit clearer?

^{tests/python/test_genotypes_stats_api.py, line 9 [r1] (raw file):} pick is unused. Why isn't pylint picking this up?

^{tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file):} Are there fields in stats that you don't want to check? This would read more clearly if you compared stats with a golden dict.

^{tests/python/test_genotypes_stats_api.py, line 46 [r1] (raw file):} Is this really 8/12? It feels like less of a magic number if you write it out that way.

Comments from the review on Reviewable.io

ihodes commented 9 years ago

Way/a few minutes ahead of you :) https://github.com/hammerlab/cycledash/issues/691

Thanks for the review!

Review status: 0 of 3 files reviewed, 3 unresolved discussions, all commit checks successful.

^{cycledash/genotypes.py, line 456 [r1] (raw file):} No—just that we're allowing the sample_name field on the validation/comparison VCF to be filtered as well. I filed https://github.com/hammerlab/cycledash/issues/689 because this can result in confusing behavior.

^{tests/python/test_genotypes_stats_api.py, line 9 [r1] (raw file):} Removed & I don't know!

^{tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file):} Unfortunately there's not a great way to compare against a dict with almost_equals on float in it. Could make it, but I'm not sure it buys us much; this is very clear, and the error message is clear when it's not equal.

^{tests/python/test_genotypes_stats_api.py, line 46 [r1] (raw file):} Done.

Comments from the review on Reviewable.io

danvk commented 9 years ago

Looks good!

Review status: 0 of 4 files reviewed, 5 unresolved discussions, some commit checks pending.

^{tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file):} That's fine. You may not need the _almost_equals bit—floating points shouldn't have uncertainty for simple divisions like these.

^{tests/python/test_genotypes_stats_api.py, line 41 [r2] (raw file):} I don't think these names really make it any more clear what's going on—feel free to take them or leave them as you like.

Comments from the review on Reviewable.io

ihodes commented 9 years ago

Review status: 0 of 4 files reviewed, 5 unresolved discussions, some commit checks pending.

^{tests/python/test_genotypes_stats_api.py, line 40 [r1] (raw file):} That's what I'd thought! It turns out they do, though.

Comments from the review on Reviewable.io

hammerlab / cycledash

Fix impossible stats & regression test #690