hammerlab / cycledash

Variant Caller Analysis Dashboard and Data Management System
Other
35 stars 2 forks source link

Make true pos count agree with true pos checks #825

Closed ihodes closed 9 years ago

ihodes commented 9 years ago

This means we don't use the sample name to test concordance, possibly leading to weird numbers when looking at multi-sample VCFs.

fixes #822

Review on Reviewable

ihodes commented 9 years ago

So I actually think we should get rid of this feature soon/replace it with a better, clearer concordance system. This wasn't really a bug (strictly speaking, I'd call it a bug); the two counts of true positives were both somewhat valid—one looked for a match on the sample name, as well, which prevents duplicate counts of true positives on multi-sample VCF comparisons. I've switched the other join to also only count true positives with the same sample name, for now. Ideally, we'll improve our comparison system a bit, but that'll take a lot of work (@jaclynperrone & I chatted about this some more today).

I didn't include a test, because I think this whole thing needs to be changed significantly. Will leave it to the reviewer if you think it needs a regression test anyway!.

armish commented 9 years ago

This looks good to me. I think as long as we make it clear how we calculate those values in the user documentation, either way should be fine. Reading your last comment, I thought that you changed another join to start matching with sample names, but this change set features a change in a single join statement. So just to make sure: you didn't forget to push another change out, right?

Also, just out of curiosity: how does the comparison numbers look after this change? Do we have 3 true positives and 11 false positives?

ihodes commented 9 years ago

Ah sorry, I wasn't clear. Here (https://github.com/hammerlab/cycledash/blob/master/cycledash/api/genotypes.py#L170) we already join on sample_name as well, so I just made sure both joins join on the sample_name in addition to the other fields.

armish commented 9 years ago

thanks for the clarification! Good to go, then :+1:

ihodes commented 9 years ago

Also, just out of curiosity: how does the comparison numbers look after this change? Do we have 3 true positives and 11 false positives?

0 true positives; as the VCF being compared to doesn't have a sample with the same name as the examined VCF (this is why this is a problematic way of calculating true/false/pos/negs.) Needs a serious rework in light of how this feature is likely to be used; we're punting on it until we know who needs to be using this feature next.