jmonlong / manu-vgsv

https://jmonlong.github.io/manu-vgsv/

differentiate 0/0 from ./. #51

Open glennhickey opened 5 years ago

glennhickey commented 5 years ago

Peter Audano is passionate about distinguishing 0/0 (reference) calls from ./. (non-calls) in order to subset SVs when comparing across samples. This is a great point, but I don't think it will have much effect on the relative accuracies we're discussing in the paper. Still, it shouldn't be hard to add.

vg call already has a --trivial option that will output 0/0 calls, among others. I'll re-run a sample with this enabled to make sure it's still working. If it works, we can classify genotypes like this:

- 0/0: ref call
- 0/1, 1/0, 1/1, 1/2, etc.: variant call
- anything else (including absence): non-call
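The three-way split above could be sketched roughly as follows. This is only an illustration, not code from vg or sveval: the function name `classify_genotype` is made up, and it assumes the GT field is a plain VCF genotype string with `/` or `|` separators. A partially missing genotype such as `0/.` falls under "anything else" and is treated as a non-call.

```python
import re


def classify_genotype(gt):
    """Classify a VCF GT string as 'ref', 'variant', or 'non-call'.

    Hypothetical helper: None stands for a site that is absent from the
    call set altogether.
    """
    if gt is None:
        return "non-call"
    # Split on either phased (|) or unphased (/) separators.
    alleles = re.split(r"[/|]", gt.strip())
    # Any missing or non-numeric allele (".", "") makes the whole call a non-call.
    if any(not a.isdigit() for a in alleles):
        return "non-call"
    if all(a == "0" for a in alleles):
        return "ref"
    return "variant"


if __name__ == "__main__":
    for gt in ["0/0", "0/1", "1/0", "1/1", "1/2", "./.", "0/.", None]:
        print(gt, "->", classify_genotype(gt))
```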

My only idea after that is to compute F1 on the ref calls and variant calls (like Peter), then normalize by the fraction of non-calls (but I am very open to other ideas).
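To make that idea concrete, here is one possible reading of it, purely as a sketch. The normalization is not settled in this thread; the code below assumes it means multiplying F1 by the call rate (the fraction of evaluated sites that got a ref or variant call). The function names and the dict-based input are hypothetical.

```python
def f1_score(tp, fp, fn):
    """Standard F1 from true positive, false positive, and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def normalized_f1(truth, calls):
    """Return (f1, call_rate, f1 * call_rate).

    truth and calls map a site ID to 'ref', 'variant', or 'non-call'.
    Sites missing from calls count as non-calls. Sites without a truth
    genotype are skipped entirely.
    ASSUMPTION: "normalize" is taken to mean scaling F1 by the call rate.
    """
    tp = fp = fn = called = total = 0
    for site, t in truth.items():
        if t == "non-call":
            continue
        total += 1
        c = calls.get(site, "non-call")
        if c == "non-call":
            continue
        called += 1
        if c == "variant" and t == "variant":
            tp += 1
        elif c == "variant" and t == "ref":
            fp += 1
        elif c == "ref" and t == "variant":
            fn += 1
    f1 = f1_score(tp, fp, fn)
    call_rate = called / total if total else 0.0
    return f1, call_rate, f1 * call_rate


if __name__ == "__main__":
    truth = {"a": "variant", "b": "variant", "c": "ref", "d": "ref"}
    calls = {"a": "variant", "b": "ref", "c": "variant"}  # "d" was not called
    print(normalized_f1(truth, calls))
```

Reporting F1 and the call rate side by side, instead of folding them into one number, may be easier to interpret.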

Some other questions that may be best answered by looking at one sample:

glennhickey commented 5 years ago

I'm looking at the vg call output with 0/0's. It works okay BUT the alts often don't come out. For example, there is a big complicated snarl that came from several overlapping variants in the VCF. There are dozens of possible alt-paths through the snarl, none with any read support. There is no way for the caller to know which ones were in the vcf, so it outputs something like this, with . in the ALT field:

chr16   13857   75714675_75755956   CCCCGTGTCTGTCACTGAAACCTTTTTTGTGGGAGACTATTCCTCCCATCTGCAACAGCTGCCC    .   8   PASS    DP=18;XSEE=75633103,75633104    GT:DP:XDP:AD:XADL:SB:XAAD   0/0:18:13,15:18:.:8,10:0

@jmonlong How difficult would it be to, in sveval or as a preprocessing step, go through these calls and generate 0/0 calls at the sites we're interested in where appropriate?
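As a rough idea of what such a preprocessing step might look like (a sketch only, not sveval code): collect the reference spans of 0/0 records that have `.` in ALT, then mark each site from the VCF being genotyped as 0/0 when its reference span lies entirely inside one of them. The function names, the tuple layout of `target_sites`, and the "fully contained" rule are all assumptions, and coordinates are assumed to be in the same system in both files.

```python
def ref_call_spans(vcf_lines):
    """Collect (chrom, start, end) for 0/0 records whose ALT is '.'.

    Splits on any whitespace, so it assumes no field contains spaces.
    Only the first sample column is inspected.
    """
    spans = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        if alt == "." and sample.get("GT") in ("0/0", "0|0"):
            # VCF positions are 1-based; the span covers the whole REF allele.
            spans.append((chrom, pos, pos + len(ref) - 1))
    return spans


def sites_to_mark_ref(target_sites, spans):
    """Return the target sites whose REF span is fully inside a ref-call span.

    target_sites: list of (chrom, pos, ref) taken from the VCF being genotyped.
    """
    marked = []
    for chrom, pos, ref in target_sites:
        end = pos + len(ref) - 1
        if any(c == chrom and s <= pos and end <= e for c, s, e in spans):
            marked.append((chrom, pos, ref))
    return marked
```

The linear scan over spans is fine for a single sample; an interval index would be the obvious next step if this turned out to be slow.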

I really, really need (and should have done it earlier) to add an option to pass in the vcf we want to genotype as input to vg call, then make sure the output is framed only on calls in that vcf. It'll take at least a few days to implement and test, and I've been pushing it back because I'd rather do it as part of a bigger refactor (that gets rid of vg chunk) after we submit. Perhaps it's best just to start now?