Make sure VCF qualities made by vg call make sense #484

glennhickey commented 8 years ago

Qualities (derived from allele likelihoods) got a bit left by the wayside, since using just read support always seemed to correlate better with true calls in our tests for vg call. Need to go back and make sure they are not nonsense, or just replace them with read support in the default output, or someone will eventually get confused when using them. They also seem to be unstable at tiny chunk sizes when using the chunked_call script, which needs a look.

bricoletc commented 5 years ago

Hi, I've run vg release Tufo (https://github.com/vgteam/vg/releases/tag/v1.15.0): construct, map, augment, call. I'm now looking at how call accuracy changes when filtering on some measure of genotype likelihood. From the VCF output, I can think of using:

In FORMAT: XADL "Likelihood of allelic depths for called alleles". Problem: I see some calls with a "." there, e.g. 1/1:98:189,42:13,85:.:3,10,23,62:85, where FORMAT is GT:DP:XDP:AD:XADL:SB:XAAD. How can this be explained? I've also seen a value of 0.
QUAL: how is QUAL computed? Is it derived from XADL?
FILTER: e.g. lowad, where allele read support is <5
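When filtering on any of these fields, it helps to parse the sample column so that a lone "." is kept as "missing" rather than silently read as 0; in VCF the two mean different things. The sketch below pairs each FORMAT key with its value. The function names are hypothetical helpers for illustration, not part of vg or any VCF library.

```python
from typing import Dict, List, Union

Value = Union[None, int, float, str]


def _parse_value(text: str) -> Value:
    """Convert one VCF value; "." is the VCF missing-value marker."""
    if text == ".":
        return None
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text  # non-numeric values stay as strings


def parse_sample(format_field: str,
                 sample_field: str) -> Dict[str, Union[Value, List[Value]]]:
    """Map each FORMAT key to its value in one sample column.

    Comma-separated fields (e.g. AD, SB) become lists; single values stay
    scalars. GT is kept as a string so "1/1" or "0|1" is not reinterpreted.
    zip() stops at the shorter list, so dropped trailing fields are absent.
    """
    parsed = {}
    for key, raw in zip(format_field.split(":"), sample_field.split(":")):
        if key == "GT":
            parsed[key] = raw
            continue
        values = [_parse_value(v) for v in raw.split(",")]
        parsed[key] = values if len(values) > 1 else values[0]
    return parsed
```

For the record quoted above, `parse_sample("GT:DP:XDP:AD:XADL:SB:XAAD", "1/1:98:189,42:13,85:.:3,10,23,62:85")` gives `AD == [13, 85]` and `XADL is None`, so a downstream filter can decide explicitly how to treat calls with no XADL.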

Is there a recommended strategy for getting only high-quality calls? And when you say:

using just read support always seemed better at correlating with true calls in our tests for vg call

what exactly does this mean for this kind of filtering?

Thanks for your tool!

iqbal-lab commented 5 years ago

Any advice, @ekg? Anyone?

glennhickey commented 5 years ago

The XADL comes from relative allele depths, and only gets computed when there are two or more alleles found at a site.

The QUAL field is basically a quality-adjusted support. For each read, the mapping and base qualities are combined, and this value is totalled up for all the reads.
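The description of QUAL above can be illustrated with a small sketch. The reply does not say how the mapping and base qualities are combined, so the code below assumes (hypothetically) that each phred score is turned into a probability of being correct and the two probabilities are multiplied; the per-read weights are then summed. Both function names are made up for this example and are not vg code.

```python
from typing import Iterable, Tuple


def phred_to_prob_correct(q: float) -> float:
    """Probability that an observation is correct, given a phred score q."""
    return 1.0 - 10.0 ** (-q / 10.0)


def quality_adjusted_support(reads: Iterable[Tuple[float, float]]) -> float:
    """Total one weight per read from its (mapping quality, base quality).

    ASSUMPTION: the two qualities are combined by multiplying their
    probabilities of being correct. This is an illustration only; the
    thread does not specify vg's actual combination rule.
    """
    total = 0.0
    for mapq, baseq in reads:
        total += phred_to_prob_correct(mapq) * phred_to_prob_correct(baseq)
    return total
```

Under this assumption a read with MAPQ 0 contributes nothing, while three reads with MAPQ 60 and base quality 30 add up to just under 3, so the result behaves like a read count discounted by quality.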

The allele depth (AD) field has often produced the best ROC curves.
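One simple way to use AD for such curves is to reduce each call to a single score, here the smallest depth among the alleles in the called genotype (an assumption; other reductions are possible), and then sweep a threshold. In the sketch below, `min_ad=5` mirrors the lowad filter mentioned earlier in the thread, and the truth labels are assumed to come from a separate comparison against a truth set. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def called_allele_support(gt: str, ad: List[int]) -> int:
    """Smallest allele depth (AD) among the alleles named in the genotype.

    For 1/1 with AD 13,85 this is 85; for 0/1 with the same AD it is 13.
    Missing alleles (".") are skipped; a fully missing genotype scores 0.
    """
    alleles = {int(a) for a in gt.replace("|", "/").split("/") if a != "."}
    return min((ad[a] for a in alleles), default=0)


def passes_min_ad(gt: str, ad: List[int], min_ad: int = 5) -> bool:
    """True when every called allele has at least min_ad supporting reads.

    min_ad=5 mirrors the lowad "support < 5" filter; it is not a
    recommended value.
    """
    return called_allele_support(gt, ad) >= min_ad


def roc_points(scored_calls: Iterable[Tuple[int, bool]],
               thresholds: Iterable[int]) -> List[Tuple[int, int, int]]:
    """For each threshold, return (threshold, true calls kept, false calls kept).

    scored_calls holds (score, is_true) pairs; the truth labels must come
    from a separate comparison against a truth set.
    """
    calls = list(scored_calls)
    points = []
    for t in thresholds:
        kept = [is_true for score, is_true in calls if score >= t]
        true_kept = sum(kept)
        points.append((t, true_kept, len(kept) - true_kept))
    return points
```

Plotting the true and false counts from `roc_points` across many thresholds gives the kind of curve used to compare AD with other scores.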

One day, we will add proper genotype qualities. There's some ongoing work concerning scaling that needs to get finished (hopefully soon) before that can be practical though.
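For reference, a genotype quality (GQ) is usually defined as the phred-scaled probability that the called genotype is wrong. The sketch below shows that definition with a deliberately simple model: a diploid biallelic site, binomial read counts with one assumed error rate (0.01 here), and a flat prior over 0/0, 0/1 and 1/1. It is a generic illustration, not the model vg or freebayes uses, and the function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def genotype_log_likelihoods(ref_depth: int, alt_depth: int,
                             error: float = 0.01) -> Dict[str, float]:
    """Natural-log binomial likelihood of the depths under each genotype.

    ASSUMPTIONS: diploid, biallelic site, independent reads, one error
    rate for every read (0 < error < 0.5). The binomial coefficient is
    omitted because it is identical for all three genotypes.
    """
    # Expected fraction of reads showing the alt allele under each genotype.
    alt_fraction = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    return {
        gt: alt_depth * math.log(f) + ref_depth * math.log(1.0 - f)
        for gt, f in alt_fraction.items()
    }


def genotype_quality(ref_depth: int, alt_depth: int, error: float = 0.01,
                     cap: float = 99.0) -> Tuple[str, float]:
    """Call the most likely genotype and return its phred-scaled quality.

    GQ = -10 * log10(P(called genotype is wrong)) under a flat prior.
    Log-likelihoods are shifted by the best one before exponentiating so
    deep sites do not underflow.
    """
    logls = genotype_log_likelihoods(ref_depth, alt_depth, error)
    best = max(logls, key=logls.get)
    top = logls[best]
    # Relative weight of every other genotype; the best one has weight 1.
    others = sum(math.exp(ll - top) for gt, ll in logls.items() if gt != best)
    if others == 0.0:
        return best, cap
    p_wrong = others / (1.0 + others)
    return best, min(cap, -10.0 * math.log10(p_wrong))
```

With the AD of 13,85 from the record quoted earlier, this toy model calls 1/1 with a GQ of about 31: the 13 reference reads are hard to explain as errors under 1/1, but the 85 alt reads are even harder to reconcile with 0/1.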

Hope this helps!

bricoletc commented 5 years ago

Hi Glenn,

Thanks for the reply. I will work with AD for now. Looking forward to more on this subject!

Best

ekg commented 5 years ago

A heads-up to Glenn: I think we could consider pulling in the freebayes genotyping code. It will probably need to be forked to fit, but it would provide a lot of generality (e.g. ploidy, pooled calling) as well as genotype qualities.
