zeeev / wham

Structural variant detection and association testing

GATK compatible GLs in VCF output #16

Closed chapmanb closed 9 years ago

chapmanb commented 9 years ago

Zev; This is one more small fix for GATK-compatible VCF output. GATK expects an empty GL field to be a single "." instead of ".,.,.". The latter causes it to die. With this fix everything looks smooth with both VCF output and recalling. Thanks again for all this work getting VCF up and running.
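To make the GL change concrete, here is a minimal Python sketch of the normalization described above. It is an illustration only, not the patch merged in this pull request; the function name `normalize_gl` is hypothetical.

```python
def normalize_gl(gl):
    """Collapse an all-missing GL value such as '.,.,.' to a single '.'.

    Any GL string that carries at least one real likelihood is returned unchanged.
    """
    parts = gl.split(",")
    if all(part == "." for part in parts):
        return "."
    return gl


if __name__ == "__main__":
    for value in [".,.,.", ".", "-0.10,-2.30,-10.00"]:
        print(value, "->", normalize_gl(value))
```

The same check could be applied while writing each sample column, so that GATK never sees a comma-separated run of missing values.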

chapmanb commented 9 years ago

Zev; Brilliant, thanks for merging this in. I ran some additional tests and now have WHAM with mergeIndvs and genotyping working in bcbio. Here are the latest validations:

http://imgur.com/a/Gajsg

The only issue I ran into is with genotyping of inversions. I have a simple filter for tumor/normal (or case/control) samples that removes calls with the same genotype in the background sample, so we can prioritize somatic calls. This works for duplications and deletions, but filters out almost all inversions, as they regularly have the same genotype in tumor and normal. I turned off the filter for inversions to work around this, but wanted to give you a heads up.
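The filter described above can be sketched roughly as follows. This is an assumed reconstruction in Python, not the actual bcbio code: the function names, the `skip_types` parameter, and the choice to treat phased and unphased genotypes as equal are all illustrative.

```python
import re


def _alleles(gt):
    """Return the sorted alleles of a VCF GT string, ignoring phasing ('0|1' equals '1/0')."""
    return tuple(sorted(re.split(r"[/|]", gt)))


def is_somatic_candidate(tumor_gt, normal_gt, svtype, skip_types=("INV",)):
    """Keep a call when the tumor genotype differs from the normal (background) genotype.

    SV types listed in skip_types bypass the comparison and are always kept,
    mirroring the workaround of turning the filter off for inversions.
    """
    if svtype in skip_types:
        return True
    return _alleles(tumor_gt) != _alleles(normal_gt)


if __name__ == "__main__":
    print(is_somatic_candidate("0/1", "0/0", "DEL"))  # genotypes differ, call is kept
    print(is_somatic_candidate("0/1", "0/1", "DUP"))  # same genotype, call is filtered
    print(is_somatic_candidate("0/1", "0/1", "INV"))  # inversions bypass the filter
```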

Thanks again for all the help. I'm excited to have the updated version with VCF support in bcbio.

zeeev commented 9 years ago

Brad,

Based on your benchmarks it is clear that inversions need some work. Is it possible for you to slice out a few inversions that are failing to genotype correctly so I can work with them?

In the cases where both the normal and tumor had the non-reference genotype, did the LID and RID support the breakpoint being present in both samples?

Can you send me the coordinates of the deletions I’m missing in the 450-2k range in NA12878?

I added inversions to the new version of wham last… I need to give them some TLC.

Thanks for everything,

Zev

Zev Kronenberg Ph.D. Phone: 208 629 6224


chapmanb commented 9 years ago

Zev; Definitely happy to help. Instead of slicing things out, how about the whole set of calls and truth sets? Here's a tarball of the raw VCF calls and tumor-only filtered samples:

https://s3.amazonaws.com/bcbio/sveval/wham_syn4_calls.tar.gz

and the truth sets:

https://s3.amazonaws.com/bcbio_nextgen/dream/synthetic_challenge_set4_tumour_25pctmasked_truth.tar.gz

I'm not doing anything fancy in the validation, just overlapping the calls with the truth sets. Thanks much for looking at these.
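For reference, an overlap-based comparison of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the bcbio validation code: it assumes calls and truth events are already reduced to half-open `(chrom, start, end)` tuples of a single SV type, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True when two half-open (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def evaluate(calls, truth):
    """Count true positives, false positives and false negatives by simple overlap."""
    tp = sum(1 for c in calls if any(overlaps(c, t) for t in truth))
    fp = len(calls) - tp
    fn = sum(1 for t in truth if not any(overlaps(t, c) for c in calls))
    return {"tp": tp, "fp": fp, "fn": fn}


if __name__ == "__main__":
    calls = [("1", 100, 200), ("1", 500, 600), ("2", 100, 200)]
    truth = [("1", 150, 250), ("2", 300, 400)]
    print(evaluate(calls, truth))
```

A stricter comparison might require a minimum reciprocal overlap or matching breakpoints within a tolerance; plain overlap is the most permissive choice.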

zeeev commented 9 years ago

Brad,

Sorry, I wasn't clear. Could I get a few slices of the tumor/normal BAM files?

Somehow the reads are aligning to the alternative haplotype in the normal sample.

zeeev commented 9 years ago

I just updated the code. The sensitivity for inversions should be much better. I accidentally swapped a single variable. I'm also now always aligning both the + and revcomp read to the reference allele.
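As a rough illustration of scoring a read in both orientations, the Python sketch below compares a read and its reverse complement against a reference sequence. It uses a toy ungapped match count rather than a real aligner and does not reflect how wham scores alignments; all function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def best_ungapped_score(read, ref):
    """Highest number of matching bases over all ungapped placements of read within ref."""
    if len(read) > len(ref):
        return 0
    return max(
        sum(r == q for r, q in zip(ref[i:i + len(read)], read))
        for i in range(len(ref) - len(read) + 1)
    )


def score_both_strands(read, ref):
    """Score the read as given ('+') and reverse-complemented ('-') against the same reference."""
    return {
        "+": best_ungapped_score(read, ref),
        "-": best_ungapped_score(reverse_complement(read), ref),
    }


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCT"
    read = reverse_complement("TTGCAAG")  # a read sampled from the opposite strand
    print(score_both_strands(read, ref))
```

Scoring both orientations against the same allele means a read from an inverted segment is not penalized simply for the strand it was sequenced from.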

I joint-called a set of simulated inversions and healthy genomes. I don't see accidental non-ref calls in the healthy genomes.

Please let me know if this fixes the sensitivity or genotyping issues.

chapmanb commented 9 years ago

Zev; Nice one, thank you. I re-validated this and it did fix all of the inversion sensitivity and genotyping issues; the new results look great:

http://imgur.com/a/Gajsg

Thanks again for all the help with this.

zeeev commented 9 years ago

That is great news. Do you plan on posting your benchmarks? Does the ensemble method now include Wham-Graphening?

chapmanb commented 9 years ago

Zev; I'm definitely planning on posting the results. The plan is to incorporate WHAM calls into the MetaSV ensemble method, try to improve MetaSV precision, then write up the new approach in bcbio with the updated callers. I can pass that along once it's all finished. Thanks again.

zeeev commented 9 years ago

Great, let me know if there is any way I can help.

I have one more favor to ask. Can you change the name "wham" to "wham-g" or "wham-graphening" so people know it is different? Until the paper is published I cannot merge wham-g into wham.

--Zev