zeeev / wham

Structural variant detection and association testing
Other
101 stars 25 forks source link

ERROR: wham vcf as an input to MetaSV #20

Closed wjaratlerdsiri closed 8 years ago

wjaratlerdsiri commented 8 years ago

Hello Author,

I used wham VCF (--wham_vcf flag) with MetaSV and got the following error:

INFO 2015-10-13 07:47:59,983 metasv.sv_interval Loading SV intervals from Clean_FR07886681_hg19_MC_MQ_sort_dedup.realign.recal.wham.raw.class.vcf ERROR 2015-10-13 07:48:00,559 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=chr1, POS=9012117, REF=N, ALT=[GACCCTGTACCTTCTTCCTTTAGG]) ERROR 2015-10-13 07:48:00,560 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=chr1, POS=9012144, REF=N, ALT=[TTGAGATTANANGCGTAAGCCACCNNGNCNNGCCANANATNNATGATTTGAAAAGATGTTTNT]) ERROR 2015-10-13 07:48:00,560 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=chr1, POS=9015676, REF=N, ALT=[TGCTTTCAGCATCACTGACCTTGCACGCCTCCTTTTAAGGCCCTTC]) ERROR 2015-10-13 07:48:00,560 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=chr1, POS=9015722, REF=N, ALT=[ACGCCTCCTTTTAAGGCCCTTCCGTCCTTTTCTTGCAAGCAT]) ERROR 2015-10-13 07:48:00,561 metasv.sv_interval Ignoring record due to missing SVTYPE or INFO field in Record(CHROM=chr1, POS=9016688, REF=N, ALT=[CGCCGACCCCCGAGCTCTACACCTTGGNCTACCTTATTTCCCTCCACGACGCTCGTCCGATCTCCCAGNAGNNGGAGGNTGCAGNNAGCCNANGTCA]) ...

Are there any workarounds for this? Thankyou

James

zeeev commented 8 years ago

It looks like you have not run the classification step? The classification step adds the SVTYPE.

wjaratlerdsiri commented 8 years ago

Thanks Zeeev,

But I think I did it as follows:

./WHAM-BAM -f hg19_chromosome.fa \ -t A_sort_dedup.realigned.recalibrated.bam -x 4 > A_sort_dedup.realign.recal.wham.raw.vcf 2> A_sort_dedup.realign.recal.wham.raw.err

echo Next python /miniconda/wham/utils/classify_WHAM_vcf.py \ A_sort_dedup.realign.recal.wham.raw.vcf \ /wham/data/WHAM_training_data.txt > A_sort_dedup.realign.recal.wham.raw.class.vcf 2> A_sort_dedup.realign.recal.wham.raw.class.err

My VCF:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT A_sort_dedup.realigned.recalibrated.bam

chr1 9012117 . N GACCCTGTACCTTCTTCCTTTAGG . . LRT=0;WAF=.,0.500001,0.500001;GC=0,1;AT=1,0.0222222,0,0,0.0222222,0.111111,0,0.0666667,0.0888889,0,0,0.0444444,0,0,1.19378;CF=0;CISTART=9012072,9012160;CIEND=9011769,9011769;PU=2;SU=2;CU=8;RD=45;NC=2;MQ=57.8444;MQF=0.0666667;SP=0,2,0;CHR2=chr1;DI=f;END=9011770;SVLEN=346;WC=INR;WP=0.004,0.176,0.816,0.004 GT:GL:NR:NA:NS:RD 0/1:-129.636,-31.1916,-469.759:34:11:2:45 chr1 9012144 . N TTGAGATTANANGCGTAAGCCACCNNGNCNNGCCANANATNNATGATTTGAAAAGATGTTTNT . . LRT=0;WAF=.,0.500001,0.500001;GC=0,1;AT=1,0.0232558,0,0,0.0232558,0.116279,0,0.0697674,0.0930233,0,0,0.0465116,0,0,1.14072;CF=0;CISTART=9012114,9012172;CIEND=9012116,9012116;PU=4;SU=3;CU=11;RD=43;NC=4;MQ=57.7442;MQF=0.0697674;SP=2,0,0;CHR2=chr1;DI=b;END=9012117;SVLEN=26;WC=INR;WP=0.006,0.156,0.828,0.01 GT:GL:NR:NA:NS:RD 0/1:-115.82,-29.8053,-455.944:33:10:4:43

My err file: processing training file... Training weights for RandomForest classifier N = 14 training variables 0.0546112802958 0.0 0.0895275419308 0.14484642443 0.0803977781595 0.0722788483208 0.0741062190298 0.127521832359 0.000165751507091 0.0125475473299 0.0246638991969 0.104351587937 0.0796804855492 0.135300803955 results from cross validation: 94.715911% processing VCF file through classifier... ...running parent process with job id 2286 can use this ID to exit minclassfreq var is set to = 0.000000 ...classifier finished

Any helps would be appreciative.

Cheers, james

wjaratlerdsiri commented 8 years ago

This issue is still open! James

zeeev commented 8 years ago

Did you pass the classified VCF to Meta_sv?

wjaratlerdsiri commented 8 years ago

Yes. Even I ran again with the VCF file below. I got errors with missing SVTYPE. Only the X.class.vcf named "Clean_xxx_sort_dedup.realign.recal.wham.raw.class.vcf" in my work directory for MetaSV. My VCF has WC and WP added.

I think the problem is no SVTYPE clearly annotated in the INFO field like VCF files from other SV tools. What do you think?

I think MetaSV use SVTYPE field to select SVs for merging.

James

zeeev commented 8 years ago

James,

I'm not sure if Meta_SV was using WHAM or WHAM-GRAPHENING. WHAM-GRAPHENING is more accurate than WHAM, but only calls deletions, duplications and inversions. It also produces the SVTYPE info field.

You can convert the "WC" field to "SVTYPE" using the perl one liner below.

perl -lane '$_ =~ s/WC=/SVTYPE=/g' your.classified.vcf > your.classified.infomod.vcf

It is difficult for me to troubleshoot Meta_SV. Have you had positive experiences with Meta_SV?

--Zev

wjaratlerdsiri commented 8 years ago

Thanks Zev,

It works! Below is VCF results from MetaSV:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT XXX

chr1 10288 . c . PASS CIEND=10135,10377;END=10584;SVLEN=89;SVTYPE=DUP;SVTOOL=MetaSVMerge;SOURCES=chr1-10178-10178-1-WHAM,chr1-10179-10178-0-WHAM,chr1-10230-10178-51-WHAM,chr1-10282-10256-25-WHAM,chr1-10287-10257-29-WHAM,chr1-10389-10438-49-Pindel,chr1-10395-10484-89-Pindel,chr1-10402-10451-49-Pindel;NUM_SVMETHODS=2;NUM_SVTOOLS=2;VT=SV;SVMETHOD=RP,SR;AT=0.881119,0.321678,0.00699301,0.027972,0.174825,0.132867,0.0979021,0.013986,0.041958,0.0629371,0.223776,0.300699,0.0769231,0.13986,3.79357;CF=0.776224;CHR2=chr1;CISTART=10232,10340;CU=80;DI=f;GC=0,1;LRT=0.0;MQ=17.4825;MQF=0.972028;NC=4;PU=6;RD=143;SP=42,0,7;SU=0;WAF=.,1.0,1.0;WP=0.02,0.428,0.394,0.158 GT 1/1

WHAM results were merged with other tools.

I do not know what SV software is the best, but MetaSV provides me what I want. I need to merge SV calls based on different SV signals and use de novo genome assembly to confirm breakends. MetaSV and Parliament can give me that, but I prefer the former because I have to pay to run Parliament in the DNAnexus.

Many thanks, James

zeeev commented 8 years ago

Great. Let me know if you run into anymore problems.