samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
643 stars 240 forks source link

Bcftools merge - get "Incorrect number of GL fields (3) at chr1:12719, cannot merge." error. #540

Closed eddip closed 7 years ago

eddip commented 7 years ago

Hi there,

I'm looking to merge some VCF files but when I try I get the following error message.

"Incorrect number of GL fields (3) at chr1:12719, cannot merge."

Input command : bcftools merge OE_dedup_split.hg38_multianno.nonintergenic.nonintronic.vcf.gz OI_dedup_split.hg38_multianno.nonintergenic.nonintronic.vcf.gz -O z -o test_merge.vcf.gz

The format line for GL:

FORMAT=

The variant line highlighted by the error : chr1 12719 . G C 200 badReads;MQ BRF=0.93;FR=0.3689;HP=2;HapScore=2;MGOF=45;MMLQ=37;MQ=8.34;NF=8;NR=0;PP=200;QD=29.3464;SC=CAGTGT TGCAGAGGTGAGAGG;SbPval=1;Source=Platypus;TC=29;TCF=29;TCR=0;TR=8;WE=12727;WS=12709;ANNOVAR_DATE=2016-02-01;Func.refGene=ncRNA_exonic;Gene.refGene=DDX11L1;GeneDe tail.refGene=-999;ExonicFunc.refGene=-999;AAChange.refGene=-999;cytoBand=1p36.33;genomicSuperDups=Score\x3d0.993729\x3bName\x3dchr9:10843;ExAC_ALL=-999;ExAC_AFR =-999;ExAC_AMR=-999;ExAC_EAS=-999;ExAC_FIN=-999;ExAC_NFE=-999;ExAC_OTH=-999;ExAC_SAS =-999;ExAC_HIGH=-999;ExAC_HOMO=-999;VC_het_cnt=25;VC_hom_cnt=1;VC_freq=0.03 65;VC_Total_freq=0.0807018;VC_Total_variant_cnt=46;VC_Total_allele_cnt=570;esp6500siv2_all=-999;1000g2015aug_all=-999;clinvar_20150629=-999;UK10K=-999;SIFT_scor e=-999;SIFT_pred=-999;Polyphen2_HDIV_score=-999;Polyphen2_HDIV_pred=-999;Polyphen2_HVAR_score=-999;Polyphen2_HVAR_pred=-999;LRT_score=-999;LRT_pred=-999;Mutatio nTaster_score=-999;MutationTaster_pred=-999;MutationAssessor_score=-999;MutationAssessor_pred=-999;FATHMM_score=-999;FATHMM_pred=-999;PROVEAN_score=-999;PROVEAN _pred=-999;VEST3_score=-999;CADD_raw=-999;CADD_phred=-999;DANN_score=-999;fathmm-MKL_coding_score=-999;fathmm-MKL_coding_pred=-999;MetaSVM_score=-999;MetaSVM_pr ed=-999;MetaLR_score=-999;MetaLR_pred=-999;integrated_fitCons_score=-999;integrated_confidence_value=-999;GERP++_RS=-999;phyloP7wayvertebrate=-999;phyloP20way mammalian=-999;phastCons7way_vertebrate=-999;phastCons20way_mammalian=-999;SiPhy_29way_logOdds=-999;avsnp144=-999;VARIANT_TYPE=ncRNA;ALLELE_END GT:GL:GOF:GQ:NR: NV ./.:-4.2,-0,0:36:3:1:1 1/0:-1.79,0,-28.19:22:18:9:1 0/0:0,-0.3,-6.9:45:5:2:0 ./.:0,0,0:0:0:0:0 ./.:0,0,0:0:0:0:0 ./.:0,-0,-4.2:37 :3:1:0 1/0:-14.79,0,-6.39:40:64:6:4 1/0:-5.79,0,-20.59:22:58:8:2 0/0:0,-0.3,-7.9:5:5:2:0

and there are 3 values in the GL format field.

Is this an issue?

Hope you can help. Eddie

pd3 commented 7 years ago

Maybe the problem is in the other file, have you checked it? What is the output of this command?

 vcf1=OE_dedup_split.hg38_multianno.nonintergenic.nonintronic.vcf.gz
 vcf2=OI_dedup_split.hg38_multianno.nonintergenic.nonintronic.vcf.gz
 bcftools query -f'%REF %ALT [ %GL]\n'  $vcf1 -r chr1:12719
 bcftools query -f'%REF %ALT [ %GL]\n'  $vcf2 -r chr1:12719
eddip commented 7 years ago

Hi Petr,

$ bcftools query -f'%REF %ALT [ %GL]\n' OI_dedup_split.hg38_multianno.nonintergenic.nonintronic.vcf.gz -r chr1:12719 G C -4.2,-0,0 -1.79,0,-28.19 0,-0.3,-6.9 0,0,0 0,0,0 0,-0,-4.2 -14.79,0,-6.39 -5.79,0,-20.59 0,-0.3,-7.9 $ bcftools query -f'%REF %ALT [ %GL]\n' OE_dedup_split.hg38_multianno.nonintergenic.nonintronic.vcf.gz -r chr1:12719 $

The variant only exist in the OI file.

I would have expected the merge (if it had work) to produce a variant line in the output file, with ./.:.,.,:etc for the samples that were in the OE file.

In the OE file the variants also posses 3 values in the GL field.

bcftools_viewCommand=view OE_dedup_split.hg38_multianno.nonintergenic.nonintronic.vcf.gz

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1713.bam 1714.bam 1715.bam

1716.bam 1734.bam chr1 13273 . G C 1264 MQ BRF=0.44;FR=0.4;HP=1;HapScore=2;MGOF=14;MMLQ=27;MQ=29.89;NF=22;N R=25;PP=1264;QD=27.6334;SC=TCCTGGACCAGTGATACACCC;SbPval=0.57;Source=Platypus;TC=186;TCF=88;TCR=98;TR=47;WE=13281;WS=1326 3;ANNOVAR_DATE=2016-02-01;Func.refGene=ncRNA_exonic;Gene.refGene=DDX11L1;GeneDetail.refGene=-999;ExonicFunc.refGene=-999 ;AAChange.refGene=-999;cytoBand=1p36.33;genomicSuperDups=Score\x3d0.993729\x3bName\x3dchr9:10843;ExAC_ALL=-999;ExAC_AFR= -999;ExAC_AMR=-999;ExAC_EAS=-999;ExAC_FIN=-999;ExAC_NFE=-999;ExAC_OTH=-999;ExAC_SAS =-999;ExAC_HIGH=-999;ExAC_HOMO=-999; VC_het_cnt=51;VC_hom_cnt=31;VC_freq=0.1527;VC_Total_freq=0.179487;VC_Total_variant_cnt=112;VC_Total_allele_cnt=624;esp65 00siv2_all=-999;1000g2015aug_all=0.0950479;clinvar_20150629=-999;UK10K=-999;SIFT_score=-999;SIFT_pred=-999;Polyphen2_HDI V_score=-999;Polyphen2_HDIV_pred=-999;Polyphen2_HVAR_score=-999;Polyphen2_HVAR_pred=-999;LRT_score=-999;LRT_pred=-999;Mu tationTaster_score=-999;MutationTaster_pred=-999;MutationAssessor_score=-999;MutationAssessor_pred=-999;FATHMM_score=-99 9;FATHMM_pred=-999;PROVEAN_score=-999;PROVEAN_pred=-999;VEST3_score=-999;CADD_raw=-999;CADD_phred=-999;DANN_score=-999;f athmm-MKL_coding_score=-999;fathmm-MKL_coding_pred=-999;MetaSVM_score=-999;MetaSVM_pred=-999;MetaLR_score=-999;MetaLR_pr ed=-999;integrated_fitCons_score=-999;integrated_confidence_value=-999;GERP++_RS=-999;phyloP7way_vertebrate=-999;phyloP2 0way_mammalian=-999;phastCons7way_vertebrate=-999;phastCons20way_mammalian=-999;SiPhy_29way_logOdds=-999;avsnp144=rs5317 30856;VARIANT_TYPE=ncRNA;ALLELE_END GT:GL:GOF:GQ:NR:NV 0/0:0,-12.61,-161:8:99:43:0 1/0:-36.58,0,-136.48:9:9 9:58:15 1/0:-22.07,0,-85.57:5:99:32:8 1/0:-28.08,0,-48.48:14:99:28:10 1/0:-46.08,0,-34.48:11:99:25:14

Eddie

pd3 commented 7 years ago

What is the GL tag definition in the header? Is it defined as Number=G? If you could share a small test case, that would be very helpful.

eddip commented 7 years ago

The format line for GL:

FORMAT=

I will look to extract a small test set.

thanks. Eddie

pd3 commented 7 years ago

No need to look further. GL must be defined as Number=G as it gives values for all genotypes, not alternate alleles. So three for biallelic sites, not one. If you fix the header, it should work.

eddip commented 7 years ago

Thanks Petr.

Changed header (for format GL line, from Number=A to Number=G) and the merge works.

Eddie