iqbal-lab / cortex

reference free variant assembly

wrong ref allele given in vcf of calls #14

Open rmcolq opened 8 years ago

rmcolq commented 8 years ago

When running the indep pipeline to create a massive VCF of variants in a group of samples, I get errors from the bcf merge step. Here are some examples of pairs of samples that have conflicting ref alleles in the raw VCF:

data2/users/phelim/ana/staph/cortex/results/C00001083/vcfs/C00001083_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 616858 UNION_BC_k31_var_12940 T A . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,209:121.99

/data2/users/phelim/ana/staph/cortex/results/C00001085/vcfs/C00001085_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 616858 UNION_BC_k31_var_9908 AGAT AAAC . PASS KMER=31;SVLEN=0;SVTYPE=PH_SNPS GT:COV:GT_CONF 0/1:21,125:5.28

[correct ref allele is T]

cortex/results/nctc/NCTC5655/vcfs/NCTC5655_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 1356994 UNION_BC_k31_var_778 G GA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF ./.:0,0:0.50

cortex/results/nctc/NCTC7972/vcfs/NCTC7972_wk_flow_I_RefCC_FINALcombined_BC_calls_at_all_k.raw.vcf
R00000022 1356994 UNION_BC_k31_var_8861 T TA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF 0/1:7,1:3.27

[the correct ref allele is T]
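One way to confirm which record in each pair carries the wrong ref allele is to compare every record's REF field with the reference bases starting at POS. The following is only a minimal sketch: it assumes the reference has already been loaded into a dict of contig name to sequence, and the function names and the toy contig `chr_toy` are hypothetical, not part of cortex.

```python
def ref_matches_reference(reference, chrom, pos, ref_allele):
    """Return True if ref_allele equals the reference bases at 1-based pos."""
    seq = reference[chrom]
    start = pos - 1  # VCF positions are 1-based
    return seq[start:start + len(ref_allele)].upper() == ref_allele.upper()


def mismatching_records(vcf_lines, reference):
    """Return (chrom, pos, id, ref) for records whose REF disagrees with the reference."""
    bad = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF headers
        # Assumption: no whitespace inside fields, so split() is safe here;
        # for real VCFs splitting on tabs is the stricter choice.
        fields = line.split()
        chrom, pos, var_id, ref = fields[0], int(fields[1]), fields[2], fields[3]
        if not ref_matches_reference(reference, chrom, pos, ref):
            bad.append((chrom, pos, var_id, ref))
    return bad


# Toy data (hypothetical): position 4 of chr_toy is 'T'.
toy_reference = {"chr_toy": "ACGTTGCA"}
toy_records = [
    "chr_toy 4 var_a T A . PASS SVTYPE=SNP GT 1/1",
    "chr_toy 4 var_b AGAT AAAC . PASS SVTYPE=PH_SNPS GT 0/1",
]
flagged = mismatching_records(toy_records, toy_reference)
```

On the toy data only `var_b` is flagged, since its REF `AGAT` does not match the reference bases at that position.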

Perhaps we need a script that filters out sites at the same location with different ref allele prefixes, run at the same time as the scripts that remove duplicates and label overlaps.
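A rough sketch of what such a filter could look like, in Python. It treats the ref alleles at one site as compatible only when each is a prefix of the longest one (so `T` and `TA` would agree, but `T` and `AGAT` would not). The function names and the dict-of-lines input are assumptions for illustration, not existing cortex scripts.

```python
from collections import defaultdict


def refs_compatible(ref_alleles):
    """True if every ref allele is a prefix of the longest one."""
    longest = max(ref_alleles, key=len)
    return all(longest.startswith(ref) for ref in ref_alleles)


def conflicting_sites(vcf_lines_by_sample):
    """Map (chrom, pos) -> {ref allele: [samples]} for sites with clashing ref alleles."""
    refs_at_site = defaultdict(dict)
    for sample, lines in vcf_lines_by_sample.items():
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and VCF headers
            fields = line.split()  # assumes no whitespace inside fields
            chrom, pos, ref = fields[0], int(fields[1]), fields[3]
            refs_at_site[(chrom, pos)].setdefault(ref, []).append(sample)
    return {
        site: refs
        for site, refs in refs_at_site.items()
        if not refs_compatible(list(refs))
    }


# The four records quoted in this issue, keyed by sample name.
example = {
    "C00001083": ["R00000022 616858 UNION_BC_k31_var_12940 T A . PASS KMER=31;SVLEN=0;SVTYPE=SNP GT:COV:GT_CONF 1/1:0,209:121.99"],
    "C00001085": ["R00000022 616858 UNION_BC_k31_var_9908 AGAT AAAC . PASS KMER=31;SVLEN=0;SVTYPE=PH_SNPS GT:COV:GT_CONF 0/1:21,125:5.28"],
    "NCTC5655": ["R00000022 1356994 UNION_BC_k31_var_778 G GA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF ./.:0,0:0.50"],
    "NCTC7972": ["R00000022 1356994 UNION_BC_k31_var_8861 T TA . PASS KMER=31;PV=6;SVLEN=1;SVTYPE=INS GT:COV:GT_CONF 0/1:7,1:3.27"],
}
conflicts = conflicting_sites(example)
```

Both sites from the examples above are reported as conflicts. This check only detects disagreement between samples; deciding which record to drop would still need the reference sequence.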