The pipeline inadvertently used an alternate reference genome for variant calling, but the resulting VCF file was labeled with the contig names from the user-specified reference. This misalignment causes a discrepancy where the CHROM column in the VCF does not match the contig names from the intended reference. Despite this, the pipeline consistently references the user-provided ID, leading to confusion as it implies that the user's chosen reference was employed throughout the process.
Command used and terminal output
No response
Relevant files
The vcf:
> less -S /nfs7/BPP/Chang_Lab/paradarc/paper2_bra/results/nxf_032224/graphtyper_vcfconcatenate/Bradyrhizobium_yuanmingense_GCF_000261785_1__CCBAU05623_By.vcf.gz | head -n 150
##fileformat=VCFv4.2
##fileDate=20240327
##source=Graphtyper
##graphtyperVersion=2.7
##graphtyperGitBranch=HEAD
##graphtyperSHA1=f9edcf4cf040197f57e208ff6f910e5a2875f692
##contig=<ID=NZ_CP028463.1,length=8288991>
##INFO=<ID=AAScore,Number=A,Type=Float,Description="Alternative allele confidence score in range [0.0,1.0]. The score is determined by a logistic regression model which was trained on GIAB truth data using other INFOs metrics as covariates.">
##INFO=<ID=ABHet,Number=1,Type=Float,Description="Allele Balance for heterozygouscalls (read count of call2/(call1+call2)) where the called genotype is call1/call2. -1 if no heterozygous calls.">
##INFO=<ID=ABHom,Number=1,Type=Float,Description="Allele Balance for homozygous calls(read count of A/(A+O)) where A is the called allele and O is anything else. -1 if no homozygous calls.">
##INFO=<ID=ABHetMulti,Number=R,Type=Float,Description="List of Allele Balance values for heterozygous calls (alt/(ref+alt)). -1 if not available.">
##INFO=<ID=ABHomMulti,Number=R,Type=Float,Description="List of Allele Balance values for homozygous calls (A/(A+0)) where A is the called allele and O is anything else. -1 if not available.">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Number of alternate alleles in called genotypes.">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency.">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Number of alleles in called genotypes.">
##INFO=<ID=CR,Number=1,Type=Integer,Description="Number of clipped reads in the graph alignment.">
##INFO=<ID=CRal,Number=.,Type=String,Description="Number of clipped reads per allele.">
##INFO=<ID=CRalt,Number=A,Type=Float,Description="Percent of clipped reads per allele.">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of an SV.">
##INFO=<ID=FEATURE,Number=1,Type=String,Description="Gene feature.">
##INFO=<ID=GT_ANTI_HAPLOTYPE,Number=.,Type=String,Description="Haplotype string with downstream variants with no (or very low) evidence of being in the same haplotype. Used internally by Graphtyper.">
##INFO=<ID=GT_HAPLOTYPE,Number=.,Type=String,Description="Haplotype string with downstream variants with high evidence of being always in the same haplotype. Used internally by Graphtyper.">
##INFO=<ID=GT_ID,Number=.,Type=String,Description="ID for variant. Used internally by Graphtyper.">
##INFO=<ID=HOMSEQ,Number=.,Type=String,Description="Sequence of base pair identical homology at event breakpoints.">
##INFO=<ID=INV3,Number=0,Type=Flag,Description="Inversion breakends open 3' of reported location">
##INFO=<ID=INV5,Number=0,Type=Flag,Description="Inversion breakends open 5' of reported location">
##INFO=<ID=LEFT_SVINSSEQ,Number=.,Type=String,Description="Known left side of insertion for an insertion of unknown length.">
##INFO=<ID=LOGF,Number=1,Type=Float,Description="Output from logistic regression model.">
##INFO=<ID=MaxAAS,Number=A,Type=Integer,Description="Maximum alternative allele support per alt. allele.">
##INFO=<ID=MaxAASR,Number=A,Type=Float,Description="Maximum alternative allele support ratio per alt. allele.">
##INFO=<ID=MaxAltPP,Number=1,Type=Integer,Description="Maximum number of proper pairs support the alternative allele.">
##INFO=<ID=MMal,Number=.,Type=String,Description="Scaled mismatch count per allele.">
##INFO=<ID=MMalt,Number=A,Type=Float,Description="Mismatch percent per alternative allele.">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality.">
##INFO=<ID=MQalt,Number=A,Type=Integer,Description="Mapping qualities per alternative allele.">
##INFO=<ID=MQSal,Number=.,Type=String,Description="Sum of squared mapping qualities per allele.">
##INFO=<ID=MQsquared,Number=.,Type=String,Description="Sum of squared mapping qualities. Used to calculate MQ.">
##INFO=<ID=NCLUSTERS,Number=1,Type=Integer,Description="Number of SV candidates in cluster.">
##INFO=<ID=NGT,Number=3,Type=Integer,Description="Number of REF/REF, REF/ALT and ALT/ALTgenotypes, respectively.">
##INFO=<ID=NHet,Number=1,Type=Integer,Description="Number of heterozygous genotype calls.">
##INFO=<ID=NHomRef,Number=1,Type=Integer,Description="Number of homozygous reference genotype calls.">
##INFO=<ID=NHomAlt,Number=1,Type=Integer,Description="Number of homozygous alternative genotype calls.">
##INFO=<ID=NUM_MERGED_SVS,Number=1,Type=Integer,Description="Number of SVs merged.">
##INFO=<ID=OLD_VARIANT_ID,Number=1,Type=String,Description="Variant ID from a VCF (SVs only).">
##INFO=<ID=ORSTART,Number=1,Type=Integer,Description="Start coordinate of sequence origin.">
##INFO=<ID=OREND,Number=1,Type=Integer,Description="End coordinate of sequence origin.">
##INFO=<ID=QD,Number=1,Type=Float,Description="QUAL divided by NonReferenceSeqDepth.">
##INFO=<ID=QDalt,Number=A,Type=Float,Description="Simplified QD calculated separately for each allele against all other alleles.">
##INFO=<ID=PASS_AC,Number=A,Type=Integer,Description="Number of alternate alleles in called genotyped that have FT = PASS.">
##INFO=<ID=PASS_AN,Number=1,Type=Integer,Description="Number of genotype calls that haveFT = PASS.">
##INFO=<ID=PASS_ratio,Number=1,Type=Float,Description="Ratio of genotype calls that haveFT = PASS.">
##INFO=<ID=RefLen,Number=1,Type=Integer,Description="Length of the reference allele.">
##INFO=<ID=RELATED_SV_ID,Number=1,Type=Integer,Description="GraphTyper ID of a related SV.">
##INFO=<ID=RIGHT_SVINSSEQ,Number=.,Type=String,Description="Known right side of insertion for an insertion of unknown length.">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand bias (F/(F+R)) where F and R are forward and reverse strands, respectively. -1 if not available.">
##INFO=<ID=SBAlt,Number=1,Type=Float,Description="Strand bias of alternative alleles only. -1 if not available.">
##INFO=<ID=SBF,Number=R,Type=Integer,Description="Number of forward stranded reads per allele.">
##INFO=<ID=SBF1,Number=R,Type=Integer,Description="Number of first forward stranded reads per allele.">
##INFO=<ID=SBF2,Number=R,Type=Integer,Description="Number of second forward stranded reads per allele.">
##INFO=<ID=SBR,Number=R,Type=Integer,Description="Number of reverse stranded reads per allele.">
##INFO=<ID=SBR1,Number=R,Type=Integer,Description="Number of first reverse stranded reads per allele.">
##INFO=<ID=SBR2,Number=R,Type=Integer,Description="Number of second reverse stranded reads per allele.">
##INFO=<ID=SDal,Number=.,Type=String,Description="Score difference of AS and XS tags per allele.">
##INFO=<ID=SDalt,Number=A,Type=Float,Description="Avergae score difference of AS and XS tags per alternative allele.">
##INFO=<ID=SEQ,Number=1,Type=String,Description="Inserted sequence at variant site.">
##INFO=<ID=SeqDepth,Number=1,Type=Integer,Description="Total accumulated sequencing depth over all the samples.">
##INFO=<ID=SV_ID,Number=1,Type=Integer,Description="GraphTyper's ID on SV.">
##INFO=<ID=SVINSSEQ,Number=.,Type=String,Description="Sequence of insertion.">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Length of structural variant in bp. Negative lengths indicate a deletion.">
##INFO=<ID=SVMODEL,Number=1,Type=String,Description="Model used for SV genotyping.">
##INFO=<ID=SVSIZE,Number=1,Type=Integer,Description="Size of structural variant in bp. Always 50 or more.">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant.">
##INFO=<ID=VarType,Number=1,Type=String,Description="First letter is program identifier,the second letter is variant type.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="GenoType call. ./. is called if there is no coverage at the variant site.">
##FORMAT=<ID=FT,Number=1,Type=String,Description="Filter. PASS or FAILN where N is a number.">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed.">
##FORMAT=<ID=MD,Number=1,Type=Integer,Description="Read depth of multiple alleles.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth.">
##FORMAT=<ID=RA,Number=2,Type=Integer,Description="Total read depth of the reference allele and all alternative alleles, including reads that support more than one allele.">
##FORMAT=<ID=PP,Number=1,Type=Integer,Description="Number of reads that support non-reference haplotype that are proper pairs.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality.">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="PHRED-scaled genotype likelihoods.">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=LowAAScore,Description="Alternative alleles have a low score.">
##FILTER=<ID=LowABHet,Description="Allele balance of heterozygous carriers is below 17.5%.">
##FILTER=<ID=LowABHom,Description="Allele balance of homozygous carriers is below 90%.">
##FILTER=<ID=LowQD,Description="QD (quality by depth) is below 6.0.">
##FILTER=<ID=LowQUAL,Description="QUAL score is less than 10.">
##FILTER=<ID=LowPratio,Description="Ratio of PASSed calls was too low.">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GCF_000261785_1__CCBAU05623_By_10A2_2_S72 GCF_000261785_1__CCBAU05623_By_10K2_8_S213 GCF_000261785_1__CCBAU05623_By_11A5_5_S73 GCF_000261785_1__CCBAU05623_By_12A1_1_S74 GCF_000261785_1__CCBAU05623_By_12A2_2_S75 GCF_000261785_1__CCBAU05623_By_12F2_2_S108 GCF_000261785_1__CCBAU05623_By_13A3_4_S76 GCF_000261785_1__CCBAU05623_By_13A4_3_S77 GCF_000261785_1__CCBAU05623_By_14A2_2_S78 GCF_000261785_1__CCBAU05623_By_14A3_1_S79 GCF_000261785_1__CCBAU05623_By_15A3_2_S80 GCF_000261785_1__CCBAU05623_By_15K7_7_S221 GCF_000261785_1__CCBAU05623_By_16A1_2_S81 GCF_000261785_1__CCBAU05623_By_16A2_2_S82 GCF_000261785_1__CCBAU05623_By_17A3_1_S83 GCF_000261785_1__CCBAU05623_By_17A6_3_S84 GCF_000261785_1__CCBAU05623_By_18A2_2_S85 GCF_000261785_1__CCBAU05623_By_18A4_3_S86 GCF_000261785_1__CCBAU05623_By_19A1_4_S87 GCF_000261785_1__CCBAU05623_By_1A2_1_S60 GCF_000261785_1__CCBAU05623_By_20A1_6_S88 GCF_000261785_1__CCBAU05623_By_20A3_10_S89 GCF_000261785_1__CCBAU05623_By_20K3_10_S229 GCF_000261785_1__CCBAU05623_By_2A6_8_S61 GCF_000261785_1__CCBAU05623_By_5A2_2_S63 GCF_000261785_1__CCBAU05623_By_5A4_2_S64 GCF_000261785_1__CCBAU05623_By_6A5_9_S66 GCF_000261785_1__CCBAU05623_By_6K1_1_S207 GCF_000261785_1__CCBAU05623_By_7A4_7_S67 GCF_000261785_1__CCBAU05623_By_7A8_3_S68 GCF_000261785_1__CCBAU05623_By_8A1_2_S69 GCF_000261785_1__CCBAU05623_By_9A2_1_S70 GCF_000261785_1__CCBAU05623_By_9A4_8_S71
NZ_CP028463.1 28439 NZ_CP028463.1:28439:SG C G 5244 PASS AAScore=0.5081;ABHet=0.5625;ABHetMulti=0.5625,0.4375;ABHom=0.9946;ABHomMulti=-1,0.9946;AC=60;AF=0.9677;AN=62;CR=286;CRal=1230,76264;CRalt=20.1757;LOGF=0.02641;MMal=171,1786;MMalt=0.472487;MQ=60;MQSal=32400,1360800;MQalt=60;MQsquared=1882800;MaxAAS=23;MaxAASR=1;NHet=2;NHomAlt=29;NHomRef=2;PASS_AC=40;PASS_AN=40;PASS_ratio=0.6452;QD=17.66;QDalt=17.66;RefLen=1;SB=0.9251;SBAlt=0.9286;SBF=7,351;SBF1=4,173;SBF2=3,178;SBR=2,27;SBR1=1,15;SBR2=1,12;SDal=601,28956;SDalt=76.6032;SeqDepth=523;VarType=SG GT:AD:MD:DP:GQ:PL 1/1:0,18:1:19:54:255,54,0 1/1:0,10:10:20:30:138,30,0 1/1:0,14:5:19:42:196,42,0 1/1:1,7:1:9:12:75,12,0 1/1:0,21:3:24:63:255,63,0 1/1:0,3:4:7:9:36,9,0 0/1:5,9:0:14:18:69,0,18 1/1:0,14:4:18:42:211,42,0 1/1:0,6:4:10:18:90,18,0 1/1:0,15:2:17:45:211,45,0 1/1:0,10:6:16:30:151,30,0 ./.:0,0:0:0:0:0,0,0 1/1:0,18:3:21:54:255,54,0 1/1:0,17:6:23:51:255,51,0 1/1:0,16:5:21:48:232,48,0 1/1:0,6:8:14:18:87,18,0 1/1:0,6:1:7:18:87,18,0 1/1:0,13:10:23:39:193,39,0 1/1:0,11:3:14:33:151,33,0 1/1:0,17:7:24:51:255,51,0 1/1:0,14:3:17:42:247,42,0 1/1:0,10:2:12:30:151,30,0 1/1:0,9:5:14:27:135,27,0 1/1:0,15:2:17:45:247,45,0 ./.:0,0:0:0:0:0,0,0 1/1:0,23:6:29:69:255,69,0 1/1:0,22:4:26:66:255,66,0 1/1:0,8:6:14:24:129,24,0 0/1:2,0:4:6:9:9,0,15 1/1:0,12:3:15:36:166,36,0 1/1:0,9:5:14:27:117,27,0 1/1:0,14:10:24:42:205,42,0 1/1:1,11:3:15:24:126,24,0
Description of the bug
The pipeline inadvertently used an alternate reference genome for variant calling, but the resulting VCF file was labeled with the contig names from the user-specified reference. This misalignment causes a discrepancy where the CHROM column in the VCF does not match the contig names from the intended reference. Despite this, the pipeline consistently references the user-provided ID, leading to confusion as it implies that the user's chosen reference was employed throughout the process.
Command used and terminal output
No response
Relevant files
The vcf:
The user defined ref:
The reference picked by the pipeline:
System information
No response