samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
671 stars 240 forks source link

bcftools consensus Applied 0 variants when VCF only has variants from one scaffold #1918

Closed gwct closed 1 year ago

gwct commented 1 year ago
bcftools --version
bcftools 1.17
Using htslib 1.17
Copyright (C) 2023 Genome Research Ltd.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

So, this is a weird one, which makes me think I'm doing something simple wrong and just can't see it.

I have a VCF that I've called SNPs on for only one scaffold (19). When I try to add these variants into the genome which contains all scaffolds with bcftools consensus, it doesn't work, with none of the variants being applied:

bcftools consensus -f mm39-chromes.fa -o test.fa -c test.chain snippet.vcf.gz
Note: the --sample option not given, applying all records regardless of the genotype
Applied 0 variants

The snippet.vcf.gz file is just the first few lines of a larger VCF, but I get the exact same behavior with the full VCF. Here are the contents of the snippet:

##fileformat=VCFv4.2
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
##FILTER=<ID=IndelGap,Description="Indel within 5 bp of an indel">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=pseudoit,Description="Set if true: MQ < 30.0 || FORMAT/DP < 5 || FORMAT/DP > 60 || ALT=\"*\"">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another; will always be heterozygous and is not intended to describe called alleles">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --output /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/regions/mm39-19-30X-0.04d-0.005h.vcf.gz --include-non-variant-sites true --variant /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/regions/mm39-19-30X-0.04d-0.005h.gvcf.gz --reference /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/reference-genomes/mm39/Mus_musculus.GRCm39.dna.primary_assembly.chromes.fa --merge-input-intervals false --input-is-somatic false --tumor-lod-to-emit 3.5 --allele-fraction-error 0.001 --keep-combined-raw-annotations false --use-posteriors-to-calculate-qual false --dont-use-dragstr-priors false --use-new-qual-calculator true --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --genotype-assignment-method USE_PLS_TO_ASSIGN --genomicsdb-max-alternate-alleles 50 --call-genotypes false --genomicsdb-use-bcf-codec false --genomicsdb-shared-posixfs-optimizations false --genomicsdb-use-gcs-hdfs-connector false --only-output-calls-starting-in-intervals false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false --allow-old-rms-mapping-quality-annotation-data false",Version="4.3.0.0",Date="May 2, 2023 at 9:38:36 PM GMT-05:00">
##GATKCommandLine=<ID=SelectVariants,CommandLine="SelectVariants --output /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/mm39-30X-0.04d-0.005h-filtered-snps.vcf.gz --select-type-to-include SNP --select-type-to-exclude INDEL --select-type-to-exclude MIXED --select-type-to-exclude SYMBOLIC --variant /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/mm39-30X-0.04d-0.005h-filtered.vcf.gz --invertSelect false --exclude-non-variants false --exclude-filtered false --preserve-alleles false --remove-unused-alternates false --restrict-alleles-to ALL --keep-original-ac false --keep-original-dp false --mendelian-violation false --invert-mendelian-violation false --mendelian-violation-qual-threshold 0.0 --select-random-fraction 0.0 --remove-fraction-genotypes 0.0 --fully-decode false --max-indel-size 2147483647 --min-indel-size 0 --max-filtered-genotypes 2147483647 --min-filtered-genotypes 0 --max-fraction-filtered-genotypes 1.0 --min-fraction-filtered-genotypes 0.0 --max-nocall-number 2147483647 --max-nocall-fraction 1.0 --set-filtered-gt-to-nocall false --allow-nonoverlapping-command-line-samples false --suppress-reference-path false --fail-on-unsorted-genotype false --genomicsdb-max-alternate-alleles 50 --call-genotypes false --genomicsdb-use-bcf-codec false --genomicsdb-shared-posixfs-optimizations false --genomicsdb-use-gcs-hdfs-connector false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --max-variants-per-shard 0 --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false",Version="4.3.0.0",Date="May 2, 2023 at 10:18:04 PM GMT-05:00">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval">
##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity">
##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##bcftools_concatCommand=concat --threads 8 -Oz -o /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/mm39-30X-0.04d-0.005h.vcf.gz /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/regions/mm39-19-30X-0.04d-0.005h.vcf.gz; Date=Tue May  2 23:11:29 2023
##bcftools_concatVersion=1.16+htslib-1.16
##bcftools_filterCommand=filter -m+ -e 'MQ < 30.0 || FORMAT/DP < 5 || FORMAT/DP > 60 || ALT="*"' -s pseudoit --IndelGap 5 -Oz -o /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/mm39-30X-0.04d-0.005h-filtered.vcf.gz /n/holylfs05/LABS/informatics/Users/gthomas/Mapping-simulations-data/mm39-18-19-iterative/called-variants/gatk/30X/0.04/iter1/mm39-30X-0.04d-0.005h.vcf.gz; Date=Tue May  2 23:13:07 2023
##bcftools_filterVersion=1.16+htslib-1.16
##contig=<ID=1,length=195154279>
##contig=<ID=2,length=181755017>
##contig=<ID=3,length=159745316>
##contig=<ID=4,length=156860686>
##contig=<ID=5,length=151758149>
##contig=<ID=6,length=149588044>
##contig=<ID=7,length=144995196>
##contig=<ID=8,length=130127694>
##contig=<ID=9,length=124359700>
##contig=<ID=10,length=130530862>
##contig=<ID=11,length=121973369>
##contig=<ID=12,length=120092757>
##contig=<ID=13,length=120883175>
##contig=<ID=14,length=125139656>
##contig=<ID=15,length=104073951>
##contig=<ID=16,length=98008968>
##contig=<ID=17,length=95294699>
##contig=<ID=18,length=90720763>
##contig=<ID=19,length=61420004>
##contig=<ID=X,length=169476592>
##contig=<ID=Y,length=91455967>
##source=GenotypeGVCFs
##source=SelectVariants
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  mm39-30X-0.04d-0.005h
19      3050444 .       A       G       78.32   pseudoit        AC=2;AF=1;AN=2;DP=2;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=40;QD=25.36;SOR=2.303 GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,2:2:6:1|1:3050444_A_G:90,6,0:3050444
19      3050446 .       G       A       78.32   pseudoit        AC=2;AF=1;AN=2;DP=2;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=40;QD=28.73;SOR=2.303 GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,2:2:6:1|1:3050444_A_G:90,6,0:3050444
19      3050468 .       G       T       141.14  pseudoit        AC=2;AF=1;AN=2;DP=4;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=30.97;SOR=3.258   GT:AD:DP:GQ:PL  1/1:0,4:4:12:155,12,0
19      3050487 .       T       C       251.02  PASS    AC=2;AF=1;AN=2;DP=7;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=27.24;SOR=4.174   GT:AD:DP:GQ:PL  1/1:0,7:7:21:265,21,0
19      3050502 .       A       G       346.04  PASS    AC=2;AF=1;AN=2;DP=8;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=28.2;SOR=4.407    GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,8:8:24:1|1:3050502_A_G:360,24,0:3050502
19      3050506 .       T       G       391.05  PASS    AC=2;AF=1;AN=2;DP=9;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=25;SOR=4.615      GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,9:9:27:1|1:3050502_A_G:405,27,0:3050502
19      3050516 .       C       A       436.06  PASS    AC=2;AF=1;AN=2;DP=10;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=29.56;SOR=4.804  GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,10:10:30:1|1:3050502_A_G:450,30,0:3050502
19      3050521 .       A       G       436.06  PASS    AC=2;AF=1;AN=2;DP=10;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=30.62;SOR=4.804  GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,10:10:30:1|1:3050502_A_G:450,30,0:3050502
19      3050534 .       G       A       220.64  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-0.628;DP=11;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=40;MQRankSum=0;QD=20.06;ReadPosRankSum=-0.091;SOR=1.27 GT:AD:DP:GQ:PL  0/1:4,7:11:99:228,0,119
19      3050535 .       T       C       453.06  PASS    AC=2;AF=1;AN=2;DP=11;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=28.17;SOR=4.977  GT:AD:DP:GQ:PL  1/1:0,11:11:33:467,33,0
19      3050564 .       T       C       438.06  PASS    AC=2;AF=1;AN=2;DP=13;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=40;QD=26.8;SOR=5.136   GT:AD:DP:GQ:PL  1/1:0,12:12:36:452,36,0
19      3050601 .       T       C       551.06  PASS    AC=2;AF=1;AN=2;DP=15;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=42.73;QD=26;SOR=5.549  GT:AD:DP:GQ:PL  1/1:0,15:15:45:565,45,0
19      3050642 .       T       G       953.06  PASS    AC=2;AF=1;AN=2;DP=23;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=52.3;QD=30.02;SOR=1.426        GT:AD:DP:GQ:PL  1/1:0,23:23:69:967,69,0
19      3050646 .       C       A       402.64  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-1.296;DP=24;ExcessHet=0;FS=1.685;MLEAC=1;MLEAF=0.5;MQ=52.5;MQRankSum=-1.318;QD=16.78;ReadPosRankSum=-1.554;SOR=0.346     GT:AD:DP:GQ:PL  0/1:10,14:24:99:410,0,310
19      3050654 .       C       T       1022.06 PASS    AC=2;AF=1;AN=2;DP=25;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=52.69;QD=31.98;SOR=1.136       GT:AD:DP:GQ:PL  1/1:0,25:25:75:1036,75,0
19      3050659 .       C       T       385.64  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-0.135;DP=25;ExcessHet=0;FS=6.373;MLEAC=1;MLEAF=0.5;MQ=53.31;MQRankSum=-0.565;QD=17.53;ReadPosRankSum=-1.682;SOR=0.707    GT:AD:DP:GQ:PL  0/1:10,12:22:99:393,0,315
19      3050748 .       T       C       1198.06 PASS    AC=2;AF=1;AN=2;DP=29;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=55.93;QD=27.51;SOR=1.382       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,28:28:84:1|1:3050748_T_C:1212,84,0:3050748
19      3050771 .       T       G       1089.06 PASS    AC=2;AF=1;AN=2;DP=25;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=54.64;QD=29.11;SOR=1.358       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,25:25:75:1|1:3050748_T_C:1103,75,0:3050748
19      3050815 .       T       G       775.06  PASS    AC=2;AF=1;AN=2;DP=18;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=48.73;QD=23.23;SOR=0.914       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,18:18:54:1|1:3050794_T_TA:789,54,0:3050794
19      3050855 .       A       G       1021.06 PASS    AC=2;AF=1;AN=2;DP=23;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=44.37;QD=29.03;SOR=0.963       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,23:23:69:1|1:3050855_A_G:1035,69,0:3050855
19      3050865 .       C       T       1055.06 PASS    AC=2;AF=1;AN=2;DP=24;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=44.44;QD=30.67;SOR=1.051       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,24:24:72:1|1:3050855_A_G:1069,72,0:3050855
19      3050892 .       C       T       1156.06 PASS    AC=2;AF=1;AN=2;DP=26;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=42.73;QD=26.61;SOR=1.022       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,26:26:78:1|1:3050892_C_T:1170,78,0:3050892
19      3050897 .       A       G       1156.06 PASS    AC=2;AF=1;AN=2;DP=26;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=42.73;QD=29.2;SOR=1.022        GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,26:26:78:1|1:3050892_C_T:1170,78,0:3050892
19      3050907 .       T       G       310.64  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=-1.66;DP=28;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=43.19;MQRankSum=-2.146;QD=11.09;ReadPosRankSum=-0.046;SOR=0.838 GT:AD:DP:GQ:PL  0/1:16,12:28:99:318,0,505
19      3050943 .       G       A       1284.06 PASS    AC=2;AF=1;AN=2;DP=30;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=42.87;QD=28.55;SOR=1.255       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,29:29:87:1|1:3050943_G_A:1298,87,0:3050943
19      3050950 .       T       C       1271.06 PASS    AC=2;AF=1;AN=2;DP=29;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=42.76;QD=34.42;SOR=1.255       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,29:29:87:1|1:3050943_G_A:1285,87,0:3050943
19      3050978 .       T       C       1097.06 PASS    AC=2;AF=1;AN=2;DP=26;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=41.09;QD=28.53;SOR=1.697       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,26:26:78:1|1:3050978_T_C:1111,78,0:3050978
19      3051002 .       G       T       1012.06 PASS    AC=2;AF=1;AN=2;DP=23;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=39.7;QD=27.23;SOR=1.708        GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,23:23:69:1|1:3050978_T_C:1026,69,0:3050978
19      3051006 .       T       C       326.64  PASS    AC=1;AF=0.5;AN=2;BaseQRankSum=0.628;DP=22;ExcessHet=0;FS=0;MLEAC=1;MLEAF=0.5;MQ=39.39;MQRankSum=1.28;QD=14.85;ReadPosRankSum=0.066;SOR=1.091    GT:AD:DP:GQ:PL  0/1:11,11:22:99:334,0,305
19      3051039 .       G       A       1003.06 PASS    AC=2;AF=1;AN=2;DP=23;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=38.68;QD=29.52;SOR=0.963       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,23:23:69:1|1:3051039_G_A:1017,69,0:3051039
19      3051042 .       A       G       1002.06 PASS    AC=2;AF=1;AN=2;DP=25;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=38.51;QD=33.47;SOR=0.776       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,23:23:69:1|1:3051039_G_A:1016,69,0:3051039
19      3051094 .       C       T       951.06  PASS    AC=2;AF=1;AN=2;DP=29;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=37.57;QD=33.97;SOR=1.179       GT:AD:DP:GQ:PL  1/1:0,28:28:84:965,84,0
19      3051108 .       T       C       1291.06 PASS    AC=2;AF=1;AN=2;DP=29;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=36.92;QD=32.91;SOR=1.255       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,29:29:87:1|1:3051108_T_C:1305,87,0:3051108
19      3051115 .       G       A       1336.06 PASS    AC=2;AF=1;AN=2;DP=30;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=36.64;QD=31.31;SOR=1.329       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,30:30:90:1|1:3051108_T_C:1350,90,0:3051108
19      3051116 .       T       C       1336.06 PASS    AC=2;AF=1;AN=2;DP=30;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=36.64;QD=31.45;SOR=1.329       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,30:30:90:1|1:3051108_T_C:1350,90,0:3051108
19      3051125 .       A       G       1246.06 PASS    AC=2;AF=1;AN=2;DP=28;ExcessHet=0;FS=0;MLEAC=2;MLEAF=1;MQ=36.56;QD=33.27;SOR=1.382       GT:AD:DP:GQ:PGT:PID:PL:PS       1|1:0,28:28:84:1|1:3051108_T_C:1260,84,0:3051108

Now for the even weirder quirks:

  1. If I remove any other scaffold from the reference genome other than 19 (or X or Y for some reason), it works.
  2. If I have even one SNP from another scaffold in the VCF, it works.
  3. If I try with a VCF that only has variants from scaffold 18, the behavior is reversed: it works when all scaffolds are present and applies 0 variants if a remove a random one. I haven't tried with SNPs from any other scaffold.

I'm at a loss figuring this one out. Is there something weird about my VCF file that I'm not seeing?? I generated it using HaplotypeCaller, and then did some filtering and selected only SNPs.

The reference genome I'm using is mouse mm39 (http://ftp.ensembl.org/pub/current_fasta/mus_musculus/dna/Mus_musculus.GRCm39.dna.primary_assembly.fa.gz), with only the full chromosomes though (1-19 + X and Y).

Any help would be appreciated, though ultimately I can just include 2 scaffolds in my simulations and it should work. Thanks!

pd3 commented 1 year ago

I can confirm there is some strange problem. With VCF none of the variant is applied, but when the VCF is converted to BCF, suddenly it works. Not sure at this stage if the problem is in htslib or in bcftools

pd3 commented 1 year ago

This is caused by an undesired difference in VCF vs BCF seek, see https://github.com/samtools/htslib/issues/1623

gwct commented 1 year ago

Great, thanks. Glad it wasn't something silly I was doing. I went ahead with a different chromosome which worked for my case, but good to know I can just use BCF as well until the fix is merged. Thanks for looking in to this!