samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

[merge] [view] Feature request: Ability to delete an unseen allele line if it duplicates an ALT-containing line #2195

Closed: PlatonB closed this issue 1 month ago

PlatonB commented 1 month ago

This is, in part, a follow-up to Issue #2023. The new options -m none,* and -A (--trim-unseen-allele) do not allow removing a line that has <*> in place of an ALT allele. Such a feature would spare users from writing extra scripts.

Command with -m none,* and -A:

bcftools merge -g path/to/GRCh38.d1.vd1.fa -m 'none,*' -Ov path/to/*_gold.g.vcf.gz | bcftools view -Ov -A | grep -P '\b10056\b'

Real target lines:

chr1    10056   .       A       T       5.61    PASS    F       GT      0/0     0/0     0/0     0/0     0/0     ./.     0/0     0/0     0/0     0/0     0/0
chr1    10056   .       A       C       5.61    PASS    F       GT      ./.     ./.     0/0     ./.     ./.     0/0     ./.     ./.     ./.     ./.     ./.
chr1    10056   .       A       <*>     5.33    PASS    F       GT      ./.     ./.     ./.     ./.     ./.     0/0     ./.     ./.     ./.     ./.     ./.

Expected target lines:

chr1    10056   .       A       T       5.61    PASS    F       GT      0/0     0/0     0/0     0/0     0/0     ./.     0/0     0/0     0/0     0/0     0/0
chr1    10056   .       A       C       5.61    PASS    F       GT      ./.     ./.     0/0     ./.     ./.     0/0     ./.     ./.     ./.     ./.     ./.
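Until such an option exists, the requested behavior could be approximated with a small post-processing filter. The Python sketch below is hypothetical (drop_redundant_unseen and its criteria are assumptions, not part of bcftools): it drops a record whose ALT column holds no concrete allele (only <*>, <NON_REF> or ".") when another record at the same CHROM, POS and REF has a real ALT. It does not compare genotypes, so a stricter version might also check that the unseen-allele line adds no genotype information.

```python
# Hypothetical workaround filter, not bcftools functionality.
# Symbolic "unobserved allele" placeholders used in gVCF-style records.
UNSEEN = {"<*>", "<NON_REF>"}


def has_real_alt(alt_field):
    """True if the ALT column holds at least one concrete (non-symbolic) allele."""
    return any(a not in UNSEEN and a != "." for a in alt_field.split(","))


def _filter_group(group):
    """Yield the records at one position, skipping redundant unseen-allele lines."""
    # REF alleles that have at least one record with a real ALT at this position.
    real_refs = {f[3] for f in group if has_real_alt(f[4])}
    for f in group:
        if not has_real_alt(f[4]) and f[3] in real_refs:
            continue  # only <*>/<NON_REF>/"." and a real-ALT sibling exists
        yield "\t".join(f)


def drop_redundant_unseen(lines):
    """Stream VCF text lines, dropping redundant unseen-allele records.

    Header lines pass through unchanged. Records are buffered per
    (CHROM, POS), which assumes position-sorted input.
    """
    group, group_key = [], None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        key = (fields[0], fields[1])
        if key != group_key:
            yield from _filter_group(group)
            group, group_key = [], key
        group.append(fields)
    yield from _filter_group(group)
```

For example, the output of bcftools view -A could be piped into a script that prints each item of drop_redundant_unseen(sys.stdin); because it is a generator, only one position's records are held in memory at a time.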
PlatonB commented 1 month ago

If I drop -m none,* from the command but keep -A, it still has no effect: <*> remains.

bcftools merge -g /path/to/GRCh38.d1.vd1.fa -Ov /path/to/*g.vcf.gz | bcftools view -Ov -A | grep -P '\t907538\t'
chr1    907538  .       TA      TAA,<*>,T       241.6   LowQual BaseQRankSum=1.426;ExcessHet=3.0103;MQRankSum=-1.506;RAW_MQandDP=110443,65;ReadPosRankSum=-1.02;F;DP=233;MLEAC=1,.,.;MLEAF=0.5,.,.      GT:AD:DP:GQ:PL:SB:VAF:AF        0/1:22,19,0,0:41:99:249,0,305,676,676,676,676,676,676,676:5,17,6,13:.:.      0/1:39,19,0,0:62:12:12,0,18,990,990,990,990,990,990,990:.:0.306452,0,1.74137e-31:.      0/1:22,19,0,0:41:99:249,0,305,676,676,676,676,676,676,676:5,17,6,13:.:. 0/1:41,20,0,0:65:19:18,0,31,990,990,990,990,990,990,990:.:0.307692,0,7.13267e-28:.   0/3:2,0,0,10:25:0:5,990,990,990,990,990,0,990,990,18:.:.:0.4    0/1:8,8,0,0:16:98:98,0,110,255,255,255,255,255,255,255:4,4,4,4:.:.      0/1:38,8,0,0:51:9:9,0,14,990,990,990,990,990,990,990:.:0.156863,0,2.7209e-33:.       0/1:10,6,1,1:17:68:68,0,149,265,265,265,265,265,265,265:6,4,5,2:.:.     0/1:40,9,0,0:57:13:13,0,18,990,990,990,990,990,990,990:.:0.157895,0,6.96549e-31:.
pd3 commented 1 month ago

The documentation suggests the option should be given twice. Have you tried that?

 -A, --trim-unseen-allele          Remove '<*>' or '<NON_REF>' at variant (-A) or at all (-AA) sites
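To make the documented -A / -AA distinction concrete, here is a hypothetical Python sketch of trimming a single record (trim_unseen is an illustrative name, not bcftools code). It assumes that a "variant site" is one with at least one non-symbolic ALT, and that an allele counts as unseen only if no sample's GT refers to it; bcftools may apply other criteria. Removing an allele from the middle of ALT, as in the TAA,<*>,T record in the previous comment, shifts the later allele indices, so GT values must be renumbered.

```python
import re

# Symbolic "unobserved allele" placeholders used in gVCF-style records.
UNSEEN = {"<*>", "<NON_REF>"}


def trim_unseen(alts, genotypes, all_sites=False):
    """Drop unreferenced <*>/<NON_REF> alleles from one record (sketch).

    alts:       ALT alleles, e.g. ["TAA", "<*>", "T"]
    genotypes:  GT strings, one per sample, e.g. ["0/1", "0/3"]
    all_sites:  False models the documented -A (variant sites only),
                True models -AA (all sites).
    Returns (new_alts, new_genotypes); an emptied ALT list becomes ["."].
    """
    # Allele indices used by some genotype (0 = REF, 1.. = ALT).
    used = set()
    for gt in genotypes:
        used.update(int(a) for a in re.split(r"[/|]", gt) if a != ".")

    # Assumption: a "variant site" has at least one non-symbolic ALT.
    if not all_sites and all(a in UNSEEN for a in alts):
        return list(alts), list(genotypes)

    # Keep an ALT unless it is symbolic AND no genotype refers to it.
    remap = {0: 0}
    new_alts = []
    for old_idx, allele in enumerate(alts, start=1):
        if allele in UNSEEN and old_idx not in used:
            continue
        new_alts.append(allele)
        remap[old_idx] = len(new_alts)

    # Rewrite each GT with the new allele indices, keeping the separators.
    new_gts = []
    for gt in genotypes:
        parts = re.split(r"([/|])", gt)
        new_gts.append("".join(
            p if p in ("/", "|", ".") else str(remap[int(p)]) for p in parts
        ))
    return (new_alts or ["."]), new_gts
```

For instance, trim_unseen(["TAA", "<*>", "T"], ["0/1", "0/3"]) returns (["TAA", "T"], ["0/1", "0/2"]). With all_sites=True, a record whose only ALT is <*> comes back with ALT ".", which resembles what the reporter describes for -AA below. Only GT is handled here; AD, PL and other per-allele fields would need the same re-indexing.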
PlatonB commented 1 month ago

-AA either does nothing or only replaces <*> with ".".

pd3 commented 1 month ago

Please try the latest version of bcftools from GitHub. With it, I was not able to reproduce the problem.