samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

How does bcftools isec identify the same sites #2230

Open ym-chen opened 2 months ago

ym-chen commented 2 months ago

I used bcftools isec to find the sites common to multiple VCF files, but one of the reported sites confuses me. The records from the two VCFs are:

vcf1
chr3 183987815 . G A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=152,144|24,25;DP=351;ECNT=1;GERMQ=93;MBQ=38,31;MFRL=288,304;MMQ=60,60;MPOS=43;NALOD=2.37;NLOD=68.24;POPAF=6;TLOD=129.08 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:236,1:0.00432:237:114,0:98,0:234,1:118,118,0,1 0/1:60,48:0.448:108:28,23:26,21:58,47:34,26,24,24

vcf2
chr3 183987815 . G T . PASS AS_FilterStatus=SITE;AS_SB_TABLE=210,249|22,22;DP=507;ECNT=1;GERMQ=93;MBQ=38,33;MFRL=292,282;MMQ=60,60;MPOS=39;NALOD=2.31;NLOD=60.38;POPAF=6;TLOD=100.18 GT:AD:AF:DP:F1R2:F2R1:FAD:SB 0/0:203,0:0.004889:203:109,0:88,0:202,0:85,118,0,0 0/1:256,44:0.145:300:119,23:127,21:254,44:125,131,22,22

Even though the ALT alleles of these two records differ (A in vcf1, T in vcf2), isec still reports them as a common site. That is not what I expected. Why does isec treat these two sites as the same?

keenhl commented 2 months ago

Try the collapse option (-c, --collapse). From the documentation:

-c, --collapse snps|indels|both|all|some|none|id

Controls how to treat records with duplicate positions and defines compatible records across multiple input files. Here by "compatible" we mean records which should be considered as identical by the tools. For example, when performing line intersections, the desire may be to consider as identical all sites with matching positions (bcftools isec -c all), or only sites with matching variant type (bcftools isec -c snps -c indels), or only sites with all alleles identical (bcftools isec -c none).

none: only records with identical REF and ALT alleles are compatible
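
To make the difference concrete, here is a minimal Python sketch of how two of the modes quoted above would treat the records from the question. It is a simplified model built only from the documentation text, not bcftools' actual implementation; the names `site_key` and `is_compatible` are made up for illustration, and only the `all` and `none` modes are modelled.

```python
# Simplified model of two --collapse modes, based only on the quoted
# documentation. This is NOT how bcftools is implemented internally;
# site_key and is_compatible are hypothetical helper names.

def site_key(record, collapse):
    """Return the tuple used to decide whether two records are compatible."""
    chrom, pos, ref, alt = record
    if collapse == "all":
        # "all": sites with matching positions are considered identical.
        return (chrom, pos)
    if collapse == "none":
        # "none": REF and ALT alleles must be identical as well.
        return (chrom, pos, ref, alt)
    raise ValueError(f"mode not modelled in this sketch: {collapse}")


def is_compatible(rec_a, rec_b, collapse):
    """True if both records map to the same key under the given mode."""
    return site_key(rec_a, collapse) == site_key(rec_b, collapse)


# CHROM, POS, REF, ALT taken from the two records in the question.
vcf1 = ("chr3", 183987815, "G", "A")
vcf2 = ("chr3", 183987815, "G", "T")

print(is_compatible(vcf1, vcf2, "all"))   # True: same position
print(is_compatible(vcf1, vcf2, "none"))  # False: ALT differs (A vs T)
```

Under this model, the two records only count as a common site when matching is done on position alone, which is why `bcftools isec -c none` is the mode to try if G>A and G>T should be kept apart.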