samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Filtering indels #1982

Closed lt11 closed 1 year ago

lt11 commented 1 year ago

Dear authors,

While I was filtering a vcf to grab only its indels I noticed that the output also reports "indels" that are longer than 50 bp. Is this an expected behaviour? Shouldn't variants longer than 50 bp be classified as "other"?

Thanks for your help.

Best regards,

Lorenzo

pd3 commented 1 year ago

No, it shouldn't, and I don't see why it would: insertions and deletions can be of any length. The program classifies records as follows:

ACGT A    .. deletion
A ACGT    .. insertion
A <INS>   .. other
A <DEL>   .. other
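
As a rough illustration of the table above, the four cases can be mimicked in a few lines of Python. This is only a sketch: `classify_alt` is a hypothetical helper, not BCFtools' own code, and it ignores multi-allelic records and the other cases the real program handles.

```python
def classify_alt(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair in the spirit of the table above.

    Hypothetical helper for illustration; not BCFtools' implementation.
    """
    # Symbolic alleles such as <INS> or <DEL> carry no sequence,
    # so they are checked before any length comparison.
    if alt.startswith("<") and alt.endswith(">"):
        return "other"
    if len(ref) > len(alt):
        return "deletion"
    if len(alt) > len(ref):
        return "insertion"
    # Equal-length alleles (e.g. SNPs) are not indels.
    return "non-indel"


for ref, alt in [("ACGT", "A"), ("A", "ACGT"), ("A", "<INS>"), ("A", "<DEL>")]:
    print(ref, alt, "..", classify_alt(ref, alt))
```

Note that no length threshold appears anywhere in the sketch: a 500 bp insertion written out as sequence is still labelled an insertion.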

It is possible to filter by length, for example like this:

bcftools view -i 'abs(ILEN)<=50'
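
To make the effect of that expression concrete, here is a small Python sketch of the same length filter. It assumes that ILEN behaves like `len(ALT) - len(REF)` (negative for deletions, positive for insertions). `indel_length` and `filter_by_length` are hypothetical names, and the 50 bp cutoff is simply the value from the command above. The sketch expects plain sequence alleles and does not try to reproduce how BCFtools treats records for which ILEN is undefined.

```python
def indel_length(ref: str, alt: str) -> int:
    """Signed length difference: negative for deletions, positive for insertions.

    Assumed to approximate what ILEN reports; not taken from BCFtools' code.
    """
    return len(alt) - len(ref)


def filter_by_length(records, max_len=50):
    """Keep (ref, alt) pairs whose absolute length difference is <= max_len."""
    return [
        (ref, alt)
        for ref, alt in records
        if abs(indel_length(ref, alt)) <= max_len
    ]


records = [
    ("ACGT", "A"),            # 3 bp deletion
    ("A", "A" + "C" * 50),    # 50 bp insertion, exactly at the cutoff
    ("A" + "G" * 51, "A"),    # 51 bp deletion, just over the cutoff
]

kept = filter_by_length(records)
print(len(kept))  # 2
```

Because the cutoff is a parameter, a different threshold only requires changing `max_len`.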
lt11 commented 1 year ago

Hi,

Thanks for the input. When insertions and deletions are larger than 50 bp, they are usually classified as structural variants. In the literature, the term "indel" is associated with events of ≤ 50 bp. It may be worth mentioning this in the documentation.

Best,

L

pd3 commented 1 year ago

There is no consensus on that, actually. Arbitrary thresholds have been used, including much larger ones such as 1000 bp.

I agree it would be good to have it documented though.