samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

old version vcf files #623

Open KamilSJaron opened 7 years ago

KamilSJaron commented 7 years ago

It seems that bcftools stats does not work properly with VCF v4.1 files (the version that allows structural variants).

It would be nice to mention in the manual / readme / help page that bcftools does not work with structural variants.

It would be even nicer to implement support for SVs!

pd3 commented 7 years ago

Hi, can you be more specific about the support you'd like to have for SVs?

KamilSJaron commented 7 years ago

Hi, mainly I wanted to point out that the current output is more than misleading (more details below).

For a start, I would like to see at least the counts of the different types of SVs. Then it would be nice to see histograms of sizes per category (careful, basically every SV has a unique size, so perhaps you could use bins for the histograms). I do not have a multisample vcf file yet, so I am not sure how to summarize that, but generally, something to get a first glance at what is inside.

Current behaviour: I have a vcf file and I would like to get a quick overview of what is inside. I know (now) that there are 143490 breakends, 1390 deletions, 883 duplications, 893 insertions and 90 inversions.

Instead of these numbers, bcftools stats reports this:

# SN, Summary numbers:
# SN    [2]id   [3]key  [4]value
SN  0   number of samples:  1
SN  0   number of records:      146746
SN  0   number of no-ALTs:  0
SN  0   number of SNPs: 0
SN  0   number of MNPs: 1
SN  0   number of indels:   75478
SN  0   number of others:   1967
SN  0   number of multiallelic sites:   0
SN  0   number of multiallelic SNP sites:   0

These numbers seem to be wrong. I do not even understand how they are computed. It also shows a distribution of indels, but it corresponds to the records with the BND SV type, not to the INS and DEL records.

pd3 commented 7 years ago

The program only classifies variants based on the REF and ALT fields, i.e. small insertions and deletions. Structural variation is not supported at all at the moment. Simple stats like that should not be difficult to add, but this is unlikely to be added anytime soon, unless someone wants to contribute. Pull requests are welcome!
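
In the meantime, the simple stats requested in this thread (counts per SV type and binned size histograms) can be approximated outside of bcftools. The following is only a minimal sketch, not part of bcftools: it assumes the caller writes the `SVTYPE` and `SVLEN` INFO keys that the VCF specification reserves for structural variants, and the function names, bin edges and demo records are made up for illustration.

```python
import collections
import io


def parse_info(info_field):
    """Turn a VCF INFO string such as 'SVTYPE=DEL;SVLEN=-300' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # flags carry no value
    return info


def sv_summary(lines, bin_edges=(50, 100, 1000, 10000, 100000)):
    """Count records per SVTYPE and bin |SVLEN| per type.

    `lines` is any iterable of uncompressed VCF lines.
    `bin_edges` are arbitrary upper bounds chosen for this sketch.
    """
    type_counts = collections.Counter()
    size_hist = collections.defaultdict(collections.Counter)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        info = parse_info(fields[7])
        svtype = info.get("SVTYPE", "NA")  # "NA" marks records without SVTYPE
        type_counts[svtype] += 1
        svlen = info.get("SVLEN")
        if not isinstance(svlen, str):
            continue  # e.g. breakends often have no SVLEN
        try:
            size = abs(int(svlen.split(",")[0]))  # first value if a list is given
        except ValueError:
            continue
        label = next((f"<{edge}" for edge in bin_edges if size < edge),
                     f">={bin_edges[-1]}")
        size_hist[svtype][label] += 1
    return type_counts, size_hist


# Tiny made-up example with one deletion, one duplication and one breakend.
demo = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-300;END=400\n"
    "1\t900\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;SVLEN=5000;END=5900\n"
    "1\t950\t.\tN\tN[2:321[\t.\tPASS\tSVTYPE=BND\n"
)
counts, hist = sv_summary(demo)
for svtype, n in sorted(counts.items()):
    print(svtype, n, dict(hist.get(svtype, {})))
```

To run it on a real file, pass an open file handle instead of the demo, e.g. `sv_summary(open("calls.vcf"))` (the file name is hypothetical), or `gzip.open(path, "rt")` for a compressed VCF. Records without `SVLEN` are still counted by type but are left out of the size histogram.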