samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

bcftools stats DP #2188

Closed fan040 closed 1 month ago

fan040 commented 1 month ago

[screenshot] I want to know what the third column means and why the fourth and fifth columns have no value. [screenshot] I also want to know the meaning of this statistic and how it is calculated.

pd3 commented 1 month ago

Column [3] is the bin, which with the default parameters is the depth itself; columns [4-5] are per-sample FORMAT values; and columns [6-7] are per-site INFO values.

By default, the stats are collected only for sites, i.e. for INFO fields but not for FORMAT fields. That is why columns 4-5 are empty in your screenshot. With -s -, rather counterintuitively, the program also parses FORMAT fields. (This is an example of bad design; one would expect -s - to switch sample stats off rather than on.)
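As a rough illustration of the column layout described above, here is a minimal Python sketch that reads DP lines from `bcftools stats` output and checks whether any per-sample (FORMAT) counts were collected. The sample lines are made up for illustration, the function name `parse_dp` is hypothetical, and the sketch assumes tab-separated fields in which "empty" per-sample columns appear as zeros or blanks.

```python
# Hypothetical DP lines: the numbers are invented for illustration and
# do not come from a real bcftools run.
SAMPLE = (
    "# DP\t[2]id\t[3]bin\t[4]number of genotypes\t[5]fraction of genotypes (%)"
    "\t[6]number of sites\t[7]fraction of sites (%)\n"
    "DP\t0\t1\t0\t0.000000\t12\t40.000000\n"
    "DP\t0\t2\t0\t0.000000\t18\t60.000000\n"
)


def _num(field, cast):
    """Convert a field to a number, treating a blank field as zero."""
    field = field.strip()
    return cast(field) if field else cast(0)


def parse_dp(text):
    """Return one dict per DP line; comment and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        if not line.startswith("DP\t"):
            continue
        fields = line.split("\t")
        rows.append({
            "bin": fields[2],                        # kept as text, may not be a plain integer
            "genotypes": _num(fields[3], int),       # column [4], per-sample FORMAT
            "genotypes_pct": _num(fields[4], float), # column [5]
            "sites": _num(fields[5], int),           # column [6], per-site INFO
            "sites_pct": _num(fields[6], float),     # column [7]
        })
    return rows


rows = parse_dp(SAMPLE)
# True only if at least one bin has a non-zero genotype count,
# i.e. FORMAT values were parsed (for example when -s - was given).
format_collected = any(r["genotypes"] > 0 for r in rows)
print(format_collected)
```

With the invented sample above, columns [4-5] are all zero, which mirrors the situation in the screenshot where stats were run without -s -.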

For the definitions of transitions and transversions, see https://en.wikipedia.org/wiki/Transversion and https://en.wikipedia.org/wiki/Transition_(genetics). The repeat consistent/inconsistent stats were experimental and useless, and will be deprecated.

Commit 379e1b6 extends the description to make the file format self-descriptive.

I hope this helps.

fan040 commented 1 month ago

I want to ask: the bin column refers to the size of the window, right? Are the statistics computed over the data in each 1 bp window?

pd3 commented 1 month ago

By default the bin size is 1, so each bin number is simply a depth (coverage) value. See also the option:

 -d, --depth INT,INT,INT          Depth distribution: min,max,bin size [0,500,1]
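
To make the min,max,bin size idea concrete, here is a small Python sketch of how depth values could be grouped into bins. It is not the bcftools implementation: the function name `depth_bin`, the choice of labelling each bin by its lower edge, and the handling of values outside the range are assumptions made for illustration. The point it shows is that with a bin size of 1 the bin label equals the depth itself.

```python
from collections import Counter


def depth_bin(dp, dmin=0, dmax=500, bin_size=1):
    """Return a label for the bin that a depth value falls into.

    Illustrative only: bins are labelled by their lower edge, and values
    outside [dmin, dmax] go to assumed underflow/overflow bins.
    """
    if dp < dmin:
        return f"<{dmin}"
    if dp > dmax:
        return f">{dmax}"
    lower_edge = dmin + ((dp - dmin) // bin_size) * bin_size
    return str(lower_edge)


# Made-up depth values, one per site.
depths = [3, 3, 10, 600]

# Default-like settings (0,500,1): each bin is a single depth value.
counts_default = Counter(depth_bin(d) for d in depths)

# A wider bin size (0,500,5): depths 0-4 share one bin, 10-14 another.
counts_wide = Counter(depth_bin(d, bin_size=5) for d in depths)

print(dict(counts_default))
print(dict(counts_wide))
```

In this sketch the bins are ranges of depth values, not windows along the genome, which is the distinction the answer above is making.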