samtools / bcftools

This is the official development repository for BCFtools. See the installation instructions and other documentation at http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Updating filter syntax and thresholds? #2101

Open · davidyuyuan opened this issue 5 months ago

davidyuyuan commented 5 months ago

I could be wrong, but the syntax and threshold values in the example under "Filtering variants" on https://samtools.github.io/bcftools/howtos/variant-calling.html might be outdated:

bcftools filter -sLowQual -g3 -G10 \
    -e'%QUAL<10 || (RPB<0.1 && %QUAL<15) || (AC<2 && %QUAL<15) || %MAX(DV)<=3 || %MAX(DV)/%MAX(DP)<=0.3' \
    calls.vcf.gz

I am using the following command with bcftools 1.18 (header line bcftools_filterVersion=1.18+htslib-1.18) on a public GRCh38 dataset from PacBio:

bcftools filter -sLowQual -g3 -G10 \
    -e 'QUAL<100 || (RPBZ<0.1 && QUAL<150) || (AC<2 && QUAL<150) || VDB<1.0e-04' \
    "${input_vcf}" -Oz -o "${output_dir}/filtered.vcf.gz" --write-index
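Since the numbers in the expression are the part most likely to change between datasets, one option is to assemble the `-e` expression from named parameters instead of editing a long quoted string each time. A minimal sketch, assuming bash; the function name `build_lowqual_expr` and its argument order are hypothetical, and the defaults are simply the values from the command above, not recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcftools): print the -e expression used above,
# with the thresholds passed as arguments instead of hard-coded.
#   $1  QUAL cut-off applied to every site          (default 100)
#   $2  QUAL cut-off for low-RPBZ or AC<2 sites      (default 150)
#   $3  RPBZ cut-off                                 (default 0.1)
#   $4  VDB cut-off                                  (default 1.0e-04)
build_lowqual_expr() {
    local qual_min="${1:-100}"
    local qual_weak="${2:-150}"
    local rpbz_min="${3:-0.1}"
    local vdb_min="${4:-1.0e-04}"
    printf 'QUAL<%s || (RPBZ<%s && QUAL<%s) || (AC<2 && QUAL<%s) || VDB<%s\n' \
        "$qual_min" "$rpbz_min" "$qual_weak" "$qual_weak" "$vdb_min"
}

# Usage, with the same options as the command above:
#   bcftools filter -sLowQual -g3 -G10 -e "$(build_lowqual_expr 100 150 0.1 1.0e-04)" \
#       "${input_vcf}" -Oz -o "${output_dir}/filtered.vcf.gz" --write-index
```

Called with no arguments it reproduces the expression from the command above, so only the values that differ for a new dataset need to be passed.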

The other examples under "Filtering variants" may be outdated as well.

P.S. Next I am going to try the filter on the G1K (1000 Genomes) dataset to see whether it is generic enough to carry over.

pd3 commented 5 months ago

The thresholds are extremely unlikely to work as-is on different datasets; that much is certain. The example is intended only as an illustration.
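Because the cut-offs are dataset-specific, one way to get a starting point for the QUAL threshold is to look at the empirical QUAL distribution of the calls themselves. A minimal sketch, assuming bash plus a `sort` that supports `-g` (GNU or BSD) and `awk`; the helper name `qual_quantile` and the nearest-rank quantile rule are illustrative assumptions, not something bcftools or the howto prescribes. Missing QUAL values (`.`) are skipped. A quantile only describes the data; whether it is a sensible filter still has to be checked against the dataset.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcftools): read one QUAL value per line on
# stdin and print the value at quantile $1 (0 < q <= 1), nearest-rank method.
# Lines that are exactly "." (missing QUAL) are ignored.
qual_quantile() {
    local q="$1"
    grep -v '^\.$' | sort -g | awk -v q="$q" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1          # no usable QUAL values
            k = int(q * NR)
            if (k < q * NR) k++          # k = ceil(q * N), the nearest rank
            if (k < 1) k = 1
            print v[k]
        }'
}

# Usage: bcftools query prints one QUAL per record with this format string.
#   bcftools query -f '%QUAL\n' "${input_vcf}" | qual_quantile 0.05
```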