samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Can't filter custom VEP annotations which are float numbers #2098

Closed by GACGAMA 5 months ago

GACGAMA commented 5 months ago

Hello! I'm using BCFtools version 1.19 (from Docker). This issue is related to #1833. To filter on a float value within the CSQ columns of custom VEP annotations, I need to remove the tag from the header and then apply the filter using the -c option. But this makes it impossible to match missing values in that column, which are of utmost importance for fields such as gnomad_AF.

Using:

bcftools annotate -x INFO/gnomad_genomes_AF /scratch4/nsobrei2/ggama1/germline-tumor/vep/output_vcf_unfiltered_AI/TESTE_ENSEMBL_TUMOR.sSNV.unfiltered.vcf.gz | bcftools +split-vep -c gnomad_genomes_AF:Float -i '(gnomad_genomes_AF < 0.01)'

gives the correct result but eliminates any missing value.

Using:

bcftools annotate -x INFO/gnomad_genomes_AF /scratch4/nsobrei2/ggama1/germline-tumor/vep/output_vcf_unfiltered_AI/TESTE_ENSEMBL_TUMOR.sSNV.unfiltered.vcf.gz | bcftools +split-vep -c gnomad_genomes_AF:Float -i '(gnomad_genomes_AF < 0.01 || gnomad_genomes_AF="")'

gives: Error: cannot use arithmetic operators to compare strings and numbers

And using:

bcftools annotate -x INFO/gnomad_genomes_AF /scratch4/nsobrei2/ggama1/germline-tumor/vep/output_vcf_unfiltered_AI/TESTE_ENSEMBL_TUMOR.sSNV.unfiltered.vcf.gz | bcftools +split-vep -c gnomad_genomes_AF:Float -i '(gnomad_genomes_AF < 0.01 || gnomad_genomes_AF=".")'

gives: Segmentation fault (core dumped)

Using (without the annotate -x step):

bcftools +split-vep -c gnomad_genomes_AF:Float -i '(gnomad_genomes_AF < 0.01 || gnomad_genomes_AF=".")' /scratch4/nsobrei2/ggama1/germline-tumor/vep/output_vcf_unfiltered_AI/TESTE_ENSEMBL_TUMOR.sSNV.unfiltered.vcf.gz

also gives: Error: cannot use arithmetic operators to compare strings and numbers

Using:

bcftools +split-vep -c gnomad_genomes_AF:String -i '( gnomad_genomes_AF=".")' /scratch4/nsobrei2/ggama1/germline-tumor/vep/output_vcf_unfiltered_AI/TESTE_ENSEMBL_TUMOR.sSNV.unfiltered.vcf.gz

outputs only the records where the value is "." (missing).

This only happens with VEP custom annotations.

pd3 commented 5 months ago

Thank you for the bug report. This is now fixed in the development version on GitHub.

If upgrading is not an option (http://samtools.github.io/bcftools/howtos/install.html), you can use the following workarounds:

# 1. Reverse the logic and use -e instead of -i; this way you don't have to query
#    the missing value (which was the broken part)
bcftools +split-vep -c gnomad_genomes_AF:Float -e 'gnomad_genomes_AF >= 0.01'

# 2. First add the columns, then filter in a separate command (which works)
bcftools +split-vep -c gnomad_genomes_AF:Float |
    bcftools view -i 'gnomad_genomes_AF < 0.01 || gnomad_genomes_AF="."'
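
To illustrate why workaround 1 keeps the missing values, here is a small toy model in plain shell and awk. It does not call bcftools at all. It assumes, as the workaround implies, that a numeric comparison against a missing value is never true. The sample values and the helper names `include_lt` and `exclude_ge` are made up for the illustration.

```shell
#!/bin/sh
# Toy model only: this does not run bcftools. Each line is one hypothetical
# gnomad_genomes_AF value; "." stands for a missing value.
values='0.001
0.5
.
0.02
0.0005'

# Model of -i 'AF < 0.01': keep a record only when the comparison is true.
# A missing value cannot satisfy a numeric comparison, so it is dropped.
include_lt() {
    awk -v t="$1" '$1 != "." && ($1 + 0) < (t + 0)'
}

# Model of -e 'AF >= 0.01': drop a record only when the comparison is true.
# A missing value never satisfies it, so it is kept.
exclude_ge() {
    awk -v t="$1" '!($1 != "." && ($1 + 0) >= (t + 0))'
}

kept_by_include=$(printf '%s\n' "$values" | include_lt 0.01)
kept_by_exclude=$(printf '%s\n' "$values" | exclude_ge 0.01)

echo "kept by include (< 0.01):"
printf '%s\n' "$kept_by_include"    # 0.001 and 0.0005
echo "kept by exclude (>= 0.01):"
printf '%s\n' "$kept_by_exclude"    # 0.001, . and 0.0005
```

In this model the two filters agree on every record that has a value and differ only on the missing one, which is the behaviour the -e workaround relies on.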