samtools / bcftools

This is the official development repository for BCFtools. See the installation instructions and other documentation at http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Feature Request: Allow for matching on INFO fields using annotate #2151

Closed ejgardner-insmed closed 2 months ago

ejgardner-insmed commented 3 months ago

Hello,

Currently, annotate only allows matching on additional fields with the '~' operator for the ID and POS columns. Would it be possible to allow matching on additional INFO fields as well? As an example, I have an annotation that is transcript-specific, so a single variant can have several scores, one for each overlapping transcript (tsv format):

CHROM POS REF ALT SCORE ENST
chr1 10 A T 0.1 ENST1
chr1 10 A T 0.4 ENST2

and I have a variant that is annotated to intersect the 1st transcript (vcf format):

#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10 . A T . PASS ENST=ENST1

Thus, when running a command like (note the '~'):

bcftools annotate -o annotated.vcf -a score.tsv.gz -c 'CHROM,POS,REF,ALT,SCORE,~ENST' input.vcf

I would expect the annotation to be:

chr1 10 . A T . PASS ENST=ENST1;SCORE=0.1

I hope this makes sense!

pd3 commented 2 months ago

I just added the feature. It should now be possible to run:

bcftools annotate -o annotated.vcf -a score.tsv.gz \
      -c CHROM,POS,REF,ALT,SCORE,ENST -i'ENST={ENST}' -k input.vcf

The option -k is required if all sites should be printed, including those that did not match the expression and were therefore left unmodified.

The above command implicitly matches on REF and ALT as well. If that is not desired, run it as:

bcftools annotate -o annotated.vcf -a score.tsv.gz \
       -c CHROM,POS,-,-,SCORE,ENST -i'ENST={ENST}' -k input.vcf
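For readers who want to check what result to expect, the matching described above can be sketched in plain Python. This is only an illustration of the intended semantics, not how bcftools implements it. The function names (`load_scores`, `annotate`, `format_info`), the `match_alleles` and `keep_sites` parameters, and the second VCF record at chr1:20 are hypothetical and introduced here for the example. The sketch also assumes that a site with no matching annotation line is treated as "did not match".

```python
import csv
import io

# Annotation rows from the request (tab-separated, columns as in -c).
SCORES_TSV = """\
chr1\t10\tA\tT\t0.1\tENST1
chr1\t10\tA\tT\t0.4\tENST2
"""

# (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO)
VCF_RECORDS = [
    ("chr1", 10, ".", "A", "T", ".", "PASS", {"ENST": "ENST1"}),
    # Hypothetical extra record with no matching annotation line.
    ("chr1", 20, ".", "G", "C", ".", "PASS", {"ENST": "ENST9"}),
]


def load_scores(text):
    """Parse the annotation table into a list of dicts."""
    fields = ["CHROM", "POS", "REF", "ALT", "SCORE", "ENST"]
    rows = list(csv.DictReader(io.StringIO(text), fieldnames=fields, delimiter="\t"))
    for row in rows:
        row["POS"] = int(row["POS"])
    return rows


def annotate(records, score_rows, match_alleles=True, keep_sites=True):
    """Copy SCORE from the row whose position (and ENST) matches the record.

    match_alleles=False mimics replacing REF,ALT with '-' in the column list.
    keep_sites=True mimics -k: unmatched records are emitted unchanged.
    """
    out = []
    for chrom, pos, vid, ref, alt, qual, flt, info in records:
        new_info = dict(info)
        matched = False
        for row in score_rows:
            if (row["CHROM"], row["POS"]) != (chrom, pos):
                continue
            if match_alleles and (row["REF"], row["ALT"]) != (ref, alt):
                continue
            if row["ENST"] != info.get("ENST"):
                continue  # corresponds to the ENST={ENST} condition
            new_info["SCORE"] = row["SCORE"]
            matched = True
            break
        if matched or keep_sites:
            out.append((chrom, pos, vid, ref, alt, qual, flt, new_info))
    return out


def format_info(info):
    """Render an INFO dict as a VCF INFO string."""
    return ";".join(f"{key}={value}" for key, value in info.items())


if __name__ == "__main__":
    for rec in annotate(VCF_RECORDS, load_scores(SCORES_TSV)):
        print("\t".join(str(x) for x in rec[:7]) + "\t" + format_info(rec[7]))
```

With `keep_sites=False` the hypothetical chr1:20 record is dropped, which corresponds to running without -k.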

Please try it out.