samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Regexes always case-insensitive, and '/i' option not working #531

Closed: sephraim closed this issue 7 years ago

sephraim commented 7 years ago

Hello,

I noticed that when filtering with a regex, the regex is always case-insensitive even when I'm not using the /i option. For example, if I'm trying to count the number of "Pathogenic" variants from ClinVar:

bcftools query -i 'CLINVAR_CLNSIG ~ "Pathogenic"' -f '%CLINVAR_CLNSIG\n' my.vcf.gz \
   | sort | uniq -c | sort -bn

This produces:

 346 Likely_pathogenic
2046 Pathogenic

This means that "Pathogenic" also matches "Likely_pathogenic". Using ~ "pathogenic" (lowercase) produces the same result.

Additionally, the documentation currently states that "expressions are case sensitive unless '/i' is added", but using ~ "Pathogenic/i" or ~ "pathogenic/i" produces no results at all.
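For comparison, the documented behavior can be illustrated on the two CLNSIG values from the output above. This is a minimal sketch that uses grep as a stand-in for bcftools' ~ operator. It assumes, and does not verify, that ~ follows the same POSIX extended-regex rules, with /i playing the role of grep's -i. It does not call bcftools at all.

```shell
#!/bin/sh
# The two CLNSIG values from the output above, one per line.
values='Likely_pathogenic
Pathogenic'

# Case-sensitive match (the documented default): only "Pathogenic" matches,
# because "Likely_pathogenic" contains a lowercase "pathogenic".
sensitive=$(printf '%s\n' "$values" | grep -c 'Pathogenic')

# Case-insensitive match (the documented effect of /i): both lines match.
insensitive=$(printf '%s\n' "$values" | grep -ci 'Pathogenic')

# Anchoring the pattern excludes "Likely_pathogenic" even when case is ignored.
anchored=$(printf '%s\n' "$values" | grep -ciE '^Pathogenic$')

echo "case-sensitive:             $sensitive"
echo "case-insensitive:           $insensitive"
echo "anchored, case-insensitive: $anchored"
```

The expected counts are 1, 2 and 1. The reported bcftools 1.3.1 behavior corresponds to the second line even without /i. The anchored pattern shows one way the intended count could still be obtained under always-case-insensitive matching. This is an assumption about the regex engine, not something confirmed in this thread.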

Thank you very much for all your hard work on this.

pd3 commented 7 years ago

Are you perhaps using an older version of bcftools? Please try the latest version, as described here

sephraim commented 7 years ago

Thanks, @pd3. It does not work with the latest stable release (1.3.1), but it does work with the most recent development version (1.3.1-201-g87456cf). Thank you for all your hard work.