samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/

Bcftools view include based on multiple filters #2146

Closed SophieS9 closed 3 months ago

SophieS9 commented 3 months ago

Hi Bcftools team!

I'm looking to filter my VCF file for four separate FILTER terms (let's say A, B, C and D). I want to keep any combination of these filters: "A", "B", "C", "D", "A,B", "A,B,C", "B,C", "A,D", etc.

I've been having a play around with these filter options

FILTER="PASS"
FILTER="."
FILTER="A"          .. exact match, for example "A;B" does not pass
FILTER="A;B"        .. exact match, "A;B" and "B;A" pass, everything else fails
FILTER!="A"         .. exact match, for example "A;B" does pass
FILTER~"A"          .. subset match, for example both "A" and "A;B" pass
FILTER~"A;B"        .. subset match, pass only if both "A" and "B" are present
FILTER!~"A"         .. complement match, for example both "A" and "A;B" fail
FILTER!~"A;B"       .. complement match, fail if both "A" and "B" are present
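Read as set operations, the matching rules listed above can be modelled roughly as follows. This is only a Python sketch of the behaviour those lines describe, not bcftools code; the function names `parse_filters`, `exact_match` and `subset_match` are made up for illustration.

```python
def parse_filters(value: str) -> frozenset:
    """Split a FILTER string such as "A;B" into a set of filter names."""
    return frozenset(value.split(";"))


def exact_match(record: str, query: str) -> bool:
    """Model of FILTER="...": same set of names, order ignored."""
    return parse_filters(record) == parse_filters(query)


def subset_match(record: str, query: str) -> bool:
    """Model of FILTER~"...": every queried name is present in the record."""
    return parse_filters(query) <= parse_filters(record)


# The != and !~ forms are modelled as the negations of the two functions above.
for record in ("A", "A;B", "B;A", "A;E"):
    print(record, exact_match(record, "A;B"), subset_match(record, "A"))
```

In this model, `~` only checks that the queried names are present, so a record such as "A;E" still matches `FILTER~"A"`.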

But I can't seem to find anything that achieves this - any advice/options I'm missing? Running version 1.19.

Thanks!

pd3 commented 3 months ago

This should work if the filters A, B, C and D are also allowed in combination with others not listed here, e.g. A,E:

-i 'FILTER~"A" || FILTER~"B" || FILTER~"C" || FILTER~"D"'

SophieS9 commented 3 months ago

Thanks @pd3, but this pulls through variants with filter A/B/C/D in combination with other filters that I want to exclude. I want to keep solely those four in any combination, but exclude them if they appear in combination with anything else. E.g.:

A,B,C - include
A - include
E - exclude
A,E - exclude

pd3 commented 3 months ago

Then use the reversed logic and do

-e 'FILTER~"E" || FILTER="PASS" || FILTER="."'

SophieS9 commented 3 months ago

Thanks @pd3. I can't seem to use them in a single command

Error: only one -i or -e expression can be given, and they cannot be combined

So I assume you'd pipe them, running the -i first and then the -e? With the -e, I'm going to need to be explicit about excluding all the other terms. Is there a way of saying "exclude anything that's not A/B/C/D or if A/B/C/D is in combination with something else"? That way I could avoid hard-coding all of the filters I want to exclude, if possible (I know what I want to keep, but not necessarily what I want to exclude).

pd3 commented 3 months ago

Yes, you'd have to pipe it through another instance if both -i and -e options have to be used. And no, I am afraid we have no way of saying "exclude anything that's not A/B/C/D or if A/B/C/D is in combination with something else".
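Outside bcftools itself, one possible workaround is to treat the FILTER column as a set and keep a record only when that set is contained in the list of wanted names. The Python sketch below illustrates the idea on plain-text VCF lines. It is not a bcftools feature, and it assumes an uncompressed, tab-separated VCF with FILTER in the seventh column and names separated by ";". The function name `keep_only_listed` and the sample records are invented for illustration.

```python
import sys
from typing import Iterable, Iterator

# Placeholder filter names taken from this thread.
ALLOWED = frozenset({"A", "B", "C", "D"})


def keep_only_listed(lines: Iterable[str], allowed: frozenset = ALLOWED) -> Iterator[str]:
    """Yield header lines, plus records whose FILTER names are all in `allowed`."""
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip malformed records (assumption: FILTER is column 7)
        filters = set(fields[6].split(";"))
        # "PASS", "." and any unlisted name fall outside `allowed` and are dropped.
        if filters <= allowed:
            yield line


# Made-up records mirroring the examples in this thread.
records = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tA;B;C\t.\n",
    "1\t200\t.\tC\tT\t50\tA\t.\n",
    "1\t300\t.\tG\tA\t50\tE\t.\n",
    "1\t400\t.\tT\tC\t50\tA;E\t.\n",
    "1\t500\t.\tT\tC\t50\tPASS\t.\n",
]
sys.stdout.writelines(keep_only_listed(records))
```

Because the test is "is the record's set contained in the wanted set", the unwanted filter names never need to be listed. To run it on a real file, the same function could be fed `sys.stdin` and its output written to `sys.stdout`, for instance downstream of `bcftools view` on a hypothetical `in.vcf.gz`.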

SophieS9 commented 3 months ago

Thanks @pd3 super helpful! I'll close the issue as you've answered my questions! Thanks again!