samtools / htslib

C library for high-throughput sequencing data formats
Other
803 stars 446 forks source link

bcf_has_filter #186

Open atks opened 9 years ago

atks commented 9 years ago

From the source, I notice that if FILTER is . and when PASS is queried, bcf_has_filter will return true. I understand that the documentation states that explicitly, but based on VCFv4.2 specification, it does not mention that . is equivalent to a PASS.

I think in this case, bcf_has_filter should return 0.

atks commented 8 years ago

hi, can I have a response here? I think the query is valid.

@pd3

pd3 commented 8 years ago

Hi, sorry that I missed your question. I don't mind if it is changed to distinguish between . and PASS. Bcftools will need to be updated if this happens.

atks commented 8 years ago

ok. I think we need more views from other people too.

@AlistairNWard @ekg

Is this a reasonable change?

Right now I am working around this by using bcf_has_filter() only when n_filter != 0.

AlistairNWard commented 8 years ago

I agree with @atks on this issue. I often perform filtering where a filter expression is provided and records passing the filter are marked with 'PASS'. I then sometimes want to extract the 'PASS' records, but currently get all records, since '.' counts as 'PASS'.

atks commented 8 years ago

In terms of semantics, bcf_has_filter() asks the question "is a particular filter present in the FILTER field". So if there are no filters in the first place, bcf_has_filter() should return false.

jmarshall commented 8 years ago

Triage meeting discussion: agreed that it should return false for .; but concern about behaviour change if we fix this..? Possibly an option like --include-dot/--exclude-dot or related to others' -f .,PASS

atks commented 8 years ago

I think the htslib API should be a reflection of the VCF spec. Concerns about behavourial changes I think are bcftools specific? "." should just mean no filters for the record.