Open evansd opened 11 months ago
@inglesp Has suggested it might be possible to do this purely in the query language by using where()
as syntactic sugar for building case
expressions.
I think this would involve adding a _conditions
list to PatientFrame
and each call to where()
creates a new frame with the new condition appended to this list. Then accessing column attributes on this frame would return a case()
expression which took the column value if all the conditions are met and NULL otherwise.
The desired behaviour here should be pretty self-explanatory. We want queries like this to work in the obvious way:
The reason for wanting it is ergonomic: it's possible to build logically equivalent queries using
case
expressions, for example:But that requires repeating the condition, and doesn't allow combining multiple conditions by chaining. It's also generally better to allow users to achieve logically equivalent things using the same mechanisms as far as possible, rather than requiring them to do things one way in one context and another way in a different context.
Technically, I think this is going to be medium difficulty. Many parts of it will be very straightforward but updating
BaseSQLQueryEngine
to support this might involve some slightly mind-stretching reasoning.Initial discussion in Slack thread: https://bennettoxford.slack.com/archives/C03FB777L1M/p1700050532117739
Implementation notes
Changing the ehrQL surface and the query model rules should be fairly easy. We'd need to move the
where()
andexcept_where()
methods fromEventFrame
toBaseFrame
: https://github.com/opensafely-core/ehrql/blob/4add3ee29148205bf53a61a517ef63bbd0d1688f/ehrql/query_language.py#L1028-L1056And change
Filter
to accept and returnFrame
rather thanManyRowsPerPatientFrame
: https://github.com/opensafely-core/ehrql/blob/4add3ee29148205bf53a61a517ef63bbd0d1688f/ehrql/query_model/nodes.py#L233-L235And then extend the spec tests to cover some
PatientFrame
examples: https://github.com/opensafely-core/ehrql/blob/4add3ee29148205bf53a61a517ef63bbd0d1688f/tests/spec/filter/test_where.py#L7I think the
InMemoryQueryEngine
may work out of the box here: it seems to havefilter
methods already defined in all the relevant places. If not, it should be reasonably straightforward to extend.We'll also need to extend the generative tests so that they produce examples of
Filter
being applied to aOneRowPerPatientFrame
: https://github.com/opensafely-core/ehrql/blob/4add3ee29148205bf53a61a517ef63bbd0d1688f/tests/generative/variable_strategies.py#L448-L467The challenge will come with
BaseSQLQueryEngine
. I haven't quite got my head around what changes will be needed here. But it's going to involve looking at theget_select_query_for_node_domain
method and the various places that gets called:https://github.com/opensafely-core/ehrql/blob/4add3ee29148205bf53a61a517ef63bbd0d1688f/ehrql/query_engines/base_sql.py#L702-L706