The intention here is to provide a more flexible, composable alternative to the existing count_episodes_for_patient aggregation.
The idea is that given a date event series like this:
date
2020-01-01
2020-01-04
2020-01-06
2020-01-10
2020-01-12
We could "cluster" it into episodes, defining the maximum gap allowed between episodes like so:
date.cluster_by_episode(days(3))
And this would produce a new event frame with two columns giving the start and end dates of each episode:
start_date
end_date
2020-01-01
2020-01-06
2020-01-10
2020-01-12
Calling count_for_patient() on this frame would give the same result as the existing count_episodes_for_patient() aggregation. But all sorts of other operation are now possible e.g. calculating total episode length and including only those over a certain length:
This is a new kind of operation in the sense that it takes a series and returns a frame. It's also unusual in that the frame class it returns is not one already defined in a schema module. In the query_language module we'll probably want to define a dedicated frame class for this with start_date and end_date series.
Note that this operation is not an aggregation (it doesn't result in a one-row-per-patient result) so we shouldn't use the _for_patient() suffix in naming it.
I don't have a clear sense of how fiddly or not this will be to implement in SQL. The existing episode counting code uses a LAG window function to do it's work. It may be that a simple adaption of this code will do what we need. Or there may turn out to be some significant blocker I haven't thought of.
The intention here is to provide a more flexible, composable alternative to the existing
count_episodes_for_patient
aggregation.The idea is that given a date event series like this:
date
We could "cluster" it into episodes, defining the maximum gap allowed between episodes like so:
And this would produce a new event frame with two columns giving the start and end dates of each episode:
start_date
end_date
Calling
count_for_patient()
on this frame would give the same result as the existingcount_episodes_for_patient()
aggregation. But all sorts of other operation are now possible e.g. calculating total episode length and including only those over a certain length:Original Slack discussion with motivating use cases: https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1729785637716699
Implementation notes
This is a new kind of operation in the sense that it takes a series and returns a frame. It's also unusual in that the frame class it returns is not one already defined in a schema module. In the
query_language
module we'll probably want to define a dedicated frame class for this withstart_date
andend_date
series.Note that this operation is not an aggregation (it doesn't result in a one-row-per-patient result) so we shouldn't use the
_for_patient()
suffix in naming it.I don't have a clear sense of how fiddly or not this will be to implement in SQL. The existing episode counting code uses a
LAG
window function to do it's work. It may be that a simple adaption of this code will do what we need. Or there may turn out to be some significant blocker I haven't thought of.