Open wjchulme opened 3 years ago
Thinking long term here (and might be thinking beyond the current use of opensafely)...what about if someone is doing a study using data from before the latest build date (i.e. say they were conducting a study between 2000 and 2010), would not being able to specify a date cause issues here (i.e. would they get a date after their study period?)
I guess if you're doing a historical study then this isn't the function you'd want to use anyway. (And a historical study would be impossible in EMIS as it stands as we don't have historical registration data.)
@wjchulme, can you explain the use-case a bit more? i.e. why fixing patients.registered_practice_as_of()
so it respects date wouldn't be the right fix?
I'm wondering why you'd want to know the date of a last active registration for a deregistered patient.
I'm not doubting it's useful, but thinking forward to helpful documentation - when should I use patients.registered_practice_as_of()
and when should I use patients.latest_active_registration()
- might this be a loaded gun for people who might not have thought carefully about what they want?
why you'd want to know the date of a last active registration for a deregistered patient.
It's important to know when people were deregistered so we know when we're going to stop seeing any recorded primary care events. This effectively ends a person's "follow-up" in a study, as they're no longer "at-risk" of experiencing a primary care coded event.
why fixing
patients.registered_practice_as_of()
so it respects date wouldn't be the right fix?
I'm not sure what that fix would look like. One issue is we don't have practice history in EMIS, we only know their most recent practice as of the latest EMIS build date. So currently the only way to get the query to respect the date is to return an error if it's run on any other date than the build date. Even then we can't easily access and use the build date, for instance with patients.registered_practice_as_of(date="latest_build_date")
. Perhaps we can add a feature to tell the function to run it on the build date without having to update each time, and it chooses the right build date for the backend. Maybe "latest_build_date"
could be a reserved variable name? or we could do something like patients.registered_practice_as_of(use_build_date="True")
?
The other thing is if a patient is not registered on "latest_build_date"
, then TPP will return ""
but EMIS will return the practice a patient was last registered with, so there's still an inconsistency that needs resolving.
when should I use
patients.registered_practice_as_of()
and when should I usepatients.latest_active_registration()
patients.registered_practice_as_of()
in TPP when you don't need consistency with EMIS and you want more control over what date the practice ID is based on. latest_active_registration()
whenever you need to make sure the derivation of practice ID is consistent between backends. latest_active_registration()
or registered_practice_as_of()
in EMIS will return the same thing, but using latest_active_registration()
makes it clearer about exactly what is being returned (so basically, never use patients.registered_practice_as_of()
in EMIS!). On reflection, maybe it would be better to call the new variable something like (but better). But
I considered calling it something like system_consistent_practice()
to make clear when it should be used, but latest_active_registration()
is more descriptive, and is still a useful name even if/when we have registration history in EMIS
why fixing patients.registered_practice_as_of() so it respects date wouldn't be the right fix?
I'm not sure what that fix would look like.
It would look like us getting EMIS to add a registration history table to the database available to us.
It would look like us getting EMIS to add a registration history table to the database available to us.
:) That would be great, but my understanding from @inglesp is that this won't happen imminently. In the meantime, we need a way to derive practice ID consistently between backends.
The workaround without backend changes is this:
dereg_date = patients.date_deregistered_from_all_supported_practices(
on_or_before = "latest_build_date" # "today" would probably be good enough here if that is still supported?
)
last_active_registration_date = mininum_of(
"dereg_date",
"latest_build_date"
) # does minimum_of work for dates?
practice_ID = patients.registered_practice_as_of(
date = "last_active_registration_date - 1 day",
returning="pseudo_id"
)
If run in EMIS, it will just use patients.registered_practice_as_of()
, ingoring date
. If run in TPP it will use date
to match behaviour in EMIS. I think!
Ah, OK, now I get it!
There's no particular reason for current EMIS behaviour; it doesn't reflect any particular truth about the underlying data. So we should avoid, if possible, adding functions to our official API specifically to address the way the data happens to be exposed in a build that has been made specifically for us. Otherwise we run the risk I outlined above, I think.
What we do need to address is the fact that people are starting to use the EMIS backend at a time when we've only specifically addressed and/or thought about support for the vaccination coverage report. For example, if the EMIS backend currently ignores date
this should at least be documented, somehow. I presume this is for @acagreen17's current factors-associated-with work?
If we can use a workaround we should do for now, rather than address it with specific, new functions.
This also relates to stuff @evansd has been thinking about; it would (as I understand it) make it easier for you to make a reusable, user-defined variable that encapsulates the logic above.
I presume this is for @acagreen17's current factors-associated-with work?
Yes, although long term I think the plan will be to try and run analyses in both TPP and EMIS and combine results. Hence why we were thinking a new function might be better than a workaround.
@wjchulme, @inglesp are you happy for me to implement the workaround to derive practice ID consistently between backends?
@sebbacon OK, thanks, that makes sense.
@acagreen17 I don't think the proposed workaround will actually work! I don't think minimum_of()
works for dates, and there's no coalesce
or case_when
equivalents in study definitions to pick out the correct non-null practice ID. The only way I can think to get the right practice ID is to extract two IDs and select the right one in a subsequent script. Am I missing a trick?
As in use
practice_ID_latest_date = patients.registered_practice_as_of(
date = "latest_build_date",
returning="pseudo_id"
)
Which would give us a practice id for all EMIS patients but only a practice id for current registered TPP patients. So if a TPP patient is no longer registered at an TPP practice (death or deregistration) it will return a null/blank (?) So for TPP patients who have died or deregistered, to get their last known practice we would use
practice_ID_dereg = patients.registered_practice_as_of(
date = "dereg_date", #(possibly use - 1 day? And assume this would be the same if we were extracting it on death_date?)
returning="pseudo_id"
)
And then in a data processing script we would calculate practice_id_last_registered using the above.
Just another thought - would we actually also need a practice_id_at_end = patients.registered_practice_as_of( end_date, returning = "pseudo_id")
incase anyone has changed practice between the study end date and latest build date? Or is that overkill/I'm overthinking?
What you've said there looks sensible @acagreen17. Just to be clear, you could use something like
practice_ID = dplyr::coalesce(practice_ID_latest_date, practice_ID_dereg)
to combine practices in the script.
I don't think practice_id_at_end
is useful because although people may have moved, and so the ID is out of date, we can't currently get that information from EMIS.
long term I think the plan will be to try and run analyses in both TPP and EMIS and combine results
Right - but right now, this is definitely long term, so I don't think worth investing too much time yet in getting it exactly right; this will be easier once we've refactored the study definition language and thought about from a higher-level, holistic viewpoint.
@inglesp @acagreen17
Currently in EMIS we don't have patients' practice registration history. The practice ID returned by
patients.registered_practice_as_of()
corresponds to the practice where the patient was last actively registered in EMIS as of the latest EMIS database build and ignores any value passed to thedate
argument. If the patient is actively registered on the lastest build date, the variable will return this practice. If a patient is no longer registered at an EMIS practice (death or deregistration) it will return the last practice where they were registered.It's not possible to (easily) return the equivalent in TPP. So, it would be useful to create a new variable that returns a practice ID that's consistent across backends.
It should also return the latest active registration date, i.e. the deregistration date or, if currently actively registered, the latest build date. This is different to
patients.date_deregistered_from_all_supported_practices(date="my_date")
, which only returns a deregistration date prior to"my_date"
and is blank if the patient is actively registered on"my_date"
.I don't think this needs to be any more complicated than
There's no
date
argument oron_or_before
/on_or_after
/between
arguments because the given date should be fixed at the time of the latest database build.