Closed iaindillingham closed 3 years ago
The code with the most recent consultation date.
This is the default. However, it's configurable: pass `patients.with_these_clinical_events(find_first_match_in_period=True, ...)` to return the code with the earliest consultation date. (In our case, however, if two codes have identical consultation dates, then the same code will be returned whether we pass `find_first_match_in_period` or use the default.)
Thanks for investigating!
There are several options for what `patients.with_these_clinical_events()` can return, and I think the behaviour you are describing occurs when returning `code` or `numeric_value`: you can only ever get one, because results are always one-line-per-patient, but which one you get obeys the rules you described above.
However:
- If returning `number_of_matches_in_period`, you should get 2.
- If returning `number_of_episodes`, you should always get 1, because the minimum episode length you can set is 0 days, so events happening on the same day will count as part of the same 'episode'. But events separated by a day or more can be grouped together here if required by adjusting the episode length.

As an aside, and for @LFISHER7, there isn't a straightforward way to inspect the SQL that's executed against either the TPP or the EMIS backends and, consequently, unpick a study definition (see opensafely-core/cohort-extractor#539).
For the moment: install the requirements with `pip install -r requirements.txt`, then run `cohortextractor`, noting that you pass the study definition as a dotted path. For example:
# For the TPP backend
TEMP_TABLE_PREFIX= DATABASE_URL=mssql:// cohortextractor dump_cohort_sql --study-definition analysis.study_definition > study_definition_tpp.sql
# For the EMIS backend
EMIS_ORGANISATION_HASH=eoh TEMP_TABLE_PREFIX= DATABASE_URL=presto:// cohortextractor dump_cohort_sql --study-definition analysis.study_definition > study_definition_emis.sql
> If returning `number_of_episodes` you should always get 1 because the minimum episode length you can set is 0 days, so events happening on the same day will count as part of the same 'episode'. But events separated by a day or more can be grouped together here if required by adjusting the episode length.
Have you come across any documentation for `episode_defined_as`, which is where I think the episode length is defined, @HelenCEBM?
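To make the difference between the two counts concrete, here is a minimal sketch of how they could diverge for same-day events. It is only an illustration of the behaviour described in this thread, not cohortextractor's implementation: the function names and the `max_gap_days` parameter are hypothetical, and the rule "a new episode starts when the gap to the previous event exceeds the episode length" is an assumption.

```python
from datetime import date


def count_matches(event_dates):
    """Every matching event counts, even if several share a date."""
    return len(event_dates)


def count_episodes(event_dates, max_gap_days=0):
    """Count runs of events in which each event falls at most
    max_gap_days after the previous one (hypothetical parameter)."""
    episodes = 0
    previous = None
    for current in sorted(event_dates):
        # A new episode starts at the first event, or when the gap
        # to the previous event exceeds the allowed episode length.
        if previous is None or (current - previous).days > max_gap_days:
            episodes += 1
        previous = current
    return episodes


# Two codes recorded on the same day for one patient.
same_day = [date(2020, 3, 1), date(2020, 3, 1)]
# Two codes recorded one day apart.
one_day_apart = [date(2020, 3, 1), date(2020, 3, 2)]

print(count_matches(same_day))                         # 2
print(count_episodes(same_day))                        # 1
print(count_episodes(one_day_apart))                   # 2
print(count_episodes(one_day_apart, max_gap_days=1))   # 1
```

With a gap of 0 days, same-day events always collapse into one episode, while widening the gap groups events that are further apart.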
When two codes are recorded on the same day for a patient in the TPP data, are they recorded as one code or two codes in the data generated by `cohortextractor`?

TL;DR. One code.
But which code? The code with the most recent consultation date. If two codes have identical consultation dates, then they are sorted by `CodedEvent_ID` in ascending order and the first code is recorded in the data generated by `cohortextractor`.

If two codes have identical consultation dates, can we say whether `old code` or `new code` takes precedence? I don't think we can. If we assume that `CodedEvent_ID` is an automatically-incrementing primary key (as the database schema suggests), then the first code will be that with the lowest value of `CodedEvent_ID`. If a given laboratory always returns codes in the same order, then the first code will be the same each time. However, we don't know whether that first code will be `old code` each time or `new code` each time.

Although this issue describes HbA1c codes, it applies anywhere we call `patients.with_these_clinical_events` in a study definition.
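The selection rule can be sketched in plain Python. This is a model of the ordering described in this issue, not the SQL that cohortextractor runs: the `CodedEvent` class, the `pick_code` function, and the placeholder codes are hypothetical, and it assumes that ties on consultation date are broken by ascending `CodedEvent_ID` whichever direction the dates are sorted in.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CodedEvent:
    coded_event_id: int       # stands in for CodedEvent_ID
    consultation_date: date
    code: str


def pick_code(events, find_first_match_in_period=False):
    """Return the single code reported for one patient, or None."""
    if not events:
        return None
    # Sort by ID first; the date sort below is stable, so events that
    # share a consultation date stay in ascending ID order.
    by_id = sorted(events, key=lambda e: e.coded_event_id)
    ordered = sorted(
        by_id,
        key=lambda e: e.consultation_date,
        reverse=not find_first_match_in_period,  # default: most recent first
    )
    return ordered[0].code


events = [
    CodedEvent(102, date(2020, 3, 1), "code_a"),
    CodedEvent(101, date(2020, 3, 1), "code_b"),  # same day, lower ID
    CodedEvent(50, date(2020, 1, 1), "code_c"),   # earlier consultation
]

print(pick_code(events))                                   # code_b
print(pick_code(events, find_first_match_in_period=True))  # code_c
```

The placeholder codes are deliberately neutral: as noted above, we cannot tell from the ID alone whether the lower `CodedEvent_ID` belongs to the old or the new code.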