opensafely-core / cohort-extractor

Cohort extractor tool which can generate dummy data, or real data against OpenSAFELY-compliant research databases

New data: waiting list (WIP) #783

Open HelenCEBM opened 2 years ago

HelenCEBM commented 2 years ago

Background: Document

Schema: Schema

Recording and reporting guidelines: Link

Data flow/structure & complications

There are two main datasets:

  1. "Open pathways" dataset - patients on waiting list (also includes patients who are no longer waiting e.g. procedure no longer required)
  2. "Clockstops" dataset - completed wait on waiting list (had procedure)

At a given time point a patient may be:

Data issues to consider

| Issue | "Open pathways" dataset | "Clockstops" dataset |
| --- | --- | --- |
| Different pathways/procedures for same patient | Patients may be waiting for more than one procedure at any one time. We will need each of these to be available for analysis | Patients appear for each completed pathway (and may have same procedure more than once e.g. cataract). We will need each of these to be available for analysis |
| Duplicates over time | Patients appear each week they are on the waiting list. We probably generally only need the latest for each patient-pathway combination within the period of interest (however, useful to look back at how their pathway has changed) | (Patients should appear only once per completed pathway) |
| Change over time | Information may change while patients are on waiting list, e.g. change of planned procedure, cancellation (procedure no longer required) etc | N/A |

Potential research/monitoring questions:

*Note: in cohortextractor we can only include one wait per patient in each period (but usually we will filter to a certain specialty, set of procedures or urgency, so duplicates should be minimised)

Data elements required

Returning options

Note: may need to minimise options due to duplicate records per patient.

Filters

Date filters/calculations

Questions

rebkwok commented 2 years ago

@HelenCEBM

type of pathway

  ORTT - Current RTT Non-admitted (patients whose RTT pathways ended for reasons other than admission for treatment)
  IRTT - Current RTT Admitted
  ONON - Not current RTT Non-admitted (patients whose RTT pathways ended for reasons other than admission for treatment)
  INON - Not current RTT Admitted

What is the difference between ORTT and ONON (the labels are different, but the descriptions in parentheses are the same)?

HelenCEBM commented 2 years ago

What is the difference between ORTT and ONON (the labels are different, but the descriptions in parentheses are the same)?

AIUI, non-admitted means the pathway ended for some reason other than having the procedure (e.g. the patient no longer needs it), while the RTT/non-RTT distinction relates to the kind of referral: RTT pathways are those that meet certain requirements (e.g. referral to a consultant-led service or a triage service), while non-RTT covers all others (e.g. referral to a non-consultant-led service).

HelenCEBM commented 2 years ago

Referral vs pathway IDs:

Available DATES (Open Pathways):

Some date filter options (Open pathways):

rebkwok commented 2 years ago

If a patient is present in the open pathways dataset, do we assume they are on a waiting list (RTT or otherwise)? The 4 statuses refer to RTT pathways ending, but I think that means they're still on a waiting list, just not an RTT one?

With the exception (maybe) of entries that have a cancelled date?

rebkwok commented 2 years ago

From the guidance, I think records with a cancellation date are still on the waiting list

4.4.2 Cancelled and rearranged appointments

A cancelled or rearranged appointment, either patient-initiated or provider-initiated will not in itself stop an RTT clock.

rebkwok commented 2 years ago

The simplest implementation for cohort-extractor would be to look at a snapshot of patients who are on a waiting list at a specific date. We would need to take a single reference date (e.g. 28 Feb 2022), and find the records with a week_ending_date for the next Sunday (unless the reference date is a Sunday itself). Assuming we can consider REFERRAL_REQUEST_RECEIVED_DATE as the first date a patient joined a waiting list, waiting list time is the difference between this date and the supplied reference date.
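The date arithmetic described above can be sketched in plain Python. This is only an illustration, not cohort-extractor code: the function names `week_ending_sunday` and `days_waiting` are hypothetical, and it assumes REFERRAL_REQUEST_RECEIVED_DATE really does mark the start of the wait.

```python
from datetime import date, timedelta


def week_ending_sunday(reference_date: date) -> date:
    """Return reference_date if it is a Sunday, otherwise the next Sunday."""
    # date.weekday() numbers Monday as 0 and Sunday as 6
    days_until_sunday = 6 - reference_date.weekday()
    return reference_date + timedelta(days=days_until_sunday)


def days_waiting(referral_received: date, reference_date: date) -> int:
    """Days between the referral being received and the reference date."""
    return (reference_date - referral_received).days


reference = date(2022, 2, 28)  # a Monday
week_ending = week_ending_sunday(reference)  # Sunday 6 March 2022
wait = days_waiting(date(2022, 1, 1), reference)
```

Measuring the wait to the reference date rather than to the week-ending date follows the wording above; either choice would work as long as it is applied consistently.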

For patients who are on more than one waiting list (with any other matching filters applied), we'll need to select one record; we can use select_first_match_in_period to select the longest wait time (by earliest REFERRAL_REQUEST_RECEIVED_DATE)
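As a rough sketch of that selection rule, here is the "keep the earliest referral per patient" logic in plain Python. It does not use the cohort-extractor API; the `records` rows and the `longest_wait_per_patient` helper are made up for illustration.

```python
from datetime import date

# Hypothetical snapshot rows: (patient_id, referral_request_received_date)
records = [
    (1, date(2021, 11, 3)),
    (1, date(2021, 6, 14)),  # same patient, earlier referral, so longer wait
    (2, date(2022, 1, 20)),
]


def longest_wait_per_patient(rows):
    """Keep, for each patient, the earliest referral date seen."""
    earliest = {}
    for patient_id, received in rows:
        if patient_id not in earliest or received < earliest[patient_id]:
            earliest[patient_id] = received
    return earliest


selected = longest_wait_per_patient(records)
```

Any filters (specialty, procedure, urgency) would need to be applied before this step, otherwise the selected record may not be the one of interest.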

Time on waiting list - this could be a filter or a return value (or both, I guess)

HelenCEBM commented 2 years ago

Some notes from the reporting guidance:

Patient pathway

A patient pathway is usually considered to be their journey from first contact with the NHS for an individual condition, through referral, diagnosis and treatment for that condition. For chronic or recurrent conditions, a patient pathway will continue beyond the point at which first definitive treatment starts, as it will include further treatment for the same condition. A person may therefore have multiple RTT periods (see Referral to treatment period) along one patient pathway.

Referral to treatment period

An RTT period is the time between a person’s referral to a consultant-led service, which initiates a clock start, and the point at which the clock stops for any of the reasons set out in the RTT national clock rules, for example the start of first definitive treatment or a decision that treatment is not appropriate.

A patient pathway identifier (PPID) should be assigned to a pathway arising from a referral for a particular condition where this is a referral within the scope of the RTT measure. At the beginning of the patient journey the first organisation receiving the referral should generate a Patient Pathway Identifier (which may be based on the Unique Booking Reference Number (UBRN)). This along with the Organisation Code of that organisation (the Organisation Code of the PPID Issuer) should be used consistently to record the unique identifier for the pathway. The clock start date should also be recorded. Where the patient’s RTT pathway or individual RTT periods within that pathway are delivered by more than one organisation, it is essential that the same PPID and Organisation Code of PPID Issuer are applied, in other words, they do not change even where the responsibility for patient care transfers to a different organisation.

note that where the initial referral was received via the NHS e-Referral Service and the UBRN is used as the basis of the PPID, then the organisation code of PPID Issuer is X09;

^ we should check whether the receiving org ID is useful as an org identifier or whether only the current org ID should be used for trust-level variation

HelenCEBM commented 2 years ago

If a patient is present in the open pathways dataset, do we assume they are on a waiting list (RTT or otherwise)? The 4 statuses refer to RTT pathways ending, but I think that means they're still on a waiting list, just not an RTT one?

With the exception (maybe) of entries that have a cancelled date?

Yes I think this is correct!

rebkwok commented 2 years ago

From Chris in this thread:

Just so you know, things like the referral identifier and pathway identifier are going to be pretty useless on their own. There are loads of NULLs and then combinations of all flavour of abbreviations of “not applicable” and “99999999”. There’s also a lot of classic excel issues - e.g. where the hospital team have clearly used excel as an interim to data upload and it’s converted long ids into XXXXE+7 type notation and so lost the identifier integrity.

Once we obfuscate these, you won't have any idea which ones are legitimate and which ones aren't.
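To make the kinds of problem described above concrete, here is a hedged sketch of a plausibility check on raw identifiers. The placeholder list and the `identifier_problem` helper are assumptions based only on the examples quoted here, not on the real data, and a check like this would presumably have to run before obfuscation.

```python
import re

# Assumed placeholder spellings; the real data may contain many more variants
PLACEHOLDERS = {"", "null", "n/a", "na", "not applicable", "99999999"}

# Matches Excel-style scientific notation such as "1.2345E+7"
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E\+?\d+$", re.IGNORECASE)


def identifier_problem(value):
    """Return a short label for a suspect identifier, or None if it looks usable."""
    if value is None:
        return "missing"
    text = value.strip()
    if text.lower() in PLACEHOLDERS:
        return "placeholder"
    if SCI_NOTATION.match(text):
        return "scientific notation"
    return None
```

Counting the labels per column would give a rough sense of how many identifiers could be trusted for linking records.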

This should mean that the simpler implementation we've discussed (looking at waiting list records at a particular snapshot date) is fine, but the more complex ideas (e.g. looking for patients who dropped off the waiting list during a period) will be difficult. We'll probably be able to identify pathways/referrals by start date and patient ID, but we won't be able to rely on referral ID to differentiate them.

robinyjpark commented 2 years ago

@iaindillingham – as part of the data validation pipeline, we envisioned that there would be two steps to implementing new data in OpenSAFELY.

Firstly, we would want to produce a schema and report data types and completeness. As examples, please see the ISARIC notebook and notes on the therapeutics data.

Secondly, further checks should be done to determine the meaning of each field, whether any fields contain sensitive information that should not be used, and discover any other unexpected features or limitations of the data. This can be done using the raw data plausibility checking functions that Helen wrote (documentation here, repo here, helpful Slack thread here).

iaindillingham commented 2 years ago

According to Chris in this thread, the waiting list data was added over the weekend of 23/24 July.

rebkwok commented 2 years ago

https://github.com/opensafely/data-exploration-notebooks/blob/main/waiting_lists/waiting_list_data_exploration.ipynb

↑ Some first explorations of the 3 waiting list tables, using a modified version of Helen's notebooks.

My first concern is the missing values for the Week_Ending_Date; this is supposed to be (according to the schema spreadsheet) "the Sunday of the week that the pathway relates to". I expected it to always be present, but there are >10 million missing values. There are also a lot of missing waiting list type values, and a lot more waiting list types than I'd expected: the schema spreadsheet lists ORTT, IRTT, ONON and INON, but looking at the distinct values it seems they aren't constrained. We've got values like "unkn" and "Not-" as well as nulls.
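A minimal sketch of how these two checks could be tallied is below. It is not the code used in the notebook; the sample `rows` are invented, and `summarise` is a hypothetical helper that only assumes the four waiting list types from the schema spreadsheet are the valid ones.

```python
from collections import Counter
from datetime import date

# Types listed in the schema spreadsheet
EXPECTED_TYPES = {"ORTT", "IRTT", "ONON", "INON"}

# Hypothetical rows: (week_ending_date, waiting_list_type)
rows = [
    (date(2022, 3, 6), "ORTT"),   # a Sunday with an expected type
    (None, "unkn"),               # missing date, unexpected type
    (date(2022, 3, 7), None),     # a Monday, missing type
    (date(2022, 3, 13), "Not-"),  # a Sunday, unexpected type
]


def summarise(rows):
    """Count missing or non-Sunday dates and missing or unexpected types."""
    summary = Counter()
    unexpected = Counter()
    for week_ending, wl_type in rows:
        if week_ending is None:
            summary["week_ending_missing"] += 1
        elif week_ending.weekday() != 6:  # Sunday is 6
            summary["week_ending_not_sunday"] += 1
        if wl_type is None:
            summary["type_missing"] += 1
        elif wl_type not in EXPECTED_TYPES:
            unexpected[wl_type] += 1
    return summary, unexpected


summary, unexpected = summarise(rows)
```

The `unexpected` counter gives the distinct unconstrained values with their frequencies, which may help decide whether any can be mapped back to the four expected types.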

brianmackenna commented 2 years ago

some scratch notes from meeting the WL MDS team https://docs.google.com/document/d/1Y4keZ51WDs-DE2PyLL2XOs5ju9opphBhpYr7aczFtYc/edit