rcpch / rcpch-audit-engine

Epilepsy12 Audit Platform
https://e12.rcpch.ac.uk/
GNU Affero General Public License v3.0
5 stars 5 forks source link

KPI Data Export Specification #791

Closed pacharanero closed 5 months ago

pacharanero commented 8 months ago

DataExportTemplate.xlsx

The attached XLSX document is a template for how the E12 team would like to receive the exported KPI data. At present this is the most pressing priority in terms of data export.

@AmaniKrayemRCPCH @sairapons @nikyraja Is this format your preferred format or just the format you have been used to because of Netsolving? I ask because this is an opportunity to improve the export format to make any onward processing steps easier. We will be using code to collect the KPI data and process it into a file you can download, so we have options for how that file will look.

Some questions to get us started:

I think in order to get this nailed we should do some preliminary coding work to get a 'rough' export button and file working, and then do some 1:1 work with an analyst and a programmer to get the details nailed exactly, for example ordering of rows, ordering of columns, Sheet naming and ordering etc.

AmaniKrayemRCPCH commented 7 months ago

DataExportTemplate.xlsx

@sairapons has provided an updated version of the template.

This is the preferred format for the data, as it is the format required for updating the Power BI dashboard. It's especially important to keep the column titles the same.

I believe the 8 sheet .xlsx file is the preferred format, and the 'KPI download' button sounds good!

pacharanero commented 7 months ago

@dc2007git @pacharanero this is ready for working on next week.

pacharanero commented 7 months ago

@eatyourpeas @anchit-chandran do we have any dummy data or seeding mechanism for getting KPI'd cases in a dev environment? @AmaniKrayemRCPCH @nikyraja @sairapons do we have any sufficiently rich dummy data for this? (would need LOTS of cases with audit data completed)

pacharanero commented 7 months ago

Also worth noting that a fix to #805 needs to be implemented for the KPIs to be correct

mbarton commented 7 months ago

[from sprint planning] we would like to have this download by the end of March so it can be uploaded at the start of April

eatyourpeas commented 7 months ago

Just to add - the example .xlsx provided has only 3 sheets, not 8, of which only one is calculated data. One sheet is reference data only, representing lists of trusts and their relationships, and another sheet breaks measures into quarters. We have never discussed quarterly reporting before and currently data is only aggregated at the level of the cohort. It is not currently possible to report KPIs at fractions of a cohort, since the aggregation models only store the cohort number, though this could be introduced with some thought. It would involve introducing a new field in the model for cohort_quarter.

To do this we would need to:

  1. Know what the dates of each quarter were going to be for a given year
  2. Add a new field to the BaseKPIAggregation model for cohort_quarter (integer 1-4)
  3. Write a function to allocate existing children in the database to whichever quarter and update the model
dc2007git commented 7 months ago

@eatyourpeas there should be 8 pages - try using the arrows next to the page titles to scroll through. This is what I have once I've scrolled all the way to the left:

image
nikyraja commented 7 months ago

If it isn't possible to download the data in the provided template we can discuss this? If you have suggestions on how the csv would look, to make it easier for you, we are open to this as we can always transform it post-download.

We don't need the system to perform any calculations on quarters or months, and it remains true that CYP are grouped into cohorts only, ie. not into quarters, months etc within the cohorts.

HQIP require us to publish outputs every quarter, and soon every month. The KPI outputs are refreshed every quarter/month, but are cumulative across the cohort, ie. more CYP are included as the year progresses. When we are able to download the data, we'll be able to refresh the public outputs every month (or in theory whenever we want).

Hope this is clear, happy to describe more over a call.

dc2007git commented 7 months ago

@nikyraja can I just confirm that the team isn't interested in the KPIs not mentioned in the excel document? So for example, you don't want any data regarding water safety or first aid (KPIs 9b and 9c)?

nikyraja commented 7 months ago

Thanks @dc2007git - yes, that'd correct. We only need to have the 'top line' KPIs that are described in the template. The composite metrics such as 9b,c etc aren't required on a regular basis. This is something we could run bespoke queries on, and eventually build in.

dc2007git commented 7 months ago

Thanks @nikyraja that's fab. Can I also confirm a couple more things:

We can organise by England's KPIs, and then Wales' KPIs? Let us know 👍

nikyraja commented 7 months ago

Yes you're correct, HBT is our abbreviation for Health Board (Wales) or Trust (England).

Regarding the order of KPI, this is the way we need it to feed our power Bi dashboard. However, we can re-order post download if this order would be difficult for you to provide?

pacharanero commented 7 months ago

@nikyraja @AmaniKrayemRCPCH

KPI ordering

Regarding the order of KPI, this is the way we need it to feed our power Bi dashboard. However, we can re-order post download if this order would be difficult for you to provide?

We will aim to output exactly what you need for direct, convenient, unmodified import into the PowerBI dashboard. We don't want to add work for your team by having to re-order the sheet. What we don't understand is what the ordering of countries in the template actually is - for example, to use Danny's screenshot example, it appears to alternate England/Wales/England Wales initially, but then this ordering 'rule' is broken later on when it goes Wales/Wales...

Is it required that the England/Wales ordering is exactly as shown in the template? And if so what is the rule? Is it possible that this ordering has no specific pattern and just results from how Netsolving previously downloaded them for you? ie it is just an artefact of the internal query in Netsolving and doesn't have 'meaning' as such.

Happy to jump on a call to discuss, as it is quite difficult to clearly articulate in writing.

References tab

On another issue, we are currently working on the 'References' tab of the export sheet (all other tabs are now done apart from the quarterly export which as above you have said you don't need). In E12 we have 242 Trusts and 7 LocalHealthBoards. But in the Reference tab there are only currently 138 rows excluding the header row. From the HBT_name column these look to be a mixure of Trusts and Health Boards but we can't work out why only some of the 249 total are present.

We will put something in the References tab for completenerss, but we can then review and refine this output iteratively with you and the analysts to get it nailed.

nikyraja commented 7 months ago

Thanks @pacharanero. I've checked in and Saira has confirmed that the order of the rows does not matter, and so we are happy with anything you can provide, eg. all England and then Wales.

One thing to flag from Saira: 'The only aspect that could affect the dashboard regarding additional rows is if rows with text (strings) are added to a column with numbers or percentages. This will change the format of the column.'

Hope that's clear? Apologies for not spotting the ordering earlier

dc2007git commented 7 months ago

Hi @nikyraja , nearly done with the Excel document but there are a few nuances that I'd like to just double check are okay with you before we give you the green light. Happy to jump on a call to discuss as they're quite difficult to explain in writing!

dc2007git commented 7 months ago

EDIT - problem has been solved, the INTEGRATED_CARE_BOARDS_LOCAL_AUTHORITIES is up to date whereas TRUSTS is out of date, my mistake. It is much simpler to build the trusts list from the NHS data so this issue is resolved.

Hi @nikyraja @AmaniKrayemRCPCH hope you're both well. Just an update on the only remaining part of the KPI export - the missing fields for the trusts in the Reference sheet. The list we have created this sheet from is INTEGRATED_CARE_BOARDS_LOCAL_AUTHORITIES to be found in integrated_care_boards.py in the repository.

Essentially, each item in that list is a trust with all its relevant data, including the NHS region it belongs to, the ICB, and the ODS codes for all of these. Unfortunately, a few of these trusts no longer exist and so to that end, when we pull out the ODS code of the trust and compare it to the list of trusts inside TRUSTS (inside trust.py), nothing returns, because TRUSTS is up to date. Happy to briefly explain in a call if that isn't super clear.

We're going to update the out-of-date INTEGRATED_CARE_BOARDS_LOCAL_AUTHORITIES list, in order to fix this, but it would be much easier if we could find an up to date csv somewhere on the web with this data. Each item should have this format:

{ "NHS England Region": "North East and Yorkshire", "NHS England Region Code": "Y63", "ODS ICB Code": "QHM", "ONS ICB Boundary Code": "E54000050", "ICB Name": "NHS NORTH EAST AND NORTH CUMBRIA INTEGRATED CARE BOARD", "Sub ICB Locations (formerly CCGs)": "NHS NORTH EAST AND NORTH CUMBRIA ICB", "ODS Sub ICB Code": "00L", "Local Authority": "Gateshead Metropolitan Borough Council", "ODS LA Code": "106", "NHS Trusts": "The Newcastle Upon Tyne Hospitals NHS Foundation Trust", "ODS Trust Code": "RTD", }

which means whichever file contains this data would have 11 columns and however many rows as there are trusts in England. I have looked in the ODS and ONS databases but can't seem to find the document which meets these specifications - @eatyourpeas previously built this list with one of these documents so they should exist somewhere, hopefully up to date.

Do you have any idea where a document like this could be found, or do you have access to it? It would make things easier for us, because otherwise we'll be needing to make a lot of changes manually which is less error-safe!

pacharanero commented 7 months ago

Thanks @dc2007git - perhaps it is time to have a sit-down review of where we are with KPI Export, which is all but ready to roll but for small issues like this. Would be good to check in with the analysts and make sure what we are currently outputting in Excel form is what they need. And we can pool our resources to think through solutions to this issue of keeping organisation data up to date.

AmaniKrayemRCPCH commented 7 months ago

Thanks @dc2007git , I believe the latest files are here - https://digital.nhs.uk/services/organisation-data-service/export-data-files/csv-downloads/other-nhs-organisations

I am aware of at least one Trust merger not actioned on the platform yet - Southport and Ormskirk NHS Trust and St Helens and Knowsley Teaching Hospitals NHS Trust merged to form Mersey and West Lancashire Teaching Hospitals NHS Trust in 2023 - https://www.england.nhs.uk/publication/mersey-and-west-lancashire-teaching-hospitals-nhs-trust/

Looking at the version of the export you shared with us last week, a lot of the Trusts with missing information aren't normally included in the audit.

I can go through the export and provide a list of Trusts that we don't actually have in the audit and maybe we can remove them from the platform (assuming no data has been entered under them), if everyone's happy with that? If i spot any that should be in the audit but are missing information, I can provide that too.

Edit to add that Somerset NHS Foundation Trust has acquired Yeovil District Hospital NHS Foundation Trust.

eatyourpeas commented 7 months ago

I have actioned all the Organisation/Trust requests in all the 3 repositories to date. I don't think I knew about Southport and Ormskirk - I am sorry. Is there an issue for this? If not, could you please open one and I will action separately. The same for Somerset if so.