ropensci-archive / cleanEHR

:warning: ARCHIVED :warning: Essential tools and utility functions to facilitate the data processing pipeline, data cleaning and data analysing of clinical data from CC-HIC
GNU General Public License v3.0

[report] comments from Nicola #118

Closed sinanshi closed 2 years ago

sinanshi commented 7 years ago

Hi Sinan

Ways in which I think the Data Quality Report document could be improved!

  1. Print the name of the Trust that the document is for in big letters on the front page

  2. Print somewhere the electronic filename of the document so that it can be located from the paper copy, and possibly the date produced (I know there is a table of information, but these need to be added)

  3. Smaller margins and using the whole sheet of paper would make it more readable

  4. Font sizes on the axes of diagrams are also very small, so they can’t be read

  5. Site reference – a number of issues:

a. Why is R42 listed when it doesn’t exist?

b. The data summary states “data from x sites” where x=0 for Oxford so doesn’t make sense

c. The data summary states “data from x sites” where x=10 for ALL – but then in list below there are 12 possible sites, plus 2 “blank” plus 2 “Unknown”

d. In the report for ALL data, on page 3, the diagram for “Site” shows 11 sites (not 10 as in the summary), because it includes the “blank” site for Oxford but refers to it as NA-NA

e. In the file list below, site is listed as NA for Oxford. Inconsistent use of blank and NA

  6. Original XML and parse information – it would be useful to have a sub-heading for each trust above the list of files (and maybe, in brackets after the trust name, the number of XML files included), then the list of files for that Trust below

  7. The “Duration of XML files” is a great idea, but the writing is too small to read and all overwritten, so it is very hard to understand – needs to be a much bigger diagram or represented in a different way

  8. “Site” below the above diagram is good, but the writing for the scale on the axes is too small to read – the diagram and the writing on the x-axis need to be larger

  9. Ethnicity – what is the difference between “NA” and “not stated”?

  10. Personal preference, but I would prefer the categories always in the same order, rather than in ascending order, so that reports can be compared. What do others think?

  11. Don’t understand the category “Alive – not discharged” when the header is just about Discharged patients. Doesn’t make sense to me, or it is worded wrongly.

  12. As above for Discharge (0097) and “discharged from your hospital (0095)”

  13. Demographic Data Completeness:

a. What does the red highlight mean? (Completeness less than accept completeness?) Needs a key

b. What does the green highlight mean? (Completeness greater than accept completeness?) Needs a key

c. What is the “Rejected Sites” column for? E.g. NHS number is 97.23% complete, but the Rejected sites column shows very high percentages, around 97%, so this doesn’t make sense. If 97% is complete then 3% should be rejected, not 97%, or have I totally misunderstood? The column is only filled in for a few items, both “red” and “green”.

d. What does the column “Accept completeness %” mean? It is not filled in for all items

  14. Sample period of time-wise data – despite the explanation I do not understand what this is showing. E.g. I think all measurements are “automatically” taken hourly, so “Sample period” should equal 1 if all measurements are available? For Heart Rate, if Sample period = 3, does this mean that 2/3 of the “hourly” measurements are missing?

  15. Data distribution graphs:

a. When 11 shown side-by-side should all have same axes so can compare like with like

b. Would prefer if bigger – use the paper more effectively

c. Page 12 – APACHE II score missing for D20 and X90 – this is shown as 0.00 in “rejected sites” list for that attribute but still don’t quite understand that column. “Rejected”? Is that the correct word?

  16. For Oxford (the only other one I have looked at in detail and compared to “ALL”, so there may be issues with other trusts that I haven’t noticed):

a. Summary says “0” sites, but this is wrong, as the report then shows site diagrams

b. Latest discharge date is 2015-05-04 13:14:00 – are you sure this is correct, as that is 18 months ago and the data was submitted in May 2016?

c. Should the latest discharge time be highlighted if > 6 months ago, showing another submission is due?

d. Sample period of time-wise data is EXACTLY the same as for ALL, so don’t believe it?

e. Demographic data distribution – first 2 are individual for Oxford, but then remaining distributions show all 11 sites?
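The arithmetic behind the sample-period question can be written down as a small sketch. It assumes that “sample period” means the average gap, in hours, between recorded values, and that one reading per hour is expected; neither assumption is confirmed by the report, and `missing_fraction` is a hypothetical helper, not a cleanEHR function.

```python
from fractions import Fraction


def missing_fraction(sample_period_hours, expected_period_hours=1):
    """Fraction of expected readings that are absent.

    Assumption (not confirmed by the report): 'sample period' is the mean
    gap in hours between recorded values, and one reading is expected every
    `expected_period_hours`.
    """
    if sample_period_hours < expected_period_hours:
        raise ValueError("sample period is shorter than the expected period")
    # Observed readings arrive at 1/sample_period of the expected rate.
    return 1 - Fraction(expected_period_hours) / Fraction(sample_period_hours)


# Sample period = 1 would mean nothing is missing; 3 would mean 2/3 is missing.
for period in (1, 2, 3):
    print(period, missing_fraction(period))
```

Under this reading, a sample period of 3 for Heart Rate would indeed correspond to two out of every three hourly values being absent; if the report defines the term differently, the calculation does not apply.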

Also – with clinical input – is there anything else that should be highlighted as obviously wrong, so that when clinicians look at these reports they can interpret them easily? I suspect that these reports have not been looked at yet with a clinical eye (as Niall hasn’t seen them). Need a process to ensure the clinical team get a copy, as there may be other things (apart from the latest discharge date above for Oxford) that are not correct, and for some diagrams there does seem to be some duplication from “ALL” rather than filtering on the individual trust. These reports need sanity checking, as it is easy to be blown away by all the figures and data, but it may not all be what it says it is; I have spotted some odd stuff in just a few minutes.
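Two of the checks mentioned here could be automated; the following is a minimal sketch, not part of the existing report code. The function names, the 183-day threshold standing in for “6 months”, the exact day in May 2016, and the sample-period table are all illustrative assumptions.

```python
from datetime import datetime, timedelta


def is_stale(latest_discharge, reference, max_age=timedelta(days=183)):
    """True if the latest discharge is more than roughly 6 months before
    the reference date (183 days is an assumed stand-in for '6 months')."""
    return reference - latest_discharge > max_age


def identical_to_all(trust_values, all_values):
    """True if a per-trust table equals the ALL table exactly, which would
    suggest the trust filter was not applied."""
    return trust_values == all_values


# Latest discharge date quoted for Oxford in the report.
latest = datetime(2015, 5, 4, 13, 14)
# "May 2016" submission; the day of the month is a placeholder.
submitted = datetime(2016, 5, 1)
print(is_stale(latest, submitted))

# Hypothetical sample-period tables, for illustration only.
all_periods = {"heart_rate": 3}
oxford_periods = {"heart_rate": 3}
print(identical_to_all(oxford_periods, all_periods))
```

A flag from either check would only prompt a closer look; an exact match with “ALL” could still be genuine for a single value, so it is the whole table matching that is suspicious.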

(David and Luis – the reports are on my desk if you want to look at them and see if you have any more comments)

Regards, Nicola

Nicola Cooper ICT Process Analyst, UCLH 020 3447 5029 | x75029 NIHR Health Informatics Collaborative (NIHR HIC)

UCL/UCLH Biomedical Research Centre, 2nd Floor UCLH Farr Institute, 222 Euston Road, London NW1 2DA
