uclhal / cheryl-phd

Cheryl Achary PhD 2021-2024
0 stars 0 forks source link

review Cheryl icnarc code #33

Closed tagimoucia-cheryl closed 2 years ago

tagimoucia-cheryl commented 2 years ago

Repo here

docsteveharris commented 2 years ago

General comments

Modularise

you have a lot of separate SQL scripts to extract different data items I'd encourage you to build a handful of generic scripts (e.g. 1 for labs, 1 for observations) and then another (e.g. ICU admissions) to define ICU patients and the times needed for their first 24h. You can then store the ICU admissions as a table or materialised view, and then use this in a join for the generic obs/lab values scripts. You will have less to maintain this way. This is a similar theme to the decisions around what you do in SQL and what you do in R. General principle is that if any bit of code is so long that you'd feel guilty asking someone to review it then it should be shorter and linked to a test. That way you have more confidence in your own work, and the reviewer can offer more specific advice.

Perhaps try this as an exercise Write an R script to read from EMAP using a SQL query The script will need a location and an time period e.g. T06 from April 2021 to date P03CV from March 2020 to June 2020 etc

Return all patients who have been in a specific location Then concatenate Define the first 24h start time in that location Define the end of that first 24h Write back to a table in the ICU audit schema

Once done then all your measurement/lab queries can use this table which is separately quality controlled in their scripts

code quality

I can review the code but you might also want to think about a testing framework. For example, create a single 'dummy' patient by hand and then write some tests to make sure that your code behaves as expected. See https://github.com/ropensci/assertr