opensafely / post-covid-renal

MIT License

Preprocess type fix #34

Closed eyles-ec closed 5 months ago

eyles-ec commented 6 months ago

This PR fixes the type mismatch that caused preprocess to fail, makes the code a bit more efficient, links in death and deregistration dates earlier in preprocess, and makes sure they are included in the final extract.

The main changes are to preprocess.R

The diff also shows a change to study_definition_prelim.py, but that is only a blank line I added somewhere while checking the changes from the dereg fix.

This PR also includes the codelist changes that came with the latest opensafely update.

venexia commented 6 months ago

Hi @eyles-ec - I am still getting a type error when I run this locally:

2024-04-12T15:24:47.704536423Z ! Can't join on `x$patient_id` x `y$patient_id` because of incompatible types.
2024-04-12T15:24:47.704538048Z ℹ `x$patient_id` is of type <double>>.
2024-04-12T15:24:47.704539923Z ℹ `y$patient_id` is of type <character>>

Do you also see this error?

eyles-ec commented 6 months ago

> Hi @eyles-ec - I am still getting a type error when I run this locally:
>
> 2024-04-12T15:24:47.704536423Z ! Can't join on `x$patient_id` x `y$patient_id` because of incompatible types.
> 2024-04-12T15:24:47.704538048Z ℹ `x$patient_id` is of type <double>>.
> 2024-04-12T15:24:47.704539923Z ℹ `y$patient_id` is of type <character>>
>
> Do you also see this error?

@venexia This was the error I fixed in preprocess by forcing patient_id to be numeric in both instances: on line 52 for 'df' (the main data) and on line 60 for 'prelim_data' (the death/dereg date data).

I have tested it again and I don't get this error locally. However, I did use the '-f' flag you suggested previously, so it ran all previous actions before preprocess.

venexia commented 6 months ago

I also ran it with -f, so I'm not sure what is going on there! We need to force patient_id to be character rather than numeric: we don't know its format in the real data and, even if it is numeric, it risks being rounded when treated as numeric. Otherwise, the changes seem sensible.
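
A minimal sketch of what that could look like, assuming dplyr is used for the join and reusing the names df and prelim_data from the comment above. The column values and the age and death_date columns are made up for illustration and are not taken from preprocess.R:

```r
library(dplyr)

# Hypothetical stand-ins for the real data: df holds patient_id as a number,
# prelim_data holds it as text, which is the mismatch behind the join error.
df <- tibble(patient_id = c(1, 2, 3), age = c(54, 61, 47))
prelim_data <- tibble(
  patient_id = c("1", "3"),
  death_date = as.Date(c("2021-03-01", NA))
)

# Why numeric IDs are risky: doubles cannot represent every integer above 2^53,
# so two distinct large IDs can compare as equal.
rounding_collision <- (2^53 + 1) == 2^53  # TRUE

# Coerce the key to character on both sides so the types always match,
# whatever format patient_id has in the real data.
df <- df %>% mutate(patient_id = as.character(patient_id))
prelim_data <- prelim_data %>% mutate(patient_id = as.character(patient_id))

joined <- left_join(df, prelim_data, by = "patient_id")
```

One caveat with this sketch: as.character() on a column that has already been read as a double can render some values in scientific notation, so it is probably safer to read patient_id as character at import time (for example with readr's col_types = cols(patient_id = col_character())) than to convert it afterwards.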