opensafely / tpp-sql-notebook

creating binary or categorical values #4

Closed CarolineMorton closed 4 years ago

CarolineMorton commented 4 years ago

We can pull out patient IDs and event details for a set of codes, for example smoking. I have written some code that searches the QoF csv for key terms and pulls out any Read codes that match. We can then use this list to search for patients with these codes.
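As a rough illustration of the key-term search described above, here is a minimal Python sketch. The column names (`code`, `description`), the function name `find_codes`, and the sample rows are all assumptions made for the example; they are not taken from the real QoF csv or from the code mentioned in this comment.

```python
import csv
import io


def find_codes(csv_text, terms, description_field="description", code_field="code"):
    """Return the codes whose description contains any of the search terms.

    Matching is a case-insensitive substring test. The field names are
    hypothetical and would need to match the real csv header.
    """
    lowered = [t.lower() for t in terms]
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[code_field]
        for row in reader
        if any(t in row[description_field].lower() for t in lowered)
    ]


# Made-up rows for illustration only; these are not real Read codes.
sample = """code,description
CODE_A,Never smoked tobacco
CODE_B,Ex-smoker
CODE_C,Current smoker
CODE_D,Asthma review
"""

smoking_codes = find_codes(sample, ["smok"])
print(smoking_codes)
```

A substring match like this is deliberately broad, so the resulting list would still need a manual review before it is used to query patients.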

My question is at which point we add the binary or categorical values. We could do one of the following:
1) Do it in Stata: the researcher looks at the csv file for smoking and assigns each code a value (0 - never smoker, 1 - ex-smoker, 2 - current smoker).
2) Manually categorise the original csv before we run the SQL query, which means we would have to do this rather than LSHTM.

alexwalkerepi commented 4 years ago

I would prefer to keep data management in SQL/Python wherever possible, and output just a single column for each variable, rather than a whole csv.

We'll have to discuss definitions of each of the variables closely, as well as the categories in each variable.

As we develop the code, we can add in flexibility to output the variables in different formats where needed.
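To make the "single column for each variable" idea concrete, here is a minimal Python sketch that collapses coded events into one smoking category per patient. Everything in it is an assumption for illustration: the code-to-category mapping, the `(patient_id, code, date)` event shape, the function name `smoking_status`, and above all the rule that the most recent matching event decides the category. The actual variable definitions are still to be discussed.

```python
# Hypothetical mapping from code to category:
# 0 = never smoker, 1 = ex-smoker, 2 = current smoker.
SMOKING_CATEGORY = {"CODE_A": 0, "CODE_B": 1, "CODE_C": 2}


def smoking_status(events, mapping=SMOKING_CATEGORY):
    """Collapse (patient_id, code, date) events to one category per patient.

    Assumption: the most recent matching event wins. Dates are ISO 8601
    strings (YYYY-MM-DD), so plain string comparison orders them correctly.
    Patients with no matching code are left out of the result.
    """
    latest = {}
    for patient_id, code, date in events:
        if code not in mapping:
            continue  # not a smoking code, ignore
        if patient_id not in latest or date > latest[patient_id][0]:
            latest[patient_id] = (date, mapping[code])
    return {pid: category for pid, (_, category) in latest.items()}


# Made-up events for illustration only.
events = [
    (1, "CODE_C", "2015-03-01"),
    (1, "CODE_B", "2019-06-10"),
    (2, "CODE_A", "2018-01-20"),
    (3, "CODE_D", "2017-05-05"),
]

status = smoking_status(events)
print(status)
```

Keeping the mapping in one place means the categories can be changed, or a binary ever/never variable derived, without touching the code list itself.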