opensafely / research-template

The template for new research projects that use the OpenSAFELY framework.
MIT License

Python model example #23

Closed: marcus-clements closed this issue 3 years ago

marcus-clements commented 3 years ago

I had a look around the OpenSAFELY repos but couldn't find an example of a Python model, so I had a go at creating a project from this template that executes a (toy) Python model:

https://github.com/marcus-clements/opensafely-project-python-example

Seems to work fine on the dummy data generated by the cohort extractor.

I'd be happy to fork and make a PR if it's useful for others.
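
The code in the linked repository is not reproduced here. As a rough illustration of the kind of toy script meant, the sketch below computes the pairwise correlation of the columns of a pandas data frame. The column names (`age`, `bmi`, `sbp`) and the in-memory random data are hypothetical stand-ins for the dummy data from the cohort extractor; a real scripted action would read the extractor's output file instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dummy data produced by the cohort extractor.
rng = np.random.default_rng(seed=0)
n = 200
age = rng.integers(18, 90, size=n)
df = pd.DataFrame(
    {
        "age": age,
        "bmi": rng.normal(27, 4, size=n),
        # Built to depend on age so that one correlation is clearly non-zero.
        "sbp": 100 + 0.5 * age + rng.normal(0, 5, size=n),
    }
)

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")
print(corr.round(2))
```

The result is a square, symmetric data frame with one row and one column per input column, which could then be written out with `corr.to_csv(...)` for review.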

iaindillingham commented 3 years ago

Thanks for the example @marcus-clements, which shows how to compute the pairwise correlation of columns within a Pandas data frame. There are many Python libraries that we might use to model relationships within the data; our python-docker image comes with scipy and scikit-learn, for example. However, there isn't a standard approach: Pandas has several data frame methods that model the data; scikit-learn provides both functional and object-oriented approaches, using pipelines. Consequently, adding a general example to the documentation would be hard. Hopefully the documentation on scripted actions and the project pipeline is sufficient.

Please do consider opening a PR on the OpenSAFELY documentation, though, if you feel a specific use case is missing.
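
To make the contrast between scikit-learn's functional and object-oriented styles concrete, here is a minimal sketch on synthetic data. The data, the choice of `scale`, `StandardScaler`, and `LinearRegression`, and the variable names are all illustrative assumptions, not a recommended approach for OpenSAFELY studies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, scale

# Synthetic data: y depends linearly on two features, plus a little noise.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Functional style: a plain function call that returns the standardised array.
X_scaled = scale(X)

# Object-oriented style: estimator objects chained in a pipeline and fitted together.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)
r_squared = model.score(X, y)  # coefficient of determination on the training data
```

The pipeline keeps the scaling step and the model together, so the same transformation is applied whenever `model.predict` is called on new data.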