Thanks for the example @marcus-clements, which shows how to compute the pairwise correlation of columns within a pandas data frame. There are many Python libraries that we might use to model relationships within the data; our python-docker image comes with SciPy and scikit-learn, for example. However, there isn't a standard approach: pandas has several data frame methods that model the data, and scikit-learn provides both functional and object-oriented approaches, including pipelines. Consequently, it would be hard to add a general example to the documentation. Hopefully the documentation on scripted actions and the project pipeline is sufficient.
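For readers landing here, this is a minimal sketch of the kind of pandas call a pairwise-correlation example relies on, using `DataFrame.corr`. The column names and the randomly generated data are hypothetical stand-ins, not taken from the original example or from any OpenSAFELY output.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data; "age", "bmi" and "noise" are illustrative column names only.
rng = np.random.default_rng(seed=0)
n = 200
age = rng.integers(18, 90, size=n)
bmi = 20 + 0.1 * age + rng.normal(0, 2, size=n)  # bmi loosely depends on age
df = pd.DataFrame({"age": age, "bmi": bmi, "noise": rng.normal(size=n)})

# Pairwise Pearson correlation of every pair of columns, returned as a square DataFrame.
pearson = df.corr()

# The same method also supports rank-based measures ("spearman", "kendall").
spearman = df.corr(method="spearman")

print(pearson.round(2))
```

The result is symmetric with ones on the diagonal, so it is easy to write out as a small summary table from a scripted action.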
Please do consider opening a PR on the OpenSAFELY documentation, though, if you feel a specific use case is missing.
I had a look around the OpenSAFELY repos but couldn't find an example of a Python model, so I had a go at creating a project from this template that does execute a (toy) Python model:
https://github.com/marcus-clements/opensafely-project-python-example
Seems to work fine on the dummy data generated by the cohort extractor.
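For illustration only, here is a sketch of what a toy model in the object-oriented scikit-learn style might look like. It is not the code in the linked repository: the helper names (`make_dummy_cohort`, `fit_toy_model`), the columns and the outcome are all hypothetical, and the synthetic data stands in for the cohort extractor's dummy output, which a real project would read from its generated file instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_dummy_cohort(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for dummy cohort data; columns are illustrative."""
    rng = np.random.default_rng(seed)
    age = rng.integers(18, 90, size=n)
    bmi = rng.normal(27, 4, size=n)
    # Outcome probability rises with age (purely illustrative relationship).
    logit = -6 + 0.08 * age + 0.02 * (bmi - 27)
    outcome = rng.random(n) < 1 / (1 + np.exp(-logit))
    return pd.DataFrame({"age": age, "bmi": bmi, "outcome": outcome.astype(int)})


def fit_toy_model(df: pd.DataFrame, features=("age", "bmi"), target="outcome"):
    """Fit a scaled logistic regression; return the pipeline and held-out accuracy."""
    X = df[list(features)]
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    # Pipeline: standardise features, then fit a logistic regression.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


cohort = make_dummy_cohort()
model, accuracy = fit_toy_model(cohort)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping the scaler and the estimator in one pipeline keeps preprocessing fitted only on the training split, which is one reason the object-oriented style is popular.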
I'd be happy to fork and make a PR if it's useful for others.