Automatically generate inclusion criteria flowcharts

sebbacon commented 4 years ago

Example from the Risk Factors paper:

Assuming the study population is defined by a patients.satisfying() expression, I think we want to parse that expression so that each top-level "AND" and "AND NOT" clause becomes a boolean column in a per-patient dataframe. It would then be quite easy to calculate how many patients remain after each stage in the diagram.
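A rough sketch of the stepwise counting, assuming the parsing has already produced one boolean per top-level clause for each patient. The column names (registered, adult, missing_sex) and the (column, keep_if) representation are made up for illustration, and plain dicts stand in for a dataframe:

```python
# Hypothetical per-patient rows: one boolean per top-level criterion.
patients = [
    {"registered": True, "adult": True, "missing_sex": False},
    {"registered": True, "adult": True, "missing_sex": True},
    {"registered": True, "adult": False, "missing_sex": False},
    {"registered": False, "adult": True, "missing_sex": False},
    {"registered": True, "adult": True, "missing_sex": False},
]

# (column, keep_if) pairs: keep_if=True models "AND x", False models "AND NOT x".
criteria = [("registered", True), ("adult", True), ("missing_sex", False)]


def stepwise_counts(rows, criteria):
    """Return (label, remaining, excluded_at_this_step) for each stage."""
    steps = [("Total", len(rows), 0)]
    remaining = rows
    for column, keep_if in criteria:
        kept = [row for row in remaining if row[column] == keep_if]
        label = column if keep_if else f"NOT {column}"
        steps.append((label, len(kept), len(remaining) - len(kept)))
        remaining = kept
    return steps


for label, remaining, excluded in stepwise_counts(patients, criteria):
    print(f"{label}: {remaining} remaining, {excluded} excluded")
```

The counts depend on the order of the criteria, so whatever order is chosen here is the order the boxes would appear in the diagram.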

Generating the actual diagram automatically is probably a step too far, as the correct labels might vary independently of the variable names, especially for negated variables. Styles will also vary depending on the paper it's published in.

So a minimal output would just be a data file of some kind with labels and numbers at each stage.

However, it might be useful and fun to make a simple flowchart that can easily be edited. For example, we could use graphviz for this (useful examples 1, 2) and add draggable connectors for use in Inkscape.
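To make the graphviz idea concrete, here is a minimal sketch that writes DOT text from a list of (label, remaining, excluded) steps. The step tuples and node naming are assumptions for illustration, it does not attempt the Inkscape connector part, and it assumes labels contain no double quotes:

```python
def to_dot(steps):
    """Render (label, remaining, excluded) steps as Graphviz DOT text."""
    lines = ["digraph flow {", "  node [shape=box];"]
    for i, (label, remaining, excluded) in enumerate(steps):
        # "\\n" emits a literal \n, which DOT treats as a line break in labels.
        lines.append(f'  n{i} [label="{label}\\nn = {remaining}"];')
        if i > 0:
            # Main path from the previous stage, plus a side box for exclusions.
            lines.append(f"  n{i - 1} -> n{i};")
            lines.append(f'  x{i} [label="Excluded\\nn = {excluded}"];')
            lines.append(f"  n{i - 1} -> x{i};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical counts, purely for illustration.
steps = [("Total", 5, 0), ("registered", 4, 1), ("adult", 3, 1)]
dot_text = to_dot(steps)
print(dot_text)
```

Saving the output to a file and running something like `dot -Tsvg flow.dot -o flow.svg` should give an SVG that can then be tidied by hand.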

Here's how people currently do it (source:

Many studies will require a flowchart to show inclusion/exclusion of patients in the study. Eventually the numbers of patients excluded/included will be summarised automatically following cohort extract, but for now, a slightly manual approach is required:

Make a copy of the study definition (called study_definition_flow_chart.py). The population=patients.satisfying() function should be replaced with population=patients.all(). Then all variables except for those that appeared in the population definition logic should be removed (this will mean that it runs much faster than the main study definition). An example of such a study definition is here.

Then write a script that reads input_flow_chart.csv, applies each variable's exclusion in turn, and counts the remaining population after each step, in whatever order you'd like to report them. Here's an example written in Stata.
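For comparison, the same manual approach could look roughly like this in Python. This is only a sketch: the column names are invented, the inline sample stands in for input_flow_chart.csv, and it assumes boolean variables are written to the CSV as 0/1:

```python
import csv
import io

# Stand-in for input_flow_chart.csv; the columns are hypothetical.
SAMPLE_CSV = """patient_id,has_follow_up,age_18_plus,missing_imd
1,1,1,0
2,1,0,0
3,0,1,0
4,1,1,1
5,1,1,0
"""


def flow_counts(csv_file, steps):
    """Apply each (column, required_value) in order, counting who remains."""
    rows = list(csv.DictReader(csv_file))
    counts = [("All patients", len(rows))]
    for column, required in steps:
        rows = [row for row in rows if row[column] == required]
        counts.append((f"{column} == {required}", len(rows)))
    return counts


# Reporting order is up to the author; values are strings as read from the CSV.
steps = [("has_follow_up", "1"), ("age_18_plus", "1"), ("missing_imd", "0")]
counts = flow_counts(io.StringIO(SAMPLE_CSV), steps)
for label, n in counts:
    print(f"{label}: {n}")
```

In real use the io.StringIO(...) argument would be replaced by an open file handle for the extracted CSV.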

sebbacon commented 3 years ago

There's a nice React library we could potentially make use of: https://reactflow.dev/examples/edges/

remlapmot commented 3 years ago

There are also some R packages which you might want to look at.

sebbacon commented 2 years ago

These days, mermaid seems like a pretty great option as an intermediate, tweakable format.
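As a sketch of what that intermediate format might look like, the following emits a mermaid flowchart from (label, remaining, excluded) steps. The step tuples, node ids and counts are invented for illustration, and labels are assumed not to contain double quotes:

```python
def to_mermaid(steps):
    """Render (label, remaining, excluded) steps as mermaid flowchart text."""
    lines = ["flowchart TD"]
    for i, (label, remaining, excluded) in enumerate(steps):
        lines.append(f'    n{i}["{label} (n = {remaining})"]')
        if i > 0:
            # Main path down the chart, with exclusions branching off each stage.
            lines.append(f"    n{i - 1} --> n{i}")
            lines.append(f'    n{i - 1} --> x{i}["Excluded (n = {excluded})"]')
    return "\n".join(lines)


# Hypothetical counts, purely for illustration.
steps = [("Total", 5, 0), ("registered", 4, 1), ("adult", 3, 1)]
mermaid_text = to_mermaid(steps)
print(mermaid_text)
```

Because the output is plain text, labels for negated variables can be reworded by hand before rendering.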

opensafely-core / ehrql

Automatically generate inclusion criteria flowcharts #2130