Assuming the study population is defined in a patients.satisfying() expression, I think we want to parse that expression so that each top-level "AND" and "AND NOT" clause becomes a boolean column in a per-patient dataframe. It would then be straightforward to calculate how many patients remain after each filtering stage in the diagram.
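A minimal sketch of that parsing step, assuming the population expression is a plain string whose top level is a pure conjunction (no top-level OR) and that a boolean value for each clause has already been computed per patient. The function names split_top_level_and and stepwise_counts are hypothetical, and the regex tokeniser is deliberately naive (for example, it would be confused by the word AND inside a quoted string):

```python
import re


def split_top_level_and(expression):
    """Split a population expression on AND operators outside parentheses.

    Returns a list of (negated, term) pairs; negated is True for clauses
    written as "AND NOT term". Assumes the top level is a pure conjunction.
    """
    clauses, depth, start = [], 0, 0
    for match in re.finditer(r"\(|\)|\bAND\b", expression, flags=re.IGNORECASE):
        token = match.group()
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif depth == 0:  # an AND that is not nested inside parentheses
            clauses.append(expression[start:match.start()])
            start = match.end()
    clauses.append(expression[start:])

    result = []
    for clause in clauses:
        clause = clause.strip()
        negated = re.match(r"NOT\b", clause, flags=re.IGNORECASE) is not None
        if negated:
            clause = clause[3:].strip()
        result.append((negated, clause))
    return result


def stepwise_counts(clauses, patients):
    """Apply each clause in turn and record how many patients remain.

    patients is a list of per-patient dicts mapping each clause's term to a
    bool, i.e. one boolean column per top-level clause.
    """
    remaining = list(patients)
    counts = [("All patients", len(remaining))]
    for negated, term in clauses:
        # Keep a patient when the term is True, or when it is False for "AND NOT".
        remaining = [p for p in remaining if bool(p[term]) != negated]
        counts.append((f"NOT {term}" if negated else term, len(remaining)))
    return counts
```

Keeping nested clauses such as "(age > 18 AND age < 110)" intact means each row of the diagram corresponds to exactly one top-level condition, which is the granularity a flowchart usually reports.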
Generating the actual diagram automatically is probably a step too far, as the correct labels may not correspond to the variable names, especially for negated variables. Styles will also vary depending on the journal the paper is published in.
So a minimal output would just be a data file of some kind with labels and numbers at each stage.
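One possible shape for that data file, sketched with Python's standard csv module: one row per stage with its label, the number of patients remaining, and the number excluded relative to the previous stage. The column names and the write_flow_chart_counts helper are hypothetical choices, not an agreed format:

```python
import csv
import io


def write_flow_chart_counts(counts, handle):
    """Write (label, n) stages as CSV, adding the number excluded at each stage."""
    writer = csv.writer(handle)
    writer.writerow(["stage", "label", "n", "excluded"])
    previous = None
    for stage, (label, n) in enumerate(counts):
        # The first stage has nothing before it, so its exclusion count is blank.
        excluded = "" if previous is None else previous - n
        writer.writerow([stage, label, n, excluded])
        previous = n


buffer = io.StringIO()
write_flow_chart_counts([("All patients", 4), ("registered", 3), ("NOT died", 2)], buffer)
print(buffer.getvalue())
```

In practice the handle would be a real file opened with newline="", so the labels can be edited by hand before they go into the paper's figure.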
However, it might be useful (and fun) to generate a simple flowchart that can easily be edited. For example, we could use graphviz for this (useful examples 1, 2) and add draggable connectors for use in Inkscape.
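A rough sketch of the graphviz idea, assuming the stage counts are available as a list of (label, n) pairs. It only emits DOT source text (no graphviz Python package is needed); flow_chart_dot is a hypothetical name, and the layout (exclusion box placed beside the following stage) is just one option:

```python
def flow_chart_dot(counts):
    """Return Graphviz DOT source for a simple top-to-bottom flowchart.

    counts is a list of (label, n) pairs, one per stage; the number
    excluded between consecutive stages is shown in a side box.
    """
    lines = [
        "digraph flow_chart {",
        "  rankdir=TB;",
        '  node [shape=box, fontname="Helvetica"];',
    ]
    for i, (label, n) in enumerate(counts):
        escaped = label.replace('"', '\\"')  # a bare quote would end the DOT string
        lines.append(f'  stage{i} [label="{escaped}\\nn = {n}"];')
        if i > 0:
            excluded = counts[i - 1][1] - n
            lines.append(f'  excluded{i} [label="Excluded: {excluded}"];')
            lines.append(f"  stage{i - 1} -> stage{i};")
            lines.append(f"  stage{i - 1} -> excluded{i};")
            # Put the exclusion box on the same row as the stage it leads into.
            lines.append(f"  {{ rank=same; stage{i}; excluded{i}; }}")
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned text as flow_chart.dot and running dot -Tsvg flow_chart.dot -o flow_chart.svg gives an SVG that can be opened in Inkscape for hand-editing; whether the connectors stay attached when boxes are dragged would still need checking.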
Many studies require a flowchart showing the inclusion and exclusion of patients. Eventually the numbers of patients excluded and included will be summarised automatically following cohort extraction, but for now a slightly manual approach is required:
Make a copy of the study definition called study_definition_flow_chart.py. In the copy, replace population=patients.satisfying() with population=patients.all(). Then remove all variables except those that appear in the population definition logic; this makes it run much faster than the main study definition. An example of such a study definition is here.
Then write a script that reads input_flow_chart.csv, applies each of the population variables in turn (in whatever order you'd like to report them), and counts the remaining population after each step. Here's an example written in Stata:
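For anyone working in Python instead, a minimal sketch of the same sequential approach using only the standard library. It assumes every variable used is a 0/1 flag with no blank values; the sequential_counts helper and the (label, column, keep_if) step format are hypothetical:

```python
import csv


def sequential_counts(path, steps):
    """Count the patients remaining after each inclusion step, applied in order.

    steps is a list of (label, column, keep_if) tuples: at each step a patient
    is kept only if int(row[column]) == keep_if. Assumes every column used is
    a 0/1 flag with no blank values.
    """
    with open(path, newline="") as f:
        remaining = list(csv.DictReader(f))
    results = [("All patients", len(remaining))]
    for label, column, keep_if in steps:
        # Drop patients who fail this criterion; later steps see only survivors.
        remaining = [row for row in remaining if int(row[column]) == keep_if]
        results.append((label, len(remaining)))
    return results
```

Something like sequential_counts("input_flow_chart.csv", [("Registered", "registered", 1), ("Alive", "died", 0)]) would then return the label and count for each stage; reordering the steps list changes the order the exclusions are reported in, as the approach intends.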
Example from the Risk Factors paper:
Here's how people currently do it (source: