This is the code and configuration for post-covid-autoimmune.
You can run this project via Gitpod in a web browser.
The content has ONLY been made public to support the OpenSAFELY open science and transparency principles and to support the sharing of re-usable code for other subsequent users. No clinical, policy or safety conclusions must be drawn from the contents of this repository.
If you are interested in how we defined our code lists, look in the `codelists` folder.
Analysis scripts are in the `analysis` directory. The study definitions are written in Python. Extracted data is then combined to create our final cohorts in the preprocess data script. The `lib/` directory contains a list of active analyses.
The `project.yaml` defines run-order and dependencies for all the analysis scripts. This file should not be edited directly. To make changes to the yaml, edit and run the `create_project.R` script, which generates all the actions.
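As a rough illustration of what a generator script does, the sketch below expands templated action names such as `table1_{cohort}` into one concrete action per cohort. It is written in Python for brevity and is not the project's `create_project.R`. The `expand_actions` helper, the script paths and the layout of the run command are assumptions made for this example.

```python
from itertools import product

# Cohort labels as described in this README; the templates are an
# illustrative subset, not the project's full action list.
COHORTS = ["prevax", "vax", "unvax"]
TEMPLATES = {
    "table1_{cohort}": "analysis/table1.R",  # hypothetical script path
    "table2_{cohort}": "analysis/table2.R",  # hypothetical script path
}


def expand_actions(templates, cohorts):
    """Return a dict mapping concrete action names to a run command."""
    actions = {}
    for (template, script), cohort in product(templates.items(), cohorts):
        name = template.format(cohort=cohort)
        # Hypothetical command layout; the real project.yaml may differ.
        actions[name] = {"run": f"r:latest {script} {cohort}"}
    return actions


actions = expand_actions(TEMPLATES, COHORTS)
for name in sorted(actions):
    print(name, "->", actions[name]["run"])
```

Generating the actions from templates keeps the action list consistent across cohorts, which is one reason to edit the generator rather than the generated file.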
This manuscript is currently being drafted.
Project actions are run securely using OpenSAFELY Jobs. Any published outputs from this project can also be found there.
Below is a description of each action in the `project.yaml`. Arguments are denoted by `{arg}` in the action name.
- `vax_eligibility_inputs`: runs `metadates.R`, which creates metadata for aspects of the study design that are required for the `generate_study_population` actions.
- `generate_study_population_{cohort}`: runs the `generate_study_population` scripts that create the three study populations: prevaccinated (prevax), vaccinated (vax) and electively unvaccinated (unvax). These are `study_definition_prevax.py`, `study_definition_vax.py`, `study_definition_unvax.py` and `study_definition_prelim.py`. See also `common_variables.py`.
- `preprocess data - {cohort}`: runs `preprocess_data.R` to apply dataframe tidying to `input_{cohort}.rds` (generated by `generate_study_population_{cohort}`).
- `stage1_data_cleaning_{cohort}`: runs `Stage1_data_cleaning.R`.
- `consort_{cohort}_midpoint6`: see `Stage1_data_cleaning.R`.
- `table1_{cohort}`: runs `table1.R`, which calculates descriptive statistics for pre- and post-exposure events for all outcomes and subgroups.
- `extendedtable1_{cohort}`: runs `extendedtable1.R`, which calculates descriptive statistics for pre- and post-exposure events for all outcomes and subgroups.
- `table2_{cohort}`: runs `table2.R`, which calculates pre- and post-exposure event counts and person days of follow-up for all outcomes and subgroups.
- `venn - {cohort}`: runs `venn.R`.
- `make_model_input-{name}`: runs `make_model_input.R`, which prepares datasets for all the outcomes and subgroups.
- `describe_model_input-{name}`: runs `describe_file.R`, which calculates counts and descriptive statistics for the outcomes and covariates used in the Cox models.
- `cox_ipw-{name}`: runs `cox-ipw`, an R reusable action for the OpenSAFELY framework; see its README file.
- `make_model_output`: runs `make_model_output.R`, which combines all the R results in one formatted .csv file.
In OpenSAFELY, a study definition is a formal specification of the data that you want to extract from the OpenSAFELY database. Further details on creating the study population can be found in the OpenSAFELY documentation.
The contents of this repository MUST NOT be considered an accurate or valid representation of the study or its purpose. This repository may reflect an incomplete or incorrect analysis with no further ongoing work.
Outputs follow OpenSAFELY naming conventions related to suppression rules by adding the suffix "_midpoint6". The suffix "_midpoint6_derived" means that the value(s) are derived from the midpoint6 values. Detailed information regarding naming conventions can be found here.
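To make the suffixes concrete, here is a minimal sketch of midpoint-6 rounding and of a value derived from rounded counts. It assumes the commonly used definition in which a non-zero count is rounded up to the next multiple of 6 and then reduced by 3, so that counts 1 to 6 are reported as 3, counts 7 to 12 as 9, and so on. The function name `roundmid_any`, the raw counts and the choice of a simple sum for the derived value are illustrative assumptions, not taken from this repository's scripts.

```python
import math


def roundmid_any(x: int, to: int = 6) -> int:
    """Round a count to the midpoint of its band of width `to`.

    With to=6, counts 1-6 map to 3, 7-12 map to 9, and so on; 0 stays 0.
    """
    if x == 0:
        return 0
    return math.ceil(x / to) * to - to // 2


# Hypothetical raw counts, for illustration only.
unexposed_events = 14
exposed_events = 5

unexposed_events_midpoint6 = roundmid_any(unexposed_events)
exposed_events_midpoint6 = roundmid_any(exposed_events)

# A "_derived" value is computed from already-rounded values rather than
# from the raw counts (assumed here to be a simple sum).
total_events_midpoint6_derived = (
    unexposed_events_midpoint6 + exposed_events_midpoint6
)

print(
    unexposed_events_midpoint6,
    exposed_events_midpoint6,
    total_events_midpoint6_derived,
)
```

Deriving totals from the rounded parts, rather than rounding the raw total separately, keeps the released numbers internally consistent.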
Variable | Description |
---|---|
Description | Criterion applied to cohort |
N_midpoint6 | Number of people in the cohort after the criterion is applied |
removed | Number of people removed due to criterion being applied |
Variable | Description |
---|---|
Characteristic | Patient characteristic under consideration |
Subcharacteristic | Patient subcharacteristic under consideration |
N (%) derived | Number of people with characteristic, alongside % of total |
COVID-19 diagnoses midpoint6 | Number of people with characteristic and COVID-19 |
Variable | Description |
---|---|
name | Unique identifier for analysis |
cohort | Cohort used for the analysis |
exposure | Exposure used for the analysis |
outcome | Outcome used for the analysis |
analysis | String to identify whether this is the ‘main’ analysis or a subgroup |
unexposed_person_days | Number of person days before or without exposure in the analysis |
unexposed_events_midpoint6 | Number of unexposed people with the outcome in the analysis |
exposed_person_days | Number of person days after exposure in the analysis |
exposed_events_midpoint6 | Number of exposed people with the outcome in the analysis |
total_person_days | Number of person days in the analysis |
total_events_midpoint6_derived | Number of people with the outcome in the analysis |
day0_events_midpoint6 | Number of people with the exposure and outcome on the same day |
total_exposed_midpoint6 | Number of people with the exposure in the analysis |
sample_size_midpoint6 | Number of people in the analysis |
Variable | Description |
---|---|
outcome | Outcome under consideration |
only_snomed_midpoint6 | Outcome identified in primary care only |
only_hes_midpoint6 | Outcome identified in secondary care only |
only_death_midpoint6 | Outcome identified in death registry only |
snomed_hes_midpoint6 | Outcome identified in primary and secondary care |
snomed_death_midpoint6 | Outcome identified in primary care and death registry |
hes_death_midpoint6 | Outcome identified in secondary care and death registry |
snomed_hes_death_midpoint6 | Outcome identified in primary care, secondary care, and death registry |
total_snomed_midpoint6 | Total outcomes identified in primary care |
total_hes_midpoint6 | Total outcomes identified in secondary care |
total_death_midpoint6 | Total outcomes identified in death registry |
total_midpoint6_derived | Total outcomes identified |
cohort | Cohort under consideration |
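The source-overlap columns in the table above can be read as regions of a three-set Venn diagram. The sketch below computes raw (unrounded) counts for each region from three sets of patient identifiers. The `venn_counts` helper and the example identifiers are hypothetical, and the sketch assumes that combined columns such as `snomed_hes` are exclusive regions (found in exactly those sources), which this README does not state explicitly.

```python
def venn_counts(snomed: set, hes: set, death: set) -> dict:
    """Count patients in each exclusive region of a three-source Venn diagram."""
    return {
        "only_snomed": len(snomed - hes - death),
        "only_hes": len(hes - snomed - death),
        "only_death": len(death - snomed - hes),
        # Pairwise regions exclude the third source (assumed exclusive).
        "snomed_hes": len((snomed & hes) - death),
        "snomed_death": len((snomed & death) - hes),
        "hes_death": len((hes & death) - snomed),
        "snomed_hes_death": len(snomed & hes & death),
        # Totals per source count every patient found in that source.
        "total_snomed": len(snomed),
        "total_hes": len(hes),
        "total_death": len(death),
        "total": len(snomed | hes | death),
    }


# Hypothetical patient identifiers, for illustration only.
counts = venn_counts(snomed={1, 2, 3, 4}, hes={3, 4, 5}, death={4, 6})
for region, n in counts.items():
    print(region, n)
```

Under this reading, the seven exclusive regions sum to the overall total, which is a useful sanity check before any rounding is applied.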
Variable | Description |
---|---|
name | Unique identifier for analysis |
cohort | Cohort used for the analysis |
outcome | Outcome used for the analysis |
analysis | String to identify whether this is the ‘main’ analysis or a subgroup |
error | Captured error message if analysis did not run |
model | String to identify the model adjustment |
term | String to identify the term in the analysis |
lnhr | Log hazard ratio for the analysis |
se_lnhr | Standard error for the log hazard ratio for the analysis |
hr | Hazard ratio for the analysis |
conf_low | Lower confidence limit for the analysis |
conf_high | Upper confidence limit for the analysis |
N_total_midpoint6 | Total number of people in the analysis |
N_exposed_midpoint6 | Total number of people with the exposure in the analysis |
N_events_midpoint6 | Total number of people with the outcome following exposure in the analysis |
person_time_total | Total person time included in the analysis |
outcome_time_median | Median time to outcome following exposure |
strata_warning | String to identify strata variables that may cause model faults |
surv_formula | Survival formula for the analysis |
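The columns `lnhr`, `se_lnhr`, `hr`, `conf_low` and `conf_high` in the table above are related in the usual way for a Cox model. The sketch below shows that relationship under the assumption of a Wald-type 95% confidence interval (multiplier 1.96). The `hazard_ratio_summary` helper and the input values are hypothetical, and the project's own scripts may compute the limits differently.

```python
import math


def hazard_ratio_summary(lnhr: float, se_lnhr: float, z: float = 1.96) -> dict:
    """Convert a log hazard ratio and its standard error to HR and limits."""
    return {
        "hr": math.exp(lnhr),
        "conf_low": math.exp(lnhr - z * se_lnhr),
        "conf_high": math.exp(lnhr + z * se_lnhr),
    }


# Hypothetical estimate, for illustration only.
summary = hazard_ratio_summary(lnhr=0.405, se_lnhr=0.12)
print({key: round(value, 3) for key, value in summary.items()})
```

Because the interval is symmetric on the log scale, the hazard ratio is the geometric mean of the two limits.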
Variable | Description |
---|---|
aer_sex | Sex subgroup under consideration |
aer_age | Age subgroup under consideration |
analysis | String to identify whether this is the ‘main’ analysis or a subgroup |
cohort | Cohort used for the analysis |
outcome | Outcome used for the analysis |
unexposed_person_days | Unexposed person days in the age/sex grouping |
unexposed_events_midpoint6 | Number of events in unexposed people in the age/sex grouping |
total_exposed_midpoint6 | Total number of people with the exposure in the age/sex grouping |
sample_size_midpoint6 | Total number of people in the age/sex grouping |
The OpenSAFELY framework is a Trusted Research Environment (TRE) for electronic health records research in the NHS, with a focus on public accountability and research quality. Read more at OpenSAFELY.org.
As standard, research projects have an MIT license.