rmgpanw / ukbwranglr

R package for UK Biobank data wrangling.
https://rmgpanw.github.io/ukbwranglr/

ukbwranglr

Project Status: WIP. Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.

Overview

The goal of ukbwranglr is to facilitate analysis of UK Biobank data, including:

  1. Reading a selection of UK Biobank variables into R.
  2. Summarising repeated continuous variable measurements.[^1]
  3. Extracting phenotypic outcomes of interest from clinical events data.[^2]
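To make the second item concrete, here is a minimal base R sketch of summarising repeated measurements per participant. It does not use ukbwranglr and is not how the package implements this. The data frame, the column names (`bmi_0`, `bmi_1`, `bmi_2`) and the helper `summarise_rows()` are hypothetical and chosen only for illustration.

```r
# Toy wide-format data: one row per participant, one column per repeated
# BMI measurement. Column names are hypothetical, not real UK Biobank fields.
bmi <- data.frame(
  eid   = c(1, 2, 3),
  bmi_0 = c(24.0, 31.5, NA),
  bmi_1 = c(26.0, NA, NA),
  bmi_2 = c(25.0, 29.5, NA)
)

bmi_cols <- c("bmi_0", "bmi_1", "bmi_2")

# Hypothetical helper: apply a summary function across each participant's
# repeated measurements, ignoring missing values. Returns NA when every
# measurement is missing (plain min()/max() would warn and return Inf/-Inf).
summarise_rows <- function(df, cols, fun) {
  unname(apply(df[, cols, drop = FALSE], 1, function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) NA_real_ else fun(x)
  }))
}

bmi$bmi_mean <- summarise_rows(bmi, bmi_cols, mean)
bmi$bmi_min  <- summarise_rows(bmi, bmi_cols, min)
bmi$bmi_max  <- summarise_rows(bmi, bmi_cols, max)

print(bmi)
```

Handling the all-missing case explicitly keeps participants with no measurements in the result with an `NA` summary rather than dropping them or producing infinite values.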

Installation

You can install the development version of ukbwranglr with:

# install.packages("devtools")
devtools::install_github("rmgpanw/ukbwranglr")

Basic workflow

The basic workflow is as follows:

  1. Create a data dictionary for your main UK Biobank dataset with make_data_dict().
  2. Read selected variables into R with read_ukb().
  3. Summarise continuous variables with summarise_numerical_variables().
  4. Tidy clinical events data with tidy_clinical_events() or make_clinical_events_db(), and extract outcomes of interest with extract_phenotypes().
  5. Analyse.
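The idea behind step 4 can be sketched in base R without the package: match a long-format table of clinical events against a code list, then take each participant's earliest matching record. This is only an illustration under stated assumptions and not ukbwranglr's implementation. The table layout, the column names, the `source` labels and the two-entry code list are hypothetical, and the code list is not a validated hypertension definition.

```r
# Toy long-format clinical events: one row per recorded code.
# Layout and column names are assumptions made for this sketch.
events <- data.frame(
  eid    = c(1, 1, 2, 3, 3),
  source = c("primary_care", "hospital", "hospital", "primary_care", "hospital"),
  code   = c("G20..", "I10", "E119", "H33..", "I10"),
  date   = as.Date(c("2005-03-01", "2010-06-15", "2012-01-20",
                     "2001-09-09", "1999-04-02")),
  stringsAsFactors = FALSE
)

# Illustrative code list, keyed by source because primary and secondary
# care records use different coding systems.
hypertension_codes <- data.frame(
  source = c("primary_care", "hospital"),
  code   = c("G20..", "I10"),
  stringsAsFactors = FALSE
)

# Keep only events whose (source, code) pair appears in the code list.
matched <- merge(events, hypertension_codes, by = c("source", "code"))

# Earliest matching event per participant: sort by date within participant,
# then keep the first row for each eid.
matched  <- matched[order(matched$eid, matched$date), ]
first_dx <- matched[!duplicated(matched$eid), c("eid", "source", "date")]

# Binary phenotype indicator for the full cohort.
participants <- data.frame(eid = c(1, 2, 3))
participants$hypertension <- participants$eid %in% first_dx$eid

print(first_dx)
print(participants)
```

Matching on both `source` and `code` avoids accidental hits where the same string means different things in different coding systems.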

Please see vignette('ukbwranglr') for further details.

[^1]: For example, calculating a mean/minimum/maximum body mass index (BMI) from repeated BMI measurements.

[^2]: For example, identifying participants with a diagnosis of hypertension from linked primary and secondary health care records.