createUKBphenome

Basic concepts

ICD code / PheWAS code mapping from phewascatalog (https://phewascatalog.org/phecodes and https://phewascatalog.org/phecodes_icd10)
Collection of information about PheWAS codes and their inclusion / exclusion filters
Collection and harmonization of ICD codes from UKB
Extraction of all ICD codes from the available fields in your UKB baskets
Generation of a phenome: a case-control study for each phecode

Required R libraries

data.table
tidyr
parallel
intervals
htmltab
bitops

Step 1: Describe your data

Add the absolute paths (e.g. /driveA/UKB/ukb####.tab) of your TAB-delimited UKB baskets to a single text file, ./data/baskets.txt.
Add the latest file with withdrawn samples (w#####_########.csv) to the ./data/ folder.
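
As a minimal sketch of this step, the helper below collects every ukb*.tab file in a directory into a list file. The function name write_basket_list is hypothetical, and writing one path per line is an assumption about the format the script expects, not something this README states.

```shell
# write_basket_list BASKET_DIR OUT_FILE
# Hypothetical helper: lists every ukb*.tab file found directly in BASKET_DIR
# into OUT_FILE. One absolute path per line is an assumed format.
write_basket_list() {
    basket_dir=$(cd "$1" && pwd) || return 1   # resolve to an absolute path
    out_file=$2
    mkdir -p "$(dirname "$out_file")"          # create ./data/ if it is missing
    find "$basket_dir" -maxdepth 1 -type f -name 'ukb*.tab' | sort > "$out_file"
}
```

With the example location used above, the call would be: write_basket_list /driveA/UKB ./data/baskets.txt. The withdrawal file still has to be copied into ./data/ by hand.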

Step 2: Create Phenome

cd createUKBphenome
Rscript ./scripts/function.createUKBphenome.r
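
Because the run is long and memory-hungry, it can help to verify the inputs from Step 1 first. The following is a sketch only: check_inputs is a hypothetical helper, not part of this repository, and the w*.csv pattern is assumed from the withdrawal file name given above.

```shell
# check_inputs DATA_DIR
# Hypothetical pre-flight check: returns non-zero if baskets.txt is missing or
# empty, lists a relative or unreadable path, or no withdrawal file is present.
check_inputs() {
    data_dir=$1
    list_file="$data_dir/baskets.txt"
    rc=0
    [ -s "$list_file" ] || { echo "missing or empty: $list_file" >&2; return 1; }
    while IFS= read -r basket || [ -n "$basket" ]; do
        [ -z "$basket" ] && continue               # skip blank lines
        case $basket in
            /*) ;;                                 # absolute path, as required
            *)  echo "not an absolute path: $basket" >&2; rc=1 ;;
        esac
        [ -r "$basket" ] || { echo "cannot read: $basket" >&2; rc=1; }
    done < "$list_file"
    found=0
    for f in "$data_dir"/w*.csv; do                # assumed withdrawal file pattern
        [ -e "$f" ] && found=1
    done
    [ "$found" -eq 1 ] || { echo "no withdrawal file in $data_dir" >&2; rc=1; }
    return "$rc"
}
```

One way to use it: check_inputs ./data && Rscript ./scripts/function.createUKBphenome.r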

Output

Full ICD / PheWAS code tables with descriptions (showing which ICD codes underlie each phecode)
UKB phenome with exclusion criteria applied to controls
UKB phenome without applying exclusion criteria to controls
Overview of all phecodes, their categories and general descriptions
Output of all ICD codes that were NOT mapped to phecodes (incl. sample sizes)
Output of all individuals who had sex-specific diagnosis codes that did not match their sex

Notes:

This script requires a large amount of memory (~20-30 GB), because it reads and collects a lot of data into memory.
This script requires the ICD data of the UK Biobank (ideally the most comprehensive set), as well as the Genetic Sex and Sex fields.
Only samples whose Genetic Sex equals their Sex are kept, because the cause of a mismatch is unclear (potential sources: gender identity, bone marrow transplant, sample swap).
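
The concordance rule in the last note can be illustrated on a TAB-delimited file as follows. This is not the script's own implementation, only a sketch of the rule. The function name filter_sex_concordant is hypothetical, and the default column names f.31.0.0 (Sex) and f.22001.0.0 (Genetic Sex) are assumptions about how your basket was exported; pass different names as the second and third arguments if yours differ. In this sketch, samples with a missing value in either column are also dropped.

```shell
# filter_sex_concordant FILE [SEX_COLUMN] [GENETIC_SEX_COLUMN]
# Hypothetical helper: prints the header plus every row in which the two sex
# columns hold the same non-missing value. Column names are assumed defaults.
filter_sex_concordant() {
    awk -F '\t' -v sex_col="${2:-f.31.0.0}" -v gsex_col="${3:-f.22001.0.0}" '
        NR == 1 {
            # Locate both columns by their header names.
            for (i = 1; i <= NF; i++) {
                if ($i == sex_col)  s = i
                if ($i == gsex_col) g = i
            }
            if (!s || !g) { print "sex columns not found" > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Keep the sample only if Sex is present and equals Genetic Sex.
        $s != "NA" && $s != "" && $s == $g { print }
    ' "$1"
}
```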

