createUKBphenome
Basic concepts
- ICD code / PheWAS code mapping from phewascatalog (https://phewascatalog.org/phecodes and https://phewascatalog.org/phecodes_icd10)
- Collection of information about PheWAS codes and their inclusion / exclusion filters
- Collection and harmonization of ICD codes from UKB
- Extraction of all ICD codes from the available fields in your UKB baskets
- Generation of a phenome: a case-control study for each phecode
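The case-control idea in the last bullet can be sketched with a toy example. This is an illustration in base R under stated assumptions, not the script's actual implementation: the phecodes, the exclusion range, and the 1 / 0 / NA status encoding are all made up for the example.

```r
# Toy diagnoses: one row per (individual, phecode). All values are made up.
diagnoses <- data.frame(
  id      = c(1, 1, 2, 3),
  phecode = c(100.1, 200, 100.3, 200)
)
all_ids <- 1:4  # individual 4 has no diagnoses at all

# Hypothetical target phecode and its (hypothetical) exclusion range
target          <- 100.1
exclusion_range <- c(100, 100.99)

# Cases: individuals carrying the target phecode
case_ids <- unique(diagnoses$id[diagnoses$phecode == target])

# Non-cases with any phecode inside the exclusion range
in_range     <- diagnoses$phecode >= exclusion_range[1] &
                diagnoses$phecode <= exclusion_range[2]
excluded_ids <- setdiff(unique(diagnoses$id[in_range]), case_ids)

# Without exclusion criteria: every non-case is a control (0)
status_no_excl <- as.integer(all_ids %in% case_ids)

# With exclusion criteria: non-cases in the exclusion range become NA
status_excl <- status_no_excl
status_excl[all_ids %in% excluded_ids] <- NA

print(data.frame(id = all_ids, status_no_excl, status_excl))
```

Individual 2 has a related code in the exclusion range. It counts as a control in the version without exclusion criteria and is set to missing in the version with them.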
Required R libraries
- data.table
- tidyr
- parallel
- intervals
- htmltab
- bitops
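A quick way to see which of these libraries are missing before running the pipeline is sketched below. It only checks for installation. Whether each package is currently available from your CRAN mirror is not verified here, so the install line is left commented out.

```r
# Packages listed above; 'parallel' ships with base R and needs no installation.
required <- c("data.table", "tidyr", "parallel", "intervals", "htmltab", "bitops")

# requireNamespace() returns FALSE (instead of raising an error) if a package is absent.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install from your configured repository:
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```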
Step 1: Describe your data
Add the absolute paths (e.g. /driveA/UKB/ukb####.tab) of your TAB-delimited UKB baskets to a single text file ./data/baskets.txt
Add the latest file with withdrawn samples 'w#####_########.csv' to the './data/' folder
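Before starting the run, it can help to confirm that every path in the basket list is absolute and points to an existing file. The helper below is a hypothetical sketch and is not part of createUKBphenome. The name check_baskets is invented here, and the absolute-path test assumes Unix-style paths, as in the example above. The demonstration uses temporary files rather than real baskets.

```r
# Hypothetical helper (not part of createUKBphenome): report on each listed basket.
check_baskets <- function(list_file = "./data/baskets.txt") {
  if (!file.exists(list_file)) stop("Basket list not found: ", list_file)
  paths <- trimws(readLines(list_file, warn = FALSE))
  paths <- paths[nzchar(paths)]  # ignore blank lines
  data.frame(
    path     = paths,
    absolute = startsWith(paths, "/"),  # assumes Unix-style absolute paths
    exists   = file.exists(paths),
    stringsAsFactors = FALSE
  )
}

# Demonstration with temporary files standing in for real baskets
list_file <- tempfile(fileext = ".txt")
existing  <- tempfile(fileext = ".tab")
file.create(existing)
writeLines(c(existing, "", "relative/missing_basket.tab"), list_file)

report <- check_baskets(list_file)
print(report)
```

For a real run, call check_baskets() with no arguments from the createUKBphenome directory and look for rows where absolute or exists is FALSE.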
Step 2: Create Phenome
cd createUKBphenome
Rscript ./scripts/function.createUKBphenome.r
Output
- Full ICD / PheWAS code tables with descriptions (showing which ICD codes underlie each phecode)
- UKB phenome with exclusion criteria applied to controls
- UKB phenome without applying exclusion criteria to controls
- Overview of all phecodes, their categories and general descriptions
- Output of all ICD codes that were NOT mapped to phecodes (incl. sample sizes)
- Output of all individuals who had sex-specific diagnosis codes that did not match their sex
Notes:
- This script requires a lot of memory (~20-30 GB), because it reads and collects a large amount of data into memory.
- This script requires ICD data of the UK Biobank (ideally the most comprehensive list), Genetic Sex and Sex.
- Only samples where Genetic Sex equals Sex are kept, because it is unclear why the two should differ (potential sources of a mismatch: gender identity, bone marrow transplant, sample swap).
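The sex concordance filter described in the last note can be sketched in base R as follows. The column names, the 0 / 1 coding, and the decision to drop samples with a missing value are assumptions made for this illustration, and the script's own handling may differ.

```r
# Toy sample table; column names and codings are assumptions for illustration.
samples <- data.frame(
  id          = 1:5,
  sex         = c(0, 1, 1, 0, 1),
  genetic_sex = c(0, 1, 0, NA, 1)
)

# Keep only rows where both fields are present and agree.
concordant <- !is.na(samples$sex) &
              !is.na(samples$genetic_sex) &
              samples$sex == samples$genetic_sex

kept    <- samples[concordant, ]
dropped <- samples[!concordant, ]

print(kept$id)     # individuals retained for the phenome
print(dropped$id)  # mismatched or incomplete records
```

Here individual 3 is dropped because the two fields disagree, and individual 4 is dropped because Genetic Sex is missing.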