rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Testing recessive model with gene-based test using Regenie #408

Closed irisjansen closed 7 months ago

irisjansen commented 1 year ago

Hi!

Thank you for making this great tool available to the genetics field!

I am currently working with the UKB data and am running an analysis that uses various combinations of masks and AF thresholds to test for a recessive effect of aggregated rare variants in a specific gene on 12 phenotypes (6 continuous, 6 binary); see the example code below. For the binary ones I am using the option --firth and the results seem fine. For the continuous phenotypes, however, I get very strange results. These phenotypes are age of onset, meaning that UKB participants who are not diagnosed with the disease of interest are excluded from the analysis, as they have an NA value.

Some results are extremely significant (p=5.8e-231), but when trying to determine which samples are driving this result, I observe that the homozygous carriers all have an 'NA' value for the phenotype. So to my understanding they should not have been included in the analysis, and the recessive test should not have been run at all.

Searching through the Documentation and Overview tabs of the Regenie website, I am unable to find details on exactly which statistical test is used for a recessive model. Could you tell me what exactly is done in the recessive model? This could help me understand how a result can be generated at all, when I would expect none, as it seems the homozygous carriers do not have the disease.

Thank you in advance! Iris

Example code (I am aware I should not skip step 1, but for this trial I did, hence --ignore-pred):

    ./regenie_v3.2.5.3.gz_x86_64_Linux \
      --step 2 \
      --ignore-pred \
      --bgen "${genotype_prefix_chr4}".bgen \
      --ref-first \
      --sample "${genotype_prefix_chr4}".sample \
      --phenoFile "${phenotype_file}" \
      --covarFile "${phenotype_file}" \
      --phenoCol pheno1,pheno2,pheno3,pheno4,pheno5,pheno6 \
      --covarColList sex,age,PC{1:10} \
      --set-list "${set_file}" \
      --anno-file "${path_to_300kwes_helper_files}/ukb23158_500k_OQFE.annotations.txt.gz" \
      --mask-def "${mask_file}" \
      --aaf-bins 0.01,0.001,0.0001 \
      --extract-setlist "GeneX" \
      --write-mask \
      --write-mask-snplist \
      --test recessive \
      --out GroupY_GeneX_rec

joellembatchou commented 1 year ago

Hi,

Values above 1.5 for the burden mask are coded as 1 in the recessive test and the remaining values are coded as 0 (missing values are mean imputed). Samples with missing phenotypes will not be used in the association test. Did you check the carriers from the PLINK BED containing the built masks (with dosages, the burden mask values are first rounded)?

Cheers, Joelle
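
The coding described in the reply above can be illustrated with a small Python sketch. This is not regenie's actual code: the function names are hypothetical, and whether mean imputation happens before or after the 0/1 recoding is an assumption made here for illustration only.

```python
import math

def recessive_code(mask_values):
    """Recode burden-mask values for a recessive test.

    Observed values above 1.5 become 1, all other observed values become 0.
    Missing values (None) are replaced by the mean of the recoded observed
    values. ASSUMPTION: imputing after recoding; regenie may order this
    differently.
    """
    coded = [None if v is None else (1.0 if v > 1.5 else 0.0) for v in mask_values]
    observed = [c for c in coded if c is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if c is None else c for c in coded]

def samples_in_test(coded, phenotypes):
    """Keep only (genotype, phenotype) pairs whose phenotype is not missing."""
    return [
        (g, y)
        for g, y in zip(coded, phenotypes)
        if y is not None and not math.isnan(y)
    ]

# Toy data: two homozygous carriers (mask value 2), both with a missing phenotype.
mask = [0, 1, 2, 2, None]
pheno = [55.0, 61.0, None, None, 48.0]

coded = recessive_code(mask)            # [0.0, 0.0, 1.0, 1.0, 0.5]
tested = samples_in_test(coded, pheno)  # [(0.0, 55.0), (0.0, 61.0), (0.5, 48.0)]
print(coded)
print(tested)
```

In this toy example both samples coded as 1 drop out because their phenotype is missing, which mirrors the check suggested above: inspect the written mask for the samples that actually have a phenotype value.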

giorkala commented 8 months ago

Hi @joellembatchou,

I hope it's fine to restart this conversation, as I'm in a similar situation to the one @irisjansen described, and I'd like some clarification of how exactly Regenie works, please. In brief, I'm trying a set-based recessive test, testing for one annotation with 2 MAF thresholds. I can give more details if needed.

There are genes for which the "ADD-SKAT" and high-MAF "REC" tests are not significant (e.g. p>0.05) but the low-MAF REC test is (e.g. p<1e-30). I then compared with a recessive burden I generated manually (coded as {0,2} based on homozygotes for the same mutations in the annotation) and tested it in step 2 as a simple burden; the p-value was also ~0.05. It could be that including variants with MAF in [0.001,0.05] brings in noise, so that the association is lost, but I need to make sure it's not a false positive.

Also, Regenie reports a MAF only for the recessive tests (and NA for SKAT), which makes me think that this could be a single-variant type of test?

Any insights will be appreciated! Kind regards, Yiorgos
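
A manual recessive burden of the kind described above could be built roughly as follows. This is a minimal sketch under assumptions: the function name and input layout are hypothetical, and it counts a sample as a carrier only if it is homozygous at a single variant, so compound heterozygotes are coded 0. Whether regenie treats such samples the same way depends on how the mask is built.

```python
def manual_recessive_burden(genotypes_by_variant):
    """Build a {0, 2} recessive burden from per-variant genotypes.

    genotypes_by_variant: list of per-variant lists, each holding the
    ALT-allele count (0, 1, 2, or None for missing) for every sample.
    A sample gets 2 if it is homozygous ALT at any variant, otherwise 0.
    """
    n_samples = len(genotypes_by_variant[0])
    burden = []
    for i in range(n_samples):
        is_hom = any(variant[i] == 2 for variant in genotypes_by_variant)
        burden.append(2 if is_hom else 0)
    return burden

# Toy data: 2 variants x 4 samples.
# Sample 1 is heterozygous at both variants (compound het), sample 2 is homozygous.
genotypes = [
    [0, 1, 2, 0],
    [1, 1, 0, 0],
]
print(manual_recessive_burden(genotypes))  # [0, 0, 2, 0]
```

Comparing the carriers found this way with those in the mask written by regenie is one way to see whether the two analyses are testing the same set of samples.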

joellembatchou commented 7 months ago

Hi Yiorgos,

Recessive tests are not supported with non-burden gene-based tests, e.g. SKAT. This will be made explicit in the next release.

Cheers, Joelle