rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Conditional analysis #464

Closed alyssacl closed 1 month ago

alyssacl commented 10 months ago

Hello, I am hoping to get some feedback on how to perform conditional analysis in regenie.

  1. Should I be performing conditional analysis in step 1 or step 2?
  2. What format should my conditional list file be in? I have tried 2 formats:

6:484453_C_G 6 484453 484453

My code is as follows:

run_regenie_cmd="regenie --step 2 --out snp485991.conditioned484453.dose.assoc.c6 \
  --bgen ${data_field}_c6_b0_v1.bgen \
  --ref-first \
  --sample ${data_field}_c6_b0_v1.sample \
  --chr 6 --range 6:400000-500000 \
  --condition-list condition.list.WM.txt \
  --phenoFile WM_LPL_phenotype_AUG2023.txt --covarFile WM_LPL_covariate_AUG2023.txt \
  --remove exclude_participants.txt \
  --bt --approx --firth-se --firth \
  --pred WM38_results_pred.list --bsize 1000 \
  --pThresh 0.01 --minMAC 1 --threads 128 --gz"

I get this error: ERROR: none of the variants were found in the genotype file.

This makes me think that the format of the condition list file is the problem, since the command runs without the --condition-list option.

Thank you, Alyssa

joellembatchou commented 10 months ago

Hi Alyssa,

  1. You can perform conditional analysis in both steps 1 and 2. When performing it only in step 2 (i.e. using the step 1 results from an unconditional run), the LOCO predictions should already capture some of the effects of the conditioning variants (assuming they explain a moderate amount of phenotypic variance). In practice, we have found similar results whether the step 1 predictions are unconditional or conditional on a variant set. In extreme cases where the variants explain a large proportion of the phenotypic variability, conditioning in step 1 would be better, as we are using an infinitesimal model (i.e. assuming similar effect size magnitudes genome-wide).
  2. Check the documentation. Variants specified in --condition-list must be present in the genotype file (i.e. ${data_field}_c6_b0_v1.bgen) unless you use --condition-file to specify an external genotype file.
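
Since the error says none of the variants were found, one quick sanity check is to compare the IDs in the condition list against the variant IDs actually stored in the genotype file. The sketch below is not part of regenie. It assumes the condition list carries the variant ID in the first whitespace-separated column of each line, and that you have already exported the BGEN variant IDs to a plain text file (one ID in the first column per line) with whatever tool you use to inspect the BGEN. The function name `missing_ids` and the file name `variant_ids.txt` are hypothetical.

```shell
#!/usr/bin/env bash

# missing_ids CONDITION_LIST VARIANT_IDS
# Prints each ID from CONDITION_LIST (first field of every non-empty line)
# that does not occur as the first field of any line in VARIANT_IDS.
# Both file layouts are assumptions made for this sketch.
missing_ids() {
    local condition_list="$1" variant_ids="$2"
    awk '
        # First file: remember every variant ID present in the genotype data.
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        # Second file: report condition-list IDs that were never seen.
        NF > 0 && !($1 in seen) { print $1 }
    ' "$variant_ids" "$condition_list"
}

# Example call (file names are placeholders):
#   missing_ids condition.list.WM.txt variant_ids.txt
```

If this prints every ID in the condition list, the IDs most likely do not match the naming used inside the genotype file (for example `chr:pos_ref_alt` versus rsIDs), which would be consistent with the error above.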

Cheers, Joelle

alyssacl commented 1 month ago

Thank you!