rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Question about input files and rare variant analysis #463

Closed Lloyd-LiuSiyi closed 10 months ago

Lloyd-LiuSiyi commented 10 months ago

Dear @joellembatchou, I have read the FAQs and #271 and understand that step 1 aims to capture genome-wide polygenic effects. I would like some more advice on performing rare-variant association analysis with UKBB WES data.

  1. The UKBB sequencing data are split into smaller blocks within each chromosome, e.g. c1_b0, c1_b1. Would you recommend merging the blocks and running the analysis per chromosome, or merging all genotype data together?
  2. If I only wish to analyse certain chromosomes, e.g. chr1 and chr2, do I just need to input genotype data for chr1 and chr2 in step 1, instead of 500k variants selected across the 22 chromosomes?
  3. Am I understanding correctly that in step 1 I can provide imputed genotype data with common variants, and in step 2 provide WES data with rare variants?

I sincerely appreciate any suggestions! Best regards

joellembatchou commented 10 months ago

Hi,

  1. For step 1, we recommend using directly genotyped SNPs; in step 2 you can then reuse the step 1 prediction output across the WES data. Since step 2 is purely parallel in nature, it is fine to analyze blocks of chromosomes separately (i.e. in different REGENIE runs); the results will be exactly the same as those of a full chromosome run.
  2. Step 1 aims to capture genome-wide polygenic effects, so you need to include variants across all chromosomes. In step 2 you can use the options --chr or --chrList to select the chromosomes to test.
  3. See #1 above as well as this page.

Cheers, Joelle
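
As a rough illustration of the workflow described above, here is a minimal Python sketch that only builds the REGENIE command lines and does not run them. Every file name and prefix (`ukb_array`, `ukb_wes_c1_b0`, `pheno.txt`, `covar.txt`, `fit`) is a hypothetical placeholder, as are the helper functions `step1_command` and `step2_command`. The block sizes are arbitrary, and any other options your analysis needs (for example trait type or variant QC filters) are left out. It assumes the step 1 run writes its prediction list to `<out>_pred.list`, which is then passed to step 2 with `--pred`.

```python
from typing import List, Optional


def step1_command(array_prefix: str, pheno: str, covar: str, out: str) -> List[str]:
    """Step 1: fit the whole-genome model on directly genotyped SNPs
    spanning all chromosomes (array_prefix is a hypothetical bed/bim/fam prefix)."""
    return [
        "regenie", "--step", "1",
        "--bed", array_prefix,
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--bsize", "1000",          # arbitrary block size for this sketch
        "--out", out,
    ]


def step2_command(wes_prefix: str, pheno: str, covar: str, pred_list: str,
                  out: str, chromosomes: Optional[List[str]] = None) -> List[str]:
    """Step 2: test the variants in one WES file, reusing the step 1 predictions."""
    cmd = [
        "regenie", "--step", "2",
        "--bed", wes_prefix,        # hypothetical WES block prefix
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--pred", pred_list,        # prediction list written by step 1
        "--bsize", "400",           # arbitrary block size for this sketch
        "--out", out,
    ]
    if chromosomes:
        # Restrict testing to selected chromosomes (comma-separated list).
        cmd += ["--chrList", ",".join(chromosomes)]
    return cmd


# One step 1 run on the genotyped array data, across all chromosomes.
fit = step1_command("ukb_array", "pheno.txt", "covar.txt", "fit")

# One independent step 2 run per WES block; these can run in parallel.
wes_blocks = ["ukb_wes_c1_b0", "ukb_wes_c1_b1"]
tests = [
    step2_command(block, "pheno.txt", "covar.txt", "fit_pred.list", f"assoc_{block}")
    for block in wes_blocks
]
```

Each list could be passed to `subprocess.run(cmd, check=True)` or submitted as a separate cluster job. When a single step 2 input file spans several chromosomes, passing `chromosomes=["1", "2"]` adds `--chrList 1,2` to limit testing to those chromosomes.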

Lloyd-LiuSiyi commented 10 months ago

Thanks @joellembatchou! In the past two weeks I've discovered how extraordinarily useful Regenie has been. Thanks again for developing and maintaining it!