Question about input files and rare variant analysis

rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.

https://rgcgithub.github.io/regenie

Question about input files and rare variant analysis #463

Closed Lloyd-LiuSiyi closed 10 months ago

Lloyd-LiuSiyi commented 10 months ago

Dear @joellembatchou, I have read the FAQs and #271 and understand that step 1 aims to capture genome-wide polygenic effects. I would like some more advice on performing rare variant association analysis with UKBB WES data.

1. The UKBB sequencing data are split into smaller blocks within each chromosome, e.g. c1_b0, c1_b1. Would you recommend merging the blocks and running the analysis per chromosome, or merging all genotype data together?
2. If I only wish to analyze certain chromosomes, e.g. chr1 and chr2, do I just need to input genotype data for chr1 and chr2 in step 1, instead of the 500k variants selected across 22 chromosomes?
3. Am I understanding correctly that in step 1 I can provide imputed genotyping data with common variants, and in step 2 provide WES data with rare variants?

I sincerely appreciate any suggestions! Best regards

joellembatchou commented 10 months ago

Hi,

1. For step 1, we recommend using directly genotyped SNPs; in step 2 you can then re-use the step 1 prediction output across the WES data. Since step 2 is purely parallel in nature, it is fine to analyze blocks of chromosomes separately (i.e. in different REGENIE runs); the results will be exactly the same as for a full-chromosome run.
2. Step 1 aims to capture genome-wide polygenic effects, so you need to include variants across all chromosomes. In step 2 you can use the options --chr or --chrList to restrict testing to selected chromosomes.
3. See #1 above as well as this page.
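
The workflow described above could be sketched as follows: one step 1 run on genotyped SNPs from all chromosomes, then one independent step 2 run per WES block that re-uses the step 1 predictions. This is only an illustrative sketch. The helper names, file names (`ukb_array`, `wes_c1_b0`, `pheno.txt`, `covar.txt`) and block sizes are assumptions, the WES blocks are assumed to have been converted to PLINK 2 format, and options for binary traits or gene-based tests are left out.

```python
from typing import List, Optional, Sequence


def step1_command(genotyped_bed: str, pheno: str, covar: str, out: str) -> List[str]:
    """Build one step 1 run on genotyped SNPs covering all chromosomes."""
    return [
        "regenie", "--step", "1",
        "--bed", genotyped_bed,      # hypothetical PLINK prefix of array genotypes
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--bsize", "1000",           # block size chosen only for illustration
        "--out", out,
    ]


def step2_commands(
    wes_blocks: Sequence[str],
    pheno: str,
    covar: str,
    step1_out: str,
    chr_list: Optional[str] = None,
) -> List[List[str]]:
    """Build one independent step 2 run per WES block, all sharing step 1 output."""
    commands = []
    for block in wes_blocks:
        cmd = [
            "regenie", "--step", "2",
            "--pgen", block,                     # hypothetical per-block PLINK 2 prefix
            "--phenoFile", pheno,
            "--covarFile", covar,
            "--pred", f"{step1_out}_pred.list",  # prediction list written by step 1
            "--bsize", "400",                    # block size chosen only for illustration
            "--out", f"{block}_step2",
        ]
        if chr_list:
            # Optional: restrict testing to selected chromosomes, e.g. "1,2"
            cmd += ["--chrList", chr_list]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    print(" ".join(step1_command("ukb_array", "pheno.txt", "covar.txt", "fit")))
    for cmd in step2_commands(["wes_c1_b0", "wes_c1_b1"], "pheno.txt", "covar.txt", "fit"):
        print(" ".join(cmd))
```

Because each step 2 command only reads its own block plus the shared prediction list, the printed commands can be submitted as separate jobs in any order.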

Cheers, Joelle

Lloyd-LiuSiyi commented 10 months ago

Thanks @joellembatchou! In the past two weeks I've discovered how extraordinarily useful REGENIE is. Thanks again for developing and maintaining it!