rgcgithub / regenie

regenie is a C++ program for whole genome regression modelling of large genome-wide association studies.
https://rgcgithub.github.io/regenie

Apply regenie to the combination of two datasets from different arrays. #493

Open chihyunchung opened 7 months ago

chihyunchung commented 7 months ago

Hi,

Thank you for creating this software.

I ran into a question while using it. I am trying to use regenie to handle case-control imbalance in a dataset that merges data from two different genotyping arrays (with about 100k overlapping SNPs) into one set. Since the data used in step 1 should be unimputed, what would you suggest for processing it? For example, should I extract only the overlapping SNPs for step 1? Should I process the two datasets separately? Or perhaps phase the untyped variants?
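One way to find the overlapping SNPs is to intersect the two arrays' PLINK `.bim` files. The sketch below assumes those files are still available separately. It matches sites on chromosome, base-pair position and the unordered allele pair, so an A1/A2 swap between arrays still counts as a match. By default it drops strand-ambiguous A/T and C/G SNPs, because a strand flip between arrays cannot be detected from the alleles alone. All function names here are hypothetical. The resulting IDs must match the variant IDs used in the merged dataset before they are passed to a variant-inclusion option such as regenie's `--extract`; check the documentation for your version.

```python
from typing import FrozenSet, List, Tuple

# One PLINK .bim row: chrom, variant ID, cM position, bp position, allele 1, allele 2.
BimRow = Tuple[str, str, float, int, str, str]

# Allele pairs whose strand cannot be resolved from the alleles alone.
PALINDROMIC = {frozenset("AT"), frozenset("CG")}


def parse_bim(lines: List[str]) -> List[BimRow]:
    """Parse whitespace-delimited .bim lines, skipping malformed ones."""
    rows: List[BimRow] = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue
        chrom, vid, cm, pos, a1, a2 = fields
        rows.append((chrom, vid, float(cm), int(pos), a1.upper(), a2.upper()))
    return rows


def site_key(row: BimRow) -> Tuple[str, int, FrozenSet[str]]:
    """Key on chromosome, position and the unordered allele pair.
    Strand flips are NOT resolved by this key."""
    chrom, _, _, pos, a1, a2 = row
    return (chrom, pos, frozenset((a1, a2)))


def overlapping_ids(bim_a: List[BimRow], bim_b: List[BimRow],
                    drop_palindromic: bool = True) -> List[str]:
    """IDs (as named on array A) of sites present on both arrays."""
    keys_b = {site_key(r) for r in bim_b}
    out: List[str] = []
    for r in bim_a:
        key = site_key(r)
        if key not in keys_b:
            continue
        if drop_palindromic and key[2] in PALINDROMIC:
            continue  # A/T and C/G SNPs are strand-ambiguous across arrays
        out.append(r[1])
    return out
```

For example, with array A containing `1 rs1 0 1000 A G` and array B containing `1 snpX 0 1000 G A`, `rs1` is reported as overlapping even though the IDs and allele order differ.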

joellembatchou commented 7 months ago

Hi,

Is there a reason why you cannot analyze these separately and then meta-analyze the summary statistics?
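The separate-analysis route means running regenie on each dataset and then combining the per-variant effect sizes. A common choice is fixed-effect inverse-variance weighted (IVW) meta-analysis. The sketch below assumes you have a per-variant effect estimate and standard error from each dataset on the same scale and with the same effect allele (for a binary trait, typically the log odds ratio). This is the generic IVW formula, not a regenie feature. The function name is hypothetical.

```python
import math
from typing import Tuple


def ivw_fixed_effect(beta1: float, se1: float,
                     beta2: float, se2: float) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse-variance weighted meta-analysis of one variant's
    estimates from two studies. Returns (beta, se, z, two-sided p)."""
    if se1 <= 0 or se2 <= 0:
        raise ValueError("standard errors must be positive")
    w1 = 1.0 / se1 ** 2  # weight = inverse sampling variance
    w2 = 1.0 / se2 ** 2
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    se = math.sqrt(1.0 / (w1 + w2))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p
```

With equal standard errors the combined estimate is simply the average of the two betas, and the combined standard error shrinks by a factor of sqrt(2). This is why meta-analysis recovers much of the power of a pooled analysis.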

chihyunchung commented 7 months ago

Hi,

Thank you for your reply. My advisor prefers to use the combined dataset because of the limited sample size. Also, since most of our collaborators use the same dataset, it makes comparing results easier.

joellembatchou commented 5 months ago

100k variants would be quite low for a step 1 run (we recommend about 500k). Perhaps add imputed variants with very high INFO scores?
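Following that suggestion, the step 1 variant set could be built as the union of the overlapping typed SNPs and imputed variants above an INFO threshold. Then check how far the result is from the ~500k target. The sketch below assumes the per-variant INFO (or R²) values have already been read from the imputation output into `(variant_id, info)` pairs. The 0.99 cutoff is a hypothetical stand-in for "very high", and the function names are hypothetical too.

```python
from typing import Iterable, List, Set, Tuple

RECOMMENDED_STEP1_SIZE = 500_000  # rough figure from the reply above


def build_step1_set(typed_overlap: Iterable[str],
                    imputed: Iterable[Tuple[str, float]],
                    min_info: float = 0.99) -> List[str]:
    """Union of overlapping typed IDs and imputed IDs with INFO >= min_info.
    Keeps first-seen order and drops duplicates, so a typed SNP that also
    appears in the imputed list is counted once."""
    seen: Set[str] = set()
    out: List[str] = []
    for vid in typed_overlap:
        if vid not in seen:
            seen.add(vid)
            out.append(vid)
    for vid, info in imputed:
        if info >= min_info and vid not in seen:
            seen.add(vid)
            out.append(vid)
    return out


def shortfall(n_variants: int, target: int = RECOMMENDED_STEP1_SIZE) -> int:
    """How many more variants are needed to reach the target (0 if met)."""
    return max(0, target - n_variants)
```

Raising `min_info` trades set size for genotype quality. If the shortfall stays large even at a permissive cutoff, that points back toward analyzing the arrays separately.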