privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

Power and accuracy of PRSs #326

Closed by rameez500 2 years ago

rameez500 commented 2 years ago

Hi Florian,

I have a target dataset of 2000 samples (African ancestry) and summary statistics based on 230,000 samples. I'd like to use the LDpred2-grid method with multiple values for the h2 and p parameters. Could you tell me how many of the target samples I should put in the validation set, used to choose the best parameters, with the remaining samples going into the test set?

Thanks, Rameez

privefl commented 2 years ago

If you're looking at a continuous phenotype, 2000 may be enough (or even 1000 validation & 1000 test). If you have a binary outcome whose prevalence is not large (i.e. not close to 50%), that won't be enough.
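
For illustration only, here is a minimal base-R sketch of the 1000/1000 split suggested above. The variable names (`n_target`, `ind.val`, `ind.test`) and the prevalence values are assumptions chosen for the example; they do not come from this thread or from the bigsnpr API. The second half shows one way to read the caveat about binary outcomes: at a prevalence of 5%, a validation set of 1000 samples is expected to contain only about 50 cases.

```r
# Hypothetical setup: 2000 target samples, as in the question above.
n_target <- 2000
n_val    <- 1000

set.seed(1)  # make the random split reproducible

# Row indices used to choose the best (h2, p) combination.
ind.val  <- sort(sample(n_target, n_val))
# Remaining row indices, kept aside for the final evaluation only.
ind.test <- setdiff(seq_len(n_target), ind.val)

# For a binary outcome, the expected number of cases in each half
# is prevalence * sample size. These prevalences are illustrative.
prevalence <- c(0.01, 0.05, 0.2, 0.5)
expected_cases <- data.frame(
  prevalence = prevalence,
  cases_val  = prevalence * length(ind.val),
  cases_test = prevalence * length(ind.test)
)
print(expected_cases)
```

The two index vectors do not overlap, so the test samples play no part in choosing the parameters.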