I have 2,000 samples in my target dataset (African ancestry) and summary statistics based on 230,000 samples. I'd like to use the LDpred2-grid method with multiple values for the h2 and p parameters. Could you tell me how many target samples I should put in the validation set to choose the best parameters, with the remaining samples going into the test set?
If you're looking at a continuous phenotype, 2000 may be enough (or even 1000 validation & 1000 test).
If you have a binary outcome whose prevalence is not large (i.e. not close to 50%), that won't be enough.
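To make the split and the prevalence point concrete, here is a minimal sketch in plain Python (not bigsnpr code). The helper names `split_validation_test` and `effective_sample_size` are hypothetical, and the case/control counts are made-up illustrations, not numbers from this dataset. It uses the usual case-control effective sample size, 4 / (1/n_cases + 1/n_controls), as a rough guide to how much information a set of a given size carries.

```python
import random


def split_validation_test(n_samples, n_validation, seed=1):
    """Randomly split sample indices into validation and test sets (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_samples))
    rng.shuffle(indices)
    validation = sorted(indices[:n_validation])
    test = sorted(indices[n_validation:])
    return validation, test


def effective_sample_size(n_cases, n_controls):
    """Usual effective sample size for a case-control design."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# 2,000 target samples split into 1,000 validation and 1,000 test
val_idx, test_idx = split_validation_test(2000, 1000)

# Illustrative validation sets of 1,000 samples (assumed counts):
n_eff_balanced = effective_sample_size(500, 500)  # 50% prevalence
n_eff_rare = effective_sample_size(50, 950)       # 5% prevalence

print(len(val_idx), len(test_idx))
print(round(n_eff_balanced), round(n_eff_rare))
```

With 5% prevalence, 1,000 validation samples behave roughly like 190 balanced ones, which is one way to see why a low-prevalence binary outcome needs more target samples than a continuous phenotype.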
Hi Florian,
Thanks, Rameez