privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

African LD reference #305

Closed: rameez500 closed this issue 2 years ago

rameez500 commented 2 years ago

I have a quick question. I am working with African-ancestry data for both the target dataset and the summary statistics. I noticed that in the LDpred2 paper, the European LD reference is available at https://doi.org/10.6084/m9.figshare.13034123. Is there an African LD reference for LDpred2 that I can use?

privefl commented 2 years ago

No, we have not provided any LD ref for African ancestry yet. If you have more than 2000 individuals in your target data, you can always use them as the reference panel.

Otherwise, if you have access to the UK Biobank and your data is from West/South Africa, you can use the "Nigeria" group of individuals that you can reproduce from https://github.com/privefl/UKBB-PGS#code-to-reproduce-ancestry-groups (from this paper).

rameez500 commented 2 years ago

Hi Florian,

Thanks for the prompt reply.

I am working on African American ancestry samples (target dataset). This target dataset was imputed with the CAAPA panel on the Michigan Imputation Server. There are about 4000 samples and 10 million variants.

I know that restricting to HapMap3 variants is strongly recommended in LDpred2 because they offer good coverage of the genome.

I was reading one of the issues, "Using Alternate LD Reference Panels": https://github.com/privefl/bigsnpr/issues/225

Which of the following would you recommend:

  1. The code used to provide the LD ref in the paper: https://github.com/privefl/paper-ldpred2/blob/master/code/provide-ld-ref.R, OR
  2. The European LD reference, which contains about 1.1M variants?

privefl commented 2 years ago

If you have African sumstats, you need to use an African LD ref. So option 1.

rameez500 commented 2 years ago

Thanks for your help. I wanted to ask something regarding option 1.

I have two datasets, African summary statistics and an African target dataset, but I still need an African LD ref.

In provide-ld-ref.R line 74, you compute LD matrices using snp_cor() and save the result in "_ld-ref/LDchr"

If I want to calculate PRS using LDpred2-auto, I can use example-with-provided-ldref.R with the provided LD reference.

My question is: Was the LD reference loaded in example-with-provided-ldref.R line 7 generated by provide-ld-ref.R line 74?

privefl commented 2 years ago

Yes, this is the code I used to prepare the LD ref. I then made the other script as an example/tutorial of how to use it.
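
For readers following option 1, here is a minimal sketch of the step discussed above: computing a per-chromosome LD matrix with snp_cor() from your own reference individuals and saving it to disk. It is not the code from provide-ld-ref.R. It uses the small example genotypes shipped with bigsnpr as a stand-in for real data, and the output directory, the file name, and the 3 Mb physical-position window are assumptions chosen so the snippet runs offline. The LDpred2 tutorial instead uses a 3 cM window on genetic positions obtained with snp_asGeneticPos().

```r
library(bigsnpr)

# Example genotypes shipped with bigsnpr; a stand-in for your own
# reference individuals (e.g. the African American target samples).
obj.bigsnp <- snp_attachExtdata()
G   <- obj.bigsnp$genotypes
CHR <- obj.bigsnp$map$chromosome
POS <- obj.bigsnp$map$physical.pos

# Hypothetical output directory (assumption); any writable path works.
out_dir <- file.path(tempdir(), "ld-ref-sketch")
dir.create(out_dir, showWarnings = FALSE)

# One chromosome for illustration; in practice, loop over unique(CHR).
chr <- CHR[1]
ind.chr <- which(CHR == chr)

# Windowed correlations between the variants of this chromosome.
# With infos.pos in base pairs, `size` is a window in kb (3000 kb = 3 Mb).
# Assumption: physical positions are used here to avoid downloading genetic maps.
corr <- snp_cor(G, ind.col = ind.chr, size = 3000,
                infos.pos = POS[ind.chr], ncores = 1)

# Save one sparse LD matrix per chromosome (hypothetical file name).
ld_file <- file.path(out_dir, paste0("LD_chr", chr, ".rds"))
saveRDS(corr, ld_file)

# Later, reload it when running LDpred2.
corr_loaded <- readRDS(ld_file)
```

When adapting this, keep the variant information (chromosome, position, alleles) for the same variants and in the same order as the LD matrices, since the summary statistics have to be matched to it, for example with snp_match().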