privefl / bigsnpr

R package for the analysis of massive SNP arrays.

https://privefl.github.io/bigsnpr/

Clumping without genotype data? #436

Closed (eliorzou closed this issue 11 months ago)

eliorzou commented 11 months ago

Hi Florian,

I am working with GWAS results from the FinnGen biobank and I would like to do clumping to prioritize SNPs from my analysis. I am wondering whether it is possible to do clumping with only a .bed file from the summary statistics and an LD panel. I do not have genotype data or PLINK files for my analysis (I only have the LD panels available on gnomAD). From what I can see in the bigsnpr tutorial, it seems that I need both the genotypes and the summary statistics, but maybe you know another way to do it?

Thanks a lot for your answer, Best regards,

Eulalie

privefl commented 11 months ago

I've discussed this a bit in https://github.com/privefl/bigsnpr/issues/316 I think.

What is the format of the LD matrix from gnomAD?

eliorzou commented 11 months ago

Hi Florian! Thank you for your answer, and sorry for the delay. I read issue #316, and it does partially answer my question. If I understand correctly, you do not recommend using only an LD matrix (and it's not implemented in this method). I have an LD panel generated by LDSC, but I can also get LD scores as a Hail Table. I'm not very familiar with all these formats.

My issue is that, as I understand it, the genetic architecture of Finnish individuals is quite different from that of other Europeans. Therefore, I guess it would be problematic to use European reference panels and the genotypes available for clumping on this data. What do you think?

privefl commented 11 months ago

For fancy methods like LDpred2 that use LD to recover joint effects from marginal effects, it is best to get an LD reference as close as possible to the GWAS population. But it also works fairly well with one from e.g. UK ancestry (cf. https://doi.org/10.1016/j.xhgg.2022.100136).

For clumping, I think that matters less. What is your purpose for clumping here?

Alternatively, you could identify a few Finnish-like individuals within the UKBB as I did, and perform clumping from their genetic data.

eliorzou commented 11 months ago

OK, I see. The aim is just to prioritize a variant in a region, in order to look at the closest genes and do some functional characterization of the region with a "flag" variant. Thank you for your answer!
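
For illustration only, the kind of prioritization described here can be sketched as a greedy clumping pass over summary-statistic p-values and a precomputed LD correlation matrix. This is not bigsnpr code and not something recommended in this thread. The function name `clump_from_ld`, the thresholds, and the toy data are all hypothetical, and the sketch assumes the LD matrix holds correlations r with variants in the same order as the p-values.

```python
def clump_from_ld(pvals, ld_r, r2_thr=0.2, p_thr=5e-8):
    """Greedy clumping sketch (hypothetical helper, not a bigsnpr function).

    pvals : list of GWAS p-values, one per variant
    ld_r  : square list-of-lists of LD correlations r, same variant order
    Returns the positions of the retained index variants,
    most significant first.
    """
    # Visit variants from the smallest p-value to the largest.
    order = sorted(range(len(pvals)), key=lambda j: pvals[j])
    removed = [False] * len(pvals)
    index_variants = []
    for i in order:
        if pvals[i] > p_thr:
            break  # all remaining variants are less significant
        if removed[i]:
            continue  # already absorbed into an earlier clump
        index_variants.append(i)
        # Absorb the index variant and everything in LD with it.
        for j in range(len(pvals)):
            if j == i or ld_r[i][j] ** 2 > r2_thr:
                removed[j] = True
    return index_variants


# Toy region: variants 0 and 1 are in strong LD (r = 0.9).
pvals = [1e-12, 3e-10, 0.2, 4e-9]
ld_r = [
    [1.0, 0.9, 0.1, 0.05],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.0],
    [0.05, 0.1, 0.0, 1.0],
]
index_variants = clump_from_ld(pvals, ld_r)
print(index_variants)  # [0, 3]
```

In this toy region, variant 1 is absorbed into the clump led by variant 0, variant 3 is kept as a separate index variant, and variant 2 does not pass the significance threshold. How well such a result transfers to real data depends on how closely the LD reference matches the GWAS population, as discussed above.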

privefl commented 11 months ago

If you're happy with this and have no further questions, you can close the issue.