privefl / bigsnpr

R package for the analysis of massive SNP arrays.
https://privefl.github.io/bigsnpr/

correlation matrix #482

Closed alhannae closed 3 months ago

alhannae commented 4 months ago

Hi Florian,

I am going through the LDpred2 tutorial (https://privefl.github.io/bigsnpr/articles/LDpred2.html).

I have a question about the part where the correlation matrix is created. You noted there that you provide pre-computed LD matrices that are usually much better. Are you referring to the 'ld' column from the hapmap3+ file? And if I understood correctly, does this mean that the following lines of code do not need to be run?

for (chr in 1:22) {
  ind.chr <- which(df_beta$chr == chr)
  ind.chr2 <- df_beta$`_NUM_ID_`[ind.chr]
  corr0 <- snp_cor(G, ind.col = ind.chr2, size = 3 / 1000,
                   infos.pos = POS2[ind.chr2], ncores = NCORES)
  if (chr == 1) {
    ld <- Matrix::colSums(corr0^2)
    corr <- as_SFBM(corr0, tmp, compact = TRUE)
  } else {
    ld <- c(ld, Matrix::colSums(corr0^2))
    corr$add_columns(corr0, nrow(corr))
  }
}

Thanks in advance.

Hannae

privefl commented 4 months ago

The column $ld contains the LD scores, which are basically colSums(corr0^2).
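As a minimal illustration of that relationship, here is a sketch on a toy matrix. The 4-variant `corr0` below uses made-up values and only stands in for what `snp_cor()` would return; it is not taken from the tutorial data.

```r
library(Matrix)

# Toy sparse correlation matrix for 4 variants (made-up values):
# symmetric, with a unit diagonal, like a windowed LD matrix.
corr0 <- Matrix(c(1.0, 0.5, 0.0, 0.0,
                  0.5, 1.0, 0.2, 0.0,
                  0.0, 0.2, 1.0, 0.1,
                  0.0, 0.0, 0.1, 1.0),
                nrow = 4, sparse = TRUE)

# LD score of variant j = sum over i of r_ij^2
# (the sum includes the variant's correlation with itself, i.e. 1).
ld <- Matrix::colSums(corr0^2)
print(ld)
```

The LD scores are therefore a one-number-per-variant summary of the matrix, not a substitute for the matrix itself.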

alhannae commented 4 months ago

Hi Florian,

thanks for the quick reply.

Just to be sure: when dealing with a small dataset, is it better to use the hapmap3+ dataset and its LD scores rather than calculating the LD scores on the small dataset? From what I understand, you need those LD estimates to calculate the adjusted betas?

Thanks very much!

Hannae

privefl commented 4 months ago

Forget about the LD scores. What you need is the LD matrices. It is better to use the pre-computed ones if possible.
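A rough sketch of what using a pre-computed matrix involves: restrict each per-chromosome matrix to the variants that were matched with the summary statistics. Everything below is toy data. `map_ldref`, `toy_ld`, and the values in `df_beta` are hypothetical stand-ins, and the sketch assumes that the rows and columns of each pre-computed matrix follow the order of the reference map within that chromosome.

```r
library(Matrix)

# Hypothetical reference map: 6 variants over 2 chromosomes.
map_ldref <- data.frame(chr = c(1, 1, 1, 2, 2, 2))

# Hypothetical matched summary statistics;
# `_NUM_ID_` indexes rows of map_ldref.
df_beta <- data.frame(chr = c(1, 1, 2),
                      `_NUM_ID_` = c(1L, 3L, 5L),
                      check.names = FALSE)

# Stand-in for one pre-computed LD matrix (3 variants per chromosome),
# with made-up correlations that decay with distance.
toy_ld <- Matrix(0.5^abs(outer(1:3, 1:3, "-")), sparse = TRUE)

corr_list <- lapply(1:2, function(chr) {
  ind.chr  <- which(df_beta$chr == chr)       # indices in df_beta
  ind.chr2 <- df_beta$`_NUM_ID_`[ind.chr]     # indices in map_ldref
  # indices within this chromosome's pre-computed matrix
  ind.chr3 <- match(ind.chr2, which(map_ldref$chr == chr))
  toy_ld[ind.chr3, ind.chr3, drop = FALSE]
})

print(corr_list[[1]])
```

With real data, each subsetted matrix would then take the place of `corr0` in the loop from the tutorial, being passed to `as_SFBM()` for the first chromosome and to `corr$add_columns()` for the others.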

As for the set of variants to use, the tutorial describes when it's better to use HapMap3, HapMap3+, or another set of variants.

privefl commented 3 months ago

Any update on this?

alhannae commented 3 months ago

Hi Florian,

I opened a new issue (https://github.com/privefl/bigsnpr/issues/486#issuecomment-2005759571), which is basically an update. This one can be closed.

Thanks for following up!

Hannae

privefl commented 3 months ago

You should be able to close issues you've opened.