pjordab closed this issue 2 years ago
Either
or it can be a continuous trait with sd(y) != 1
Which sumstats are these?
Thank you very much for your prompt response. I have calculated the effective n using the formula Neff = 4 / (1 / cases + 1 / controls) as these are sumstats of a binary trait.
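For reference, a minimal sketch of that effective sample size formula in Python. The function name `effective_n` is hypothetical and is not part of bigsnpr or LDpred2.

```python
def effective_n(n_cases, n_controls):
    """Effective sample size for a case-control GWAS: 4 / (1/cases + 1/controls)."""
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("n_cases and n_controls must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)
```

With equal numbers of cases and controls this equals the total sample size; the more unbalanced the design, the smaller Neff becomes relative to the total.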
That's weird; are you sure these come from logistic regression? If you have a link to the summary data, I can have a quick look at these summary statistics.
Thank you very much Florian for your help, it is greatly appreciated! The data I am using is internal data from my group that is not yet ready to be publicly released.
Which methods did you use to perform the GWAS and meta-analysis?
Hi Florian, sorry for the late reply. The sumstats come from MTAG. My SNPs are removed by this filtering criterion: sd_ss > (sd_val + 0.1). Is it appropriate to use these sumstats in LDpred2, or should I adjust my beta/beta_se/neff values somehow before using them? Many thanks!
Which n_eff are you using from MTAG then?
Do you really get beta and beta_se from MTAG, or just z-scores?
I get beta and beta_se from MTAG (https://github.com/JonJala/mtag/blob/master/mtag.py).
I use the GWAS-equivalent sample size, which is calculated as:
Neff_GWAS * (mean chi^2 MTAG - 1) / (mean chi^2 GWAS - 1)
Maybe try instead the median ratio of chi^2 statistics, restricted to chi^2 > 30 (as done in BOLT-LMM).
From the plot you have, it seems that the effective sample size you're using is too small.
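As a rough sketch, the two ratio-based estimates discussed above could be computed as follows. The function names are hypothetical, and applying the chi^2 > 30 filter to the single-trait GWAS statistics (rather than to the MTAG ones) is an assumption, not something stated in this thread.

```python
from statistics import mean, median


def neff_mean_ratio(neff_gwas, chi2_mtag, chi2_gwas):
    """GWAS-equivalent sample size from the ratio of (mean chi^2 - 1) values."""
    return neff_gwas * (mean(chi2_mtag) - 1.0) / (mean(chi2_gwas) - 1.0)


def neff_median_ratio(neff_gwas, chi2_mtag, chi2_gwas, threshold=30.0):
    """Alternative: scale by the median per-SNP chi^2 ratio among strong signals.

    Assumption: the threshold is applied to the single-trait GWAS chi^2.
    """
    ratios = [m / g for m, g in zip(chi2_mtag, chi2_gwas) if g > threshold]
    if not ratios:
        raise ValueError("no SNP exceeds the chi^2 threshold")
    return neff_gwas * median(ratios)
```

Both functions expect per-SNP chi^2 statistics for the same SNPs in the same order.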
Otherwise, try to estimate Neff directly from the median of the values from equation (4) of https://doi.org/10.1101/2021.03.29.437510.
When you say mean values, do you mean calculating one effective n for the whole sample from the median of beta and the median of beta_se?
(4 / var - median_beta^2) / median_beta_se^2
What value do I take as the sample variance?
Or do I calculate the effective N per SNP, using the per-SNP variance 2 * MAF * (1 - MAF)?
Thank you!
Calculate per SNP, and then take the median.
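A minimal sketch of that per-SNP calculation, based on the expression quoted above, (4 / var - beta^2) / beta_se^2. The function names are hypothetical, and using 2 * MAF * (1 - MAF) as the genotype variance assumes Hardy-Weinberg equilibrium; a variance computed from reference genotypes could be substituted.

```python
from statistics import median


def neff_per_snp(beta, beta_se, maf):
    """Per-SNP effective sample size for a binary trait (logistic regression)."""
    var_g = 2.0 * maf * (1.0 - maf)  # assumed genotype variance under HWE
    return (4.0 / var_g - beta ** 2) / beta_se ** 2


def median_neff(betas, beta_ses, mafs):
    """Median of the per-SNP estimates, used as a single Neff for all SNPs."""
    return median(neff_per_snp(b, s, f) for b, s, f in zip(betas, beta_ses, mafs))
```

Taking the median rather than the mean keeps the estimate from being driven by a few SNPs with extreme values.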
Hi Florian, this worked. Sincere (and many!) thanks for all your previous answers and help.
I'd like to ask you some additional questions about the preparation of sumstats.
1) If I use my own genotypes to calculate the correlation matrix, should I apply only the QC recommended here?
https://github.com/privefl/paper-ldpred2/blob/master/code/prepare-sumstats.R
2) And when using the provided LD reference, should I apply only the QC recommended here (and not the previous one)?
https://github.com/privefl/paper-ldpred2/blob/master/code/example-with-provided-ldref.R (lines 27-31)
Finally, one last question. The last paragraph of the paper's discussion says: "However, LDpred2-auto requires some QC to be performed on the summary statistics".
I understand that if I use all 3 models, the most practical approach is to calculate my correlation matrix with the SNPs that pass QC and then run each method separately from there. But if I only use Grid, for example, do I not need to perform the QC?
The QC should be about the same.
The QC is more important for LDpred2-auto, but I would suggest doing it for LDpred2-grid as well.
Hi Florian,
Huge congratulations on your recent article in AJHG; it's great work and very interesting!
Regarding the equation 1 update you have described there,
is it updated in this version of bigsnpr?
Version: 1.8.1 Date: 2021-05-27
And in relation to QC, as equation 1 has been updated from:
sd = sd(y) / (se * sqrt(n))
to:
sd = sd (y) / sqrt (n se beta^2 + beta^2)
Would equation 2 be as follows?
sd = 2/(se * sqrt(neff))
to
sd = 2 / sqrt (se neff beta^2 + beta^2)
And should we still use n effective in this formula?
Many thanks for your help and your work!
Thanks!
Yes, these are updates to the previous formulas (one fewer approximation; note the added beta^2 term).
Note that you should read beta_se^2 instead of se * beta^2.
Oh, thanks!
I messed up the parentheses in the original formula.
So:
sd = sd(y) / sqrt(n * se^2 + beta^2)
sd = 2 / sqrt(neff * se^2 + beta^2)
And last question: in which version of the code was this updated? I currently have this one installed:
Version: 1.8.1 Date: 2021-05-27
I think it was updated in v1.5.6.
Great!! Thank you!!!
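To make the corrected formulas concrete, here is a small sketch in Python. The function names are hypothetical and this is not bigsnpr code; only the upper-bound check sd_ss > (sd_val + 0.1) quoted in this thread is reproduced, and the linked QC script may apply further criteria.

```python
def sd_ss_continuous(beta, beta_se, n, sd_y=1.0):
    """Genotype SD implied by sumstats: sd(y) / sqrt(n * se^2 + beta^2)."""
    return sd_y / (n * beta_se ** 2 + beta ** 2) ** 0.5


def sd_ss_binary(beta, beta_se, n_eff):
    """Binary-trait version: 2 / sqrt(neff * se^2 + beta^2)."""
    return 2.0 / (n_eff * beta_se ** 2 + beta ** 2) ** 0.5


def fails_upper_bound(sd_ss, sd_val, tol=0.1):
    """True if the SNP would be removed by sd_ss > (sd_val + tol)."""
    return sd_ss > sd_val + tol
```

Because neff sits in the denominator, an effective sample size that is too small inflates sd_ss for every SNP at once, which is one way all SNPs can end up failing this check.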
Hi Florian,
I am using LDpred2 with the results of a meta-analysis and when applying QC (https://github.com/privefl/paper-ldpred2/blob/master/code/prepare-sumstats.R) all my SNPs are discarded.
I attach the graph. Any advice?
Thank you very much.