natsuhiko / rasqual

Robust Allele Specific Quantification and quality controL
37 stars 20 forks source link

Z-score or effect and standard error #55

Open plbngl opened 2 years ago

plbngl commented 2 years ago

Dear Natsuhiko, thanks again for the great tool! I was wondering if it is possible to obtain z-scores or effect size AND standard errors from rasqual outputs. I would like to perform downstream analyses to compare QTLs across conditions and they require these values. Thank you very much for your help,

Best, Paola

HaniceSun commented 2 years ago

Dear Natsuhiko,

I have a similar question. I would like to do colocalization analysis using coloc, which recommends using beta and SE (or var) of beta. It seems beta is Effect size (Pi) - 0.5 in the RASQUAL output, but could you please advise me on how to get the SE value? Thanks a lot.


Just saw discussions on this post and this paper.

It seems we can estimate both beta and SE of beta given z-score, allele frequency, and sample size. SE =1 / sqrt(2p(1− p)(n + z^2)) beta = z * SE

So is it correct to get SE from RASQUAL output bellowing? SE =1 / sqrt(2 * Allele frequency (1− Allele frequency )(n + Chi square statistic)), n is the sample size we know as an input.


It seems we can estimate SE here simply as bellowing:

SE = abs(beta/np.sqrt(Chi square statistic))

hbliu commented 2 years ago

Hi HaniceSun, I have a similar question. Did you try your method to get SE from RASQUAL? It works or not?

Thank you

HaniceSun commented 2 years ago

Hi Hongbo,

I believe it is correct. The results make sense when I use the SE this way in the cloloc analysis with both ABF and SuSiE method.

Hi HaniceSun, I have a similar question. Did you try your method to get SE from RASQUAL? It works or not?

Thank you

hbliu commented 2 years ago

HI Hanice,

Thank you for the reply. I will try.

Best, Hongbo

isosceleswheel commented 1 month ago

Hi @HaniceSun, can you clarify which formula for SE is the correct formula in your comment above? I am trying to do the same analyses using coloc and I have not been able to find a definitive answer for converting the outputs from RASQUAL into beta, z-score or SE.

Your first equation substitutes $\chi^2$ for $z^2$, which is what the paper you referenced uses. Is this accurate?

Next, you presented two different equations for estimating SE from the RASQUAL data:

1) SE = 1 / sqrt(2 * Allele frequency (1− Allele frequency )(n + Chi square statistic)) 2) SE = abs(beta / sqrt(Chi square statistic)) -- this assumes beta = (Effect Size Pi) - 0.5

These do not give the same result for a number of tests cases that I tried. Here is a scatter plot of the results comparing version (1) and version (2):

Screenshot 2024-08-07 at 3 48 28 PM

Version (2) consistently yields a much smaller SE estimate, and the relationship is non-systematic.

Do you or any other users have any updates for this problem? @natsuhiko Please weigh in if you have suggestions for estimating z-scores, beta values and SE from the standard RASQUAL output.

isosceleswheel commented 3 weeks ago

Update on estimating beta, SE from RASQUAL output:

Hi all, in case anyone is still interested in this problem, I have done some testing of the equations in this paper that was suggested by @HaniceSun. The estimates of beta and SE rely on the identity of $\chi^2 \approx z^2$, which should hold in our case (see discussion of mathematical details here).

Thus the relevant equations are:

$\beta = z \cdot SE$

and

$SE = \frac{1}{\sqrt{2p(1-p)(n+z^2)}}$

where $p$ is the Allele Frequency output from RASQUAL. We then add one more change to get the direction of beta:

$\beta = sgn(\pi-0.5) \cdot z \cdot SE$

where $\pi$ is RASQUAL's Effect size Pi output and the $sgn(\pi-0.5)$ function gives us the direction of the effect for beta, i.e.:

$$ sgn(x) = \begin{cases} 1 & \text{if } x > 0, \ 0 & \text{if } x = 0, \ -1 & \text{if } x < 0 \end{cases} $$

The final equations, using the RASQUAL ouputs $\chi^2$, $p$ and $\pi$ are:

$SE = \frac{1}{\sqrt{2p(1-p)(n+\chi^2)}}$ $\beta = sgn(\pi-0.5) \cdot \sqrt{\chi^2} \cdot SE$

Just as a sanity check, I compared beta values estimated with these formulas to "raw" beta values that I calculated by fitting an ordinary least squares (OLS) model with ATACseq reads (converted to z-scores) as the dependent variable and genotypes as the independent variable. These aren't expected to be identical, because RASQUAL is integrating additional information -- most significantly, allele counts in reads -- but they should have a similar magnitude and direction.

The attached plot shows the results comparing these estimates for about 300,000 QTLs and you can see that they are broadly similar (Pearson $R^2$ = 0.863). The left panel is just a scatter of the two estimates of beta and the right panel shows the density of these points. The magnitude of beta is consistently greater when estimated from RASQUAL statistics, which is probably reflecting the additional information that goes into these estimates.

output