stephenslab / mvsusieR

A new model and algorithm for multivariate Bayesian variable selection regression.
https://stephenslab.github.io/mvsusieR

Find CS of gt:condition interaction-mediating variants #46

Open JosephLalli opened 1 year ago

JosephLalli commented 1 year ago

Hi everyone!

I don't know whether this feature request is a whole new paper's worth of work, or a simple tweak to the existing algorithm.

If I understand correctly, mash and mvSuSiE ask whether there is a CS whose observed effect is more likely to be drawn from the modeled distribution of true effects than from the modeled null distribution.

I would like to determine whether a CS is more likely to be drawn from a modeled effect distribution than the same CS in a specified 'null' condition.

Another way to put it: I work on questions related to sex differences. I am less interested in whether a variant causes a change in phenotype, and more interested in whether a variant's effect on phenotype differs between one sex and the other.

If this were a simple linear regression, y ~ G + S + G:S, then susie finds the Bayesian equivalent of G, mvsusie finds the Bayesian equivalent of G + S, and I would like to find the Bayesian equivalent of G:S.

If this is a substantial effort, then I will likely stick with my current approach (susie_rss using the interaction term's beta and se as bhat and shat). However, if this is a relatively small tweak to the current algorithm, I would greatly appreciate the feature.

Thanks, Joe Lalli

JosephLalli commented 1 year ago

Would it be as simple as (in pseudocode):

fit$b1 = posterior mean effect sizes
fit$b2 = posterior variances of the effect sizes
dimensions of each (since I forget what they actually are) = [#conditions, #variants, L]

for (set in fit$sets) {
    l    <- set$L
    snps <- set$variants
    # difference in posterior mean effects between the two conditions
    diff_beta <- fit$b1[1, snps, l] - fit$b1[2, snps, l]
    # SE of the difference, assuming independence between conditions
    diff_se   <- sqrt(fit$b2[1, snps, l] + fit$b2[2, snps, l])
    diff_Z    <- diff_beta / diff_se
    # two-sided p-value for a nonzero difference
    diff_pval <- 2 * pnorm(abs(diff_Z), lower.tail = FALSE)
}

Or am I misunderstanding the situation, and a simple Z test won't apply in this setting? (Can you tell my stats are mostly self-taught?)

gaow commented 1 year ago

Hi Joe! It sounds like you might want to do a stratified analysis by sex, get the CS globally via mvSuSiE, then use some metric to quantify whether there is a condition-specific effect? I think mvSuSiE provides an lfsr per condition that might be helpful. Otherwise, what we did in the MASH paper to test for sharing by sign or sharing by magnitude might be used here as well. I think this would be a good starting point, rather than worrying about an interaction model. Of course there are more formal ways to test whether the difference in the effect sizes is indeed significant across the two conditions, but I think a good starting point is to quantify condition-specific effects in the mvSuSiE framework.
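For reference, the sharing heuristics from the MASH paper could be sketched in a few lines of base R. The numbers below are made-up posterior means, and the factor-of-2 threshold for sharing by magnitude follows the paper's convention:

```r
# Toy posterior mean effects for three SNPs in two conditions
b <- cbind(male   = c(0.30, -0.20, 0.15),
           female = c(0.25,  0.10, 0.02))

# Shared by sign: effects point in the same direction in both conditions
shared_sign <- sign(b[, "male"]) == sign(b[, "female"])

# Shared by magnitude: same sign AND within a factor of 2 of each other
ratio      <- pmax(abs(b[, 1]), abs(b[, 2])) / pmin(abs(b[, 1]), abs(b[, 2]))
shared_mag <- shared_sign & ratio <= 2
```

Here the first SNP is shared by sign and magnitude, the second differs in sign, and the third agrees in sign but not in magnitude.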

JosephLalli commented 1 year ago

Hey Gao,

Yup, I'm back at it. If I understand the description, the conditional lfsr is P(variant effect is above/below 0 | variant is an effect variant). Sharing by sign or by magnitude is closer, and the z test on the condition-specific effect size (b1) and condition-specific effect variance (b2) is working OK, but I'm not sure whether b1 and b2 are completely identical in concept to beta and variance.

gaow commented 1 year ago

> Sharing by sign or by magnitude is closer, and the z test on the condition-specific effect size (b1) and condition-specific effect variance (b2) is working OK, but I'm not sure whether b1 and b2 are completely identical in concept to beta and variance.

We (a collaborative project that I lead) have been using something else along the same lines to quantify the difference between conditions. I'm very much preoccupied now, but someone from my team is working on that. You can shoot me a message next week to touch base on it. We will eventually run it by the team here before suggesting that the community use it. Of course it's just my two cents, and perhaps @zouyuxin @pcarbo @stephens999 have other ideas.

stephens999 commented 1 year ago

I would think about this as a question about the difference in the effect size in males vs females. That is, if b_j1 is the effect of SNP j in males and b_j2 is its effect in females, we can define the "clfsr" for the interaction term (analogous to equation (19) in https://www.biorxiv.org/content/10.1101/2023.04.14.536893v1) as

clfsr_j = 1- max(pr(b_j1-b_j2>0|...), pr(b_j1-b_j2<0|...))

If we computed these quantities then we could assign a measure of significance to each CS for the interaction term, in a similar way that we assign significance to each CS for each phenotype.
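Under a normal approximation to the posterior of the difference d_j = b_j1 - b_j2 (and assuming, for this sketch only, that the two conditions' posteriors are independent, which mvSuSiE's joint posterior need not satisfy), the quantity could be computed in base R as:

```r
# Toy posterior summaries for one SNP (not mvSuSiE output)
b1 <- 0.30; v1 <- 0.01   # posterior mean / variance in males
b2 <- 0.05; v2 <- 0.01   # posterior mean / variance in females

d_mean <- b1 - b2          # posterior mean of the difference
d_sd   <- sqrt(v1 + v2)    # SD of the difference (independence assumed)

p_pos <- pnorm(0, d_mean, d_sd, lower.tail = FALSE)  # Pr(d > 0 | ...)
p_neg <- pnorm(0, d_mean, d_sd, lower.tail = TRUE)   # Pr(d < 0 | ...)
clfsr <- 1 - max(p_pos, p_neg)
```

A small clfsr then indicates high confidence in the sign of the male-female difference, mirroring how the per-condition clfsr is used.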

JosephLalli commented 1 year ago

clfsr_j = 1- max(pr(b_j1-b_j2>0|...), pr(b_j1-b_j2<0|...))

This is precisely the quantity I would like to measure.

Can anyone send me a link to the line in mvSusie that calculates pr(b_jr > 0)? I don't know the code's organization particularly well, but I might be able to write a fork that would optionally calculate pr(b_jr-b_j1 > 0), where r is the trait under consideration, and we assume the first trait is the baseline trait.

pcarbo commented 1 year ago

I think Matthew meant equation 18 (which defines the clfsr), not equation 19 (which defines PIPs).

pcarbo commented 1 year ago

This sounds very similar to the setup for MASH common baseline. If I understand correctly, @JosephLalli's setup is a special case of common baseline mash with R = 2.

mvsusieR uses mashr for the core computations, so presumably it should be possible to leverage mash common baseline as well in mvsusieR.

Regardless, a good first step is to figure out how your quantity would be computed inside mashr.

JosephLalli commented 1 year ago

@pcarbo, I agree; that seemed like a perfect way to address my question! However, tl;dr: mashR's common baseline requires R >= 3 (at least three conditions) to function.

Details: I've been working on this problem today, and I've noticed an issue: when adjusting for a common baseline condition using mash_update_data, mashr reduces the number of conditions by one in the contrast_matrix function:

if(ref %in% 1:R){
    L = diag(R)
    L[,ref] = -1
    L = L[-ref,]
    row.names(L) = paste0(name[-ref],'-', name[ref])
}

and therefore in the mash_set_data_contrast function.

If R = 2 in the above code, L[-ref,] returns a 1-D vector instead of a matrix. Even if I patch the assignment to preserve the matrix shape:

if(ref %in% 1:R){
    L = diag(R)
    L[,ref] = -1
    L = matrix(L[-ref,], nrow=R-1, ncol=R)
    row.names(L) = paste0(name[-ref],'-', name[ref])
}

the result is still a Bhat/Shat with only one condition, which I don't believe mashR can work with.
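For concreteness, the dimension drop can be reproduced in a few lines of standalone base R (R, ref, and name here are toy stand-ins for mashr's internals):

```r
R    <- 2                 # two conditions: the degenerate case
ref  <- 1                 # index of the baseline condition
name <- c("female", "male")

L <- diag(R)
L[, ref] <- -1
Lsub <- L[-ref, ]         # with R = 2, subsetting drops to a plain vector

is.matrix(Lsub)           # FALSE: downstream matrix code breaks here

# Wrapping in matrix() keeps the 1 x R shape, but it is still one contrast
Lmat <- matrix(L[-ref, ], nrow = R - 1, ncol = R)
dim(Lmat)                 # 1 2
```

(Base R also offers `L[-ref, , drop = FALSE]` to avoid the dimension drop without an explicit matrix() call.)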

What do you think is the best course of action? For now, I'm going to run mashR without a baseline and identify genes with male/female specific effects, then run mvSusie on those genes with a mashR-generated prior, and finally calculate a z-score on the resulting effect sizes.

This seems like a solvable problem, I just haven't spent as much time thinking about how to implement these ideas. @gaow @stephens999 , do you have any thoughts?

stephens999 commented 1 year ago

Definitely a solvable problem. Can you just run mash as usual and then run mash_compute_posterior_matrices with A = rbind(c(1,-1))?
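To see what the A matrix is doing: the posterior is transformed linearly, so a contrast row c(1, -1) turns the per-condition posterior into the posterior of the difference. A toy base-R illustration of that algebra (numbers are made up, not mashr output):

```r
A  <- rbind(c(1, -1))                 # contrast: condition 1 minus condition 2
mu <- c(0.30, 0.05)                   # toy posterior means for two conditions
V  <- matrix(c(0.010, 0.002,
               0.002, 0.010), 2, 2)   # toy posterior covariance

d_mean <- drop(A %*% mu)              # 0.25
d_var  <- drop(A %*% V %*% t(A))      # 0.010 + 0.010 - 2 * 0.002 = 0.016
```

Note the cross-condition covariance term enters the variance of the difference, which is what the earlier independence-assuming z test ignores.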

JosephLalli commented 1 year ago

Yes, that yields an n x 1 matrix of shrunken beta values, sigma values, and lfsr / lfdr values. Thank you!

I see a nice histogram of lfsr values, with a plausible number of genes showing results above the baseline lfsr.

(image: histogram of lfsr values)

It looks like the enrichment begins around lfsr = 0.02, so I will run all genes below that value through mvSuSiE and see what comes out of it.

Question re: priors: Using these results, I can calculate an expected FDR for the set of genes I choose to run through mvSusie. Would it be prudent to set a null prior at that FDR? (In other words, if I run mvSusie on a set of genes chosen with a threshold of 0.2 FDR, should I set the prior weights to be a uniform matrix of 0.2?)

pcarbo commented 1 year ago

Regarding Matthew's suggestion above, I believe limma allows you to do something similar, which you could use as a "sanity check" (although I'm not sure what options limma has for shrinkage). That could help if you are unsure of your results and want to compare them against well-tested software such as limma.