stephenslab / mvsusieR

A new model and algorithm for multivariate Bayesian variable selection regression.
https://stephenslab.github.io/mvsusieR
Other
21 stars 7 forks source link

could you confirm if I ran it correctly #54

Open kkonoo opened 1 week ago

kkonoo commented 1 week ago

Hi there,

I'm interested in using mvsusieR (fit_rss) to fine-map my single-cell eQTLs and estimate the cell-type-specificity of their effects. I'd like to make sure I did it correctly.

  1. estimating prior 1) I kept the most significant eSNP for eGene across all cell types. (if there are two different eSNPs from different cell types for the same gene, I remain a more significant one.) 2) run mashr to make U.ed data = mash_set_data(betas, ses) m.1by1 = mash_1by1(data) strong = get_significant_results(m.1by1, 0.05)

    U.pca = cov_pca(data, 5, subset=strong) U.ed = cov_ed(data, U.pca, subset=strong)

    3) create prior based on the mashr output U <- U.ed w <- rep(1 / length(U), length(U)) # Equal weights for simplicity (can I weight this based on cell size for each cell type?) prior <- create_mixture_prior(list(matrices = U, weights = w), null_weight = 0)

  2. run mvsusieR (for each gene) 1) estimate residual_variance null_markers <- which(apply(abs(Z),1,max) < 2) Z_null <- Z[null_markers,] Vest <- cov(Z_null)

    2) fitting fit_rss <- mvsusie_rss(Z, R, n, prior_variance = prior, residual_variance = Vest, estimate_prior_variance = TRUE, tol = 0.01)

When I ran this, I could get this plot for a certain credible set (from a specific gene). image

Does this look fine?

Best, EH

gaow commented 6 days ago

@kkonoo it looks like it is mostly correct. In practice you may want to first run some diagnosis on the mixture prior you obtain, for example by simply plotting them and see if they make sense, for example like the heatmap here.

In our applications we would also use the weights learned from mixture model, especially when they seem reasonable from diagnosis plot. But there are cases especially when there are many missing data points in some contexts eg cell types compared to others which may result in low estimated weights in the context specific effects (close to zero). In those cases I would deem the estimated weights not 100% reliable and I would at least try a round of analysis with uniform weights just like what you did above to see if the results change much.

kkonoo commented 5 days ago

Thanks for your reply. I'll check it out!