stephenslab / mashr

An R package for multivariate adaptive shrinkage.
https://stephenslab.github.io/mashr

adding extra confounders? #80

Open Neleor opened 4 years ago

Neleor commented 4 years ago

We are trying to use your tool to detect eQTLs in single-cell data, where we have done initial eQTL mapping across different conditions. In our dataset these conditions are different stimulation conditions at different timepoints. The eQTLs found may be condition-specific, which is why a normal meta-analysis would not suffice. Our dataset contains both 10x version 2 (v2) and 10x version 3 (v3) samples, which were normalised separately for each condition (so for three stimulation conditions, we actually have six datasets). The number of samples is also larger for the v2 sets. Right now we are adding these three conditions, with both chemistry versions, as six different conditions (stim1_v2, stim1_v3, stim2_v2, stim2_v3, stim3_v2, stim3_v3). However, the differences between the v2 and v3 samples for the same stimulation condition are technical and should not be biological.

Is it possible to add the chemistry version as a confounder, or to account in some other way for the fact that effects 'should not' be version-specific?

(We are running mash separately for each cell type; we are not trying to combine cell-type effects here.)

stephens999 commented 4 years ago

Let me start by saying mashr needs z scores that are well calibrated, and this can be challenging for single cell data, so you need to be careful. We are currently experimenting with various pipelines here to see which yield good results, but don't have enough experience to give a recommendation yet.

Regarding your specific issue, I would analyze the data within each stimulation condition (both chemistries together) to get a single z score (or beta-hat and se) per condition that tests the null in that condition and combines information from both chemistries. Then use mashr to meta-analyse the z scores across the 3 conditions.
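
One possible way to obtain a single estimate per condition is an inverse-variance-weighted (fixed-effects) combination of the v2 and v3 results. This is only a sketch of that idea, not necessarily the analysis intended above: it assumes the v2 and v3 estimates come from independent samples and measure the same underlying effect, and the function name and the numbers are hypothetical. A joint regression with chemistry as a covariate would be another option.

```python
import math

def combine_fixed_effects(betas, ses):
    """Inverse-variance-weighted combination of per-chemistry estimates.

    Assumes the estimates are independent and share one true effect.
    Returns the combined beta-hat, its standard error, and the z score.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se

# Hypothetical v2 and v3 estimates for one gene-SNP pair in one condition.
beta, se, z = combine_fixed_effects(betas=[0.30, 0.20], ses=[0.05, 0.10])
print(round(beta, 3), round(se, 4), round(z, 3))
```

The larger v2 sample would typically have the smaller standard error, so it receives the larger weight. Repeating this per gene-SNP pair and per stimulation condition gives three columns of beta-hat and se (or z) to pass to mashr.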

Neleor commented 4 years ago

I was using the betas and SEs for the chemistries and conditions separately (I understood from the vignettes that this was preferred to the Z-scores), but I'll try the combined Z-scores per condition. Thank you.

Neleor commented 4 years ago

I have tried using the Z-scores, but we are seeing very high concordance between the significance of the effects using mashr and the significance when running a 'regular' meta-analysis across all stimulations for one timepoint (3 conditions) or all stimulations across all timepoints (6 conditions). We believe that there should be at least some stimulation-specific eQTLs. Is there a lower limit for the mashr algorithm to work properly? (We have only 3 or 6 conditions, as opposed to the 44 tissues described in the paper.) Or is there some other selection/filtering we could do?

gaow commented 4 years ago

@Neleor assuming everything works correctly, the mashr mixture prior you provide should already allow for the possibility of detecting stimulation-specific eQTLs. If the posterior inference of effects (weights on each component as well as effect size inference for each SNP) mostly suggests shared rather than stimulation-specific patterns, it means your data don't strongly favor stimulation-specific effects.
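
To make the point about the prior concrete, here is a small illustration of the kinds of covariance matrices a mixture prior can contain for 3 conditions. It is loosely modeled on the idea of canonical covariances and is not mashr's own code; the function and key names are hypothetical. A condition-specific component puts all of its variance on one condition, while an equal-effects component forces the effect to be identical everywhere.

```python
def canonical_covariances(conditions):
    """Build simple illustrative prior covariance matrices (nested lists)."""
    n = len(conditions)
    covs = {}
    # Independent effects with equal variance in every condition.
    covs["identity"] = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # One matrix per condition: an effect present in that condition only.
    for k, name in enumerate(conditions):
        covs[name + "_specific"] = [
            [1.0 if i == j == k else 0.0 for j in range(n)] for i in range(n)
        ]
    # Identical effect in all conditions (perfect correlation).
    covs["equal_effects"] = [[1.0] * n for _ in range(n)]
    return covs

covs = canonical_covariances(["stim1", "stim2", "stim3"])
for name, matrix in covs.items():
    print(name, matrix)
```

If the fitted mixture places almost no weight on the condition-specific matrices, that is the sense in which the data do not favor stimulation-specific effects.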

Is there a lower limit for the mashr algorithm to work properly?

What's the scale of your data? In GTEx we used about 20K "strong" gene-SNP pairs to learn the data-driven prior (patterns of sharing), and about 80K "random" gene-SNP pairs to fit the mash mixture model. As you can see in the example below, the patterns of sharing we learned from the data, before fitting the mash model, do include patterns other than all-shared effects:

https://stephenslab.github.io/gtexresults/Uk3.html

Maybe you can similarly plot the learned data-driven prior and see whether it captures any stimulation-specific effects?
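
Before plotting, a quick numerical check of a learned covariance matrix can already hint at condition specificity. The sketch below converts a covariance matrix into relative variances and correlations; the matrix `U` is a made-up example, not output from mashr, and the function name is hypothetical.

```python
import math

def summarize_covariance(U, conditions):
    """Return each condition's variance relative to the largest one,
    plus the correlation matrix. Assumes U has at least one nonzero variance."""
    n = len(conditions)
    variances = [U[i][i] for i in range(n)]
    largest = max(variances)
    relative = {name: v / largest for name, v in zip(conditions, variances)}
    corr = [
        [
            U[i][j] / math.sqrt(variances[i] * variances[j])
            if variances[i] > 0 and variances[j] > 0
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    return relative, corr

# Hypothetical learned covariance dominated by stim3.
U = [
    [0.04, 0.00, 0.02],
    [0.00, 0.04, 0.02],
    [0.02, 0.02, 1.00],
]
relative, corr = summarize_covariance(U, ["stim1", "stim2", "stim3"])
print(relative)
print(corr)
```

A matrix whose variance is concentrated in one condition, with weak correlations to the others, is the kind of pattern that would correspond to a stimulation-specific effect.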