lhe17 / nebula

GNU General Public License v2.0

Interaction term in nebula #16

Closed: AdelynTsai closed this issue 10 months ago

AdelynTsai commented 1 year ago

Hi, thank you for the tool. It's very helpful! I'm wondering if it's possible to include an interaction term in nebula, and if so, how should I code it?

Here's how I code it now, without the interaction:

cov.mm <- model.matrix(~sqrtCAA + Batch_Flowcell + Gender + Age_At_Death, data=meta.data.f)
nebulafit <- nebula(count=nebula.mm.f, id=meta.data.f$subject, pred=cov.mm, offset=total)

sqrtCAA is the phenotype of interest (a continuous phenotype); Batch_Flowcell, Gender, and Age_At_Death are the fixed covariates; and subject is the random effect.

However, I also have some biochemical measures, and I want to know how these measures together with sqrtCAA affect expression. I'd like to do this analysis with nebula. Please let me know if this is possible.

Thank you!

lhe17 commented 1 year ago

Hi AdelynTsai,

Thank you for your question. Yes, you can do that. To include the interaction term between sqrtCAA and biochemical in the model, you can try the following:

cov.mm <- model.matrix(~sqrtCAA*biochemical + Batch_Flowcell + Gender + Age_At_Death, data=meta.data.f)

Then, you should see an additional column sqrtCAA:biochemical in cov.mm corresponding to the interaction term.

Best regards, Liang
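A minimal, self-contained sketch of what the `*` in the formula above does, using base R's `model.matrix` only (nebula is not called). The data frame `meta` and its values are hypothetical stand-ins for `meta.data.f`, and the variable is called `biochem` here. `a*b` expands to `a + b + a:b`, and for two continuous variables the `a:b` column is simply their elementwise product:

```r
# Hypothetical toy metadata standing in for meta.data.f (values are made up)
meta <- data.frame(
  sqrtCAA = c(0.5, 1.2, 2.0, 0.8, 1.6, 2.4),
  biochem = c(-1.0, 0.3, 1.1, 0.4, -0.2, 0.9)
)

# `sqrtCAA * biochem` is shorthand for `sqrtCAA + biochem + sqrtCAA:biochem`
mm_star <- model.matrix(~ sqrtCAA * biochem, data = meta)
mm_long <- model.matrix(~ sqrtCAA + biochem + sqrtCAA:biochem, data = meta)

# Column names: "(Intercept)" "sqrtCAA" "biochem" "sqrtCAA:biochem"
print(colnames(mm_star))

# For two continuous variables, the interaction column is their product
stopifnot(isTRUE(all.equal(mm_star[, "sqrtCAA:biochem"],
                           meta$sqrtCAA * meta$biochem,
                           check.attributes = FALSE)))

# Both formulas build the same design matrix
stopifnot(isTRUE(all.equal(mm_star, mm_long)))
```

The same expansion applies when other covariates (Batch_Flowcell, Gender, Age_At_Death) are added to the formula; they simply contribute their own columns alongside these four.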

(Replying to AdelynTsai's email of Mon, Jan 30, 2023, 4:18 PM.)

AdelynTsai commented 1 year ago

Hi Liang, thank you for your response! One other question: given that my phenotype (sqrtCAA) is a continuous variable, can I interpret logFC_sqrtCAA in the summary output as the correlation coefficient (i.e., beta/estimate)?

Thank you again!

lhe17 commented 1 year ago

Hi AdelynTsai,

Please see the interpretation of logFC in my answer to the previous question: https://github.com/lhe17/nebula/issues/14

Best regards, Liang
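A hedged numeric sketch of how a continuous-predictor coefficient is usually read. It assumes that nebula's logFC is a coefficient on the natural-log scale of a log-link model (check issue #14 and the nebula documentation before relying on this), and the coefficient value below is made up. Under that assumption, logFC is a regression slope rather than a correlation coefficient: a one-unit increase in sqrtCAA multiplies the expected expression by exp(logFC), with the other covariates held fixed.

```r
# Hypothetical coefficient; ASSUMES logFC is on the natural-log scale
logFC_sqrtCAA <- 0.25

# Multiplicative change in expected expression for a `delta`-unit increase
# in a continuous predictor, other covariates held fixed
fold_change <- function(logFC, delta = 1) exp(logFC * delta)

fold_change(logFC_sqrtCAA)              # about 1.284 per unit of sqrtCAA
fold_change(logFC_sqrtCAA, delta = 2)   # about 1.649 for two units
```

If the coefficients were on a log2 scale instead, `exp()` would be replaced by `2^`.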

(Replying to AdelynTsai's email of Thu, Feb 2, 2023, 2:56 PM.)

AdelynTsai commented 1 year ago

Hi Liang, thanks for your previous answers. I've started doing interaction analysis using nebula. As previously mentioned, I used:

cov.mm <- model.matrix(~sqrtCAA*biochem + Batch_Flowcell + Gender + Age_At_Death, data=meta.data.f)

I have some questions about interpreting the results of the interaction analysis. I've attached results for 2 genes from DEG analyses with sqrtCAA alone, with each biochemical measure alone (cd31_tx_std and ab40_tbs_ln_std), and with the sqrtCAA x biochemical interaction. I know I should focus on the interaction results in the sqrtCAA:biochem column, but why are logFC_sqrtCAA and logFC_biochem from the interaction analysis, along with their se and p values, so different from the logFC, se, and p from the analyses with sqrtCAA or the biochemical measures alone?

Thank you so much again!

lhe17 commented 1 year ago

Hi AdelynTsai,

I need the following information to better understand what's going on.

Are sqrtCAA and biochem cell-level or sample-level variables (sample-level variables share the same value across all cells from a sample)? Is biochem a binary variable or continuous?

How many samples and cells are there in your data? How many columns are in your design matrix cov.mm? And what do you get if you put both variables in the model without the interaction term?

Best regards,

Liang
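Whether a variable is cell-level or sample-level can be checked directly from the cell metadata. A minimal sketch, assuming one row per cell and a column identifying the subject; the data frame `cells`, the column `nUMI`, and the helper `is_sample_level` are hypothetical:

```r
# Hypothetical cell-level metadata: one row per cell, `subject` identifies the sample
cells <- data.frame(
  subject = c("s1", "s1", "s1", "s2", "s2", "s3"),
  sqrtCAA = c(1.2, 1.2, 1.2, 0.7, 0.7, 2.1),    # constant within each subject
  nUMI    = c(900, 1500, 1100, 800, 950, 1200)  # varies from cell to cell
)

# TRUE if `x` takes a single value within every level of `id`
is_sample_level <- function(x, id) {
  all(tapply(x, id, function(v) length(unique(v)) == 1L))
}

is_sample_level(cells$sqrtCAA, cells$subject)  # TRUE  -> sample-level
is_sample_level(cells$nUMI, cells$subject)     # FALSE -> cell-level

# Numbers of samples and cells
length(unique(cells$subject))  # 3
nrow(cells)                    # 6
```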

(Replying to AdelynTsai's email of Thu, Mar 2, 2023, 8:34 PM, which attached Nebula_interaction_Q.xlsx: https://github.com/lhe17/nebula/files/10875195/Nebula_interaction_Q.xlsx)

AdelynTsai commented 1 year ago

Hi Liang, sqrtCAA and biochem are sample-level variables. Both of them are continuous variables.

The numbers of samples and cells differ by biochemical measure and cell type. For the examples I gave, astrocytes with cd31_tx have 78 samples and 17,722 cells, and microglia with ab40_tbs have 78 samples and 18,409 cells. In general, across all my cell types I have 74 to 78 samples and between 971 and 44,151 cells.

For the design matrix, when I used model.matrix(~sqrtCAA*biochem + Batch_Flowcell + Gender + Age_At_Death, data=meta.data.f), there are 9 columns corresponding to the variables in the formula (Batch_Flowcell has 5 levels, which gives 4 Batch_Flowcell columns in the design matrix). When I put both variables in the model without the interaction term, i.e., model.matrix(~sqrtCAA + biochem + Batch_Flowcell + Gender + Age_At_Death, data=meta.data.f), I get 8 columns. It seems that with the interaction term I get an extra column, sqrtCAA:biochem, which is the product of sqrtCAA and biochem. I included the two design matrices from astrocyte_cd31tx in the 2nd and 3rd tabs of the Excel file attached here.

The results with both variables in the model but without the interaction term are in the first tab of the Excel file. There is no sqrtCAA:biochem column if I don't include the interaction term.

Thank you for your help.
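The column bookkeeping described above can be reproduced with made-up data. Under R's default treatment contrasts, a 5-level factor contributes 4 indicator columns (its first level is absorbed into the intercept), and adding the interaction adds exactly one column, `sqrtCAA:biochem`. The data below are random and hypothetical; only the column structure is meant to match.

```r
set.seed(1)
n <- 20
# Hypothetical sample-level metadata (random values, for illustration only)
meta <- data.frame(
  sqrtCAA        = runif(n, 0, 3),
  biochem        = rnorm(n),
  Batch_Flowcell = factor(rep(paste0("FC", 1:5), length.out = n)),
  Gender         = factor(rep(c("F", "M"), length.out = n)),
  Age_At_Death   = round(runif(n, 60, 95))
)

mm_int <- model.matrix(~ sqrtCAA * biochem + Batch_Flowcell + Gender + Age_At_Death,
                       data = meta)
mm_add <- model.matrix(~ sqrtCAA + biochem + Batch_Flowcell + Gender + Age_At_Death,
                       data = meta)

# 5 batch levels -> 4 indicator columns (FC1 is the reference level)
grep("^Batch_Flowcell", colnames(mm_int), value = TRUE)

# The interaction model has exactly one extra column
ncol(mm_int) - ncol(mm_add)                  # 1
setdiff(colnames(mm_int), colnames(mm_add))  # "sqrtCAA:biochem"
```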

lhe17 commented 1 year ago

Hi AdelynTsai,

Thank you for the information.

Based on the summary statistics and information you shared, my interpretation is that biochem and sqrtCAA have a significant interaction effect on gene expression: sqrtCAA modulates the effect of biochem. For example, biochem has a strong positive effect (i.e., higher biochem increases expression) on HSPD1 in Ast when sqrtCAA is small, but this effect shifts toward negative as sqrtCAA becomes large. In the model without the interaction term, the logFC of biochem gives the marginal effect of biochem. Because the positive and negative effects of biochem in the sqrtCAA_low and sqrtCAA_high groups cancel out when the two groups are considered together, the overall marginal effect of biochem is not significant.

Best regards,

Liang
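This reasoning, and the original question of why logFC_biochem changes once the interaction is added, can be illustrated with a deterministic toy example. In a model with `sqrtCAA*biochem`, the biochem coefficient is the effect of biochem when sqrtCAA = 0, and its effect at any other sqrtCAA value is logFC_biochem + logFC_sqrtCAA:biochem * sqrtCAA. The sketch below is an assumption-laden stand-in: it uses an ordinary linear model (`lm`) on made-up, noise-free data rather than nebula, with sqrtCAA coded 0 (low) and 1 (high). The same coefficient algebra should carry over to nebula's log-scale linear predictor.

```r
# Hypothetical noise-free data: S = sqrtCAA group (0 = low, 1 = high), B = biochem
S <- rep(c(0, 1), each = 3)
B <- rep(c(-1, 0, 1), times = 2)
# Built so that B raises y when S = 0 and lowers it when S = 1
y <- 2 * B - 4 * S * B

fit_int  <- lm(y ~ S * B)  # with interaction
fit_marg <- lm(y ~ B)      # biochem only (marginal)

coef(fit_int)   # (Intercept) 0, S 0, B 2, S:B -4 (up to rounding)
coef(fit_marg)  # B is 0: the +2 and -2 slopes cancel out

# Effect of B at a given value of S: b_B + b_SB * S
slope_B_at <- function(fit, s) {
  b <- coef(fit)
  unname(b["B"] + b["S:B"] * s)
}
slope_B_at(fit_int, 0)  # 2, the B coefficient itself
slope_B_at(fit_int, 1)  # -2

# Value of S at which the effect of B changes sign: -b_B / b_SB
b <- coef(fit_int)
unname(-b["B"] / b["S:B"])  # 0.5
```

Because the "main effect" coefficients in an interaction model are conditional on the other variable being 0, recentring sqrtCAA would change logFC_biochem while leaving the interaction coefficient unchanged.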

(Replying to AdelynTsai's email of 3/3/2023, 6:46 PM, which attached Nebula_interaction_Q.xlsx: https://github.com/lhe17/nebula/files/10884922/Nebula_interaction_Q.xlsx)

lhe17 commented 1 year ago

Hi AdelynTsai,

I think the situation for NR4A3 is not very different from the previous one, except that sqrtCAA now has a significant marginal effect as well. The interaction effect in the case of VCAM1 is the opposite. This can be illustrated with the following example. Marginally, there is no correlation between G (expression) and S (sqrtCAA) or B (biochem). However, for those with low S=0, G is anti-correlated with B (strong negative effect), and for those with high S=1, G is positively correlated with B (strong positive effect of the interaction term).

 G   S   B
 1   0  -1
 0   0   0
-1   0   1
-1   1  -1
 0   1   0
 1   1   1

Best regards, Liang
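Liang's six-row example can be checked in a few lines of base R, using exactly the G, S, and B values listed in the comment above. Both marginal correlations are zero, the within-group correlations between G and B are -1 (S = 0) and +1 (S = 1), and a linear model with the interaction recovers G = -B + 2*S*B. `lm` is used here only to show the algebra; this is not a nebula fit.

```r
# Liang's toy example: G = expression, S = sqrtCAA (0/1), B = biochem
G <- c( 1, 0, -1, -1, 0, 1)
S <- c( 0, 0,  0,  1, 1, 1)
B <- c(-1, 0,  1, -1, 0, 1)

# Marginally, G is uncorrelated with both S and B
cor(G, S)  # 0
cor(G, B)  # 0

# Within each S group, G and B are perfectly (anti-)correlated
cor(G[S == 0], B[S == 0])  # -1
cor(G[S == 1], B[S == 1])  #  1

# With the interaction, the model fits exactly: G = -B + 2 * S * B
coef(lm(G ~ S * B))  # (Intercept) 0, S 0, B -1, S:B 2 (up to rounding)
```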

On Tue, Mar 7, 2023 at 12:26 AM, AdelynTsai wrote:

Hi Liang, thank you so much for your answer. That's really helpful. Looking at the results in more detail and following your logic of interpretation, I found some results that are hard for me to interpret; I give examples in the attached Excel file.

For NR4A3 from Fib x cd31, how can sqrtCAA and cd31 both have a strong positive effect on its expression while the logFC of sqrtCAA x cd31 is strongly negative? For VCAM1 from Fib x cldn5, sqrtCAA has a moderately positive effect and cldn5 has a strong negative effect on its expression, so how does that turn into a strongly positive logFC for sqrtCAA x cldn5?

I really appreciate your help!

Nebula_interaction_question.xlsx https://github.com/lhe17/nebula/files/10903815/Nebula_interaction_question.xlsx
