stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

Understanding diffnet() output #74

Closed: TimFaro closed this issue 1 year ago

TimFaro commented 1 year ago

Hi Stefanie,

quick question about the diffnet() output. I'm trying to get a data frame that contains the information visualized in the network by the plot() function (i.e., the relationship between every pair of OTUs). Looking at the diffnet() output obtained with Fisher's z-test, I'm assuming diffAdjustMat is the matrix I'm interested in? But I'm not sure how to interpret the values it contains; they range from 0 to ~1.8.

Thanks!

Best

Tim

stefpeschel commented 1 year ago

Hi Tim,

Thanks for your question. Yes, you are right: depending on whether the plot argument adjusted is set to TRUE or not, either diffAdjustMat or diffMat is used for the edge weights.

Since the absolute differences of the associations are plotted, the edge weights range from 0 (if the associations are exactly equal) to 2 (if the association is -1 in one group and 1 in the other group). So the edge weights do not lie in [0,1], as in the association networks, but in [0,2].
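As a rough illustration of the range described above, here is a minimal Python sketch of an element-wise absolute difference between two association matrices. The matrices and the helper name `abs_diff_matrix` are hypothetical and only serve the example; this is not NetCoMi code, and it ignores any testing or adjustment step that diffnet() performs.

```python
def abs_diff_matrix(asso1, asso2):
    """Element-wise |asso1 - asso2| for two equally sized matrices (lists of rows)."""
    return [[abs(a - b) for a, b in zip(row1, row2)]
            for row1, row2 in zip(asso1, asso2)]


# Hypothetical associations among three taxa in each group (values in [-1, 1]).
asso_group1 = [[1.0, 0.5, -1.0],
               [0.5, 1.0, 0.3],
               [-1.0, 0.3, 1.0]]
asso_group2 = [[1.0, 0.5, 1.0],
               [0.5, 1.0, -0.4],
               [1.0, -0.4, 1.0]]

weights = abs_diff_matrix(asso_group1, asso_group2)

# Equal associations give weight 0; -1 versus 1 gives the maximum weight 2.
for row in weights:
    print([round(w, 3) for w in row])
```

Because each association lies in [-1, 1], every entry of `weights` falls in [0, 2].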

Best, Stefanie

TimFaro commented 1 year ago

Hi Stefanie,

great, thanks for the explanation! What would be the value for an association of 1 in one group and -1 in the other, i.e. the "opposite" of edge weight 2? Because in the plot, the directions are represented by different colors.

Best

Tim

stefpeschel commented 1 year ago

It's still 2 because the difference is in absolute terms :)

TimFaro commented 1 year ago

Okay, but how is the "direction" determined? For example, my network contains three OTUs: OTU_128, OTU_230 and OTU_279.

OTU_128 - OTU_230 are connected by a cyan edge, so Disease + and Control -
OTU_128 - OTU_279 are connected by a purple edge, so Disease - and Control +

But both connections have almost the same value: 0.837 and 0.888

stefpeschel commented 1 year ago

Okay, now I know what you mean.

The diffnet function also returns the two association matrices (assoMat1 and assoMat2), and the plot function simply checks whether the respective associations are positive or negative. More precisely, if both associations are positive, the edge is colored green by default; if the association is positive in the first group and negative in the second, the edge is blue; and so on.

To summarize, the diffMat and diffAdjustMat objects just give the edge weights, and the colors depend on the estimated associations themselves.

You could also just take the difference of assoMat1 and assoMat2 to get the signed differences.
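The suggestion above can be sketched in a few lines of Python. The two small matrices stand in for assoMat1 and assoMat2 and contain made-up values; `signed_diff_matrix` is a hypothetical helper, not part of NetCoMi.

```python
def signed_diff_matrix(asso1, asso2):
    """Element-wise asso1 - asso2, keeping the sign of the difference."""
    return [[a - b for a, b in zip(row1, row2)]
            for row1, row2 in zip(asso1, asso2)]


# Hypothetical stand-ins for assoMat1 and assoMat2 (taxa A, B, C).
asso_mat1 = [[1.0, 0.5, -0.4],
             [0.5, 1.0, 0.2],
             [-0.4, 0.2, 1.0]]
asso_mat2 = [[1.0, -0.3, 0.45],
             [-0.3, 1.0, 0.2],
             [0.45, 0.2, 1.0]]

signed = signed_diff_matrix(asso_mat1, asso_mat2)

# A-B:  0.5 - (-0.3) =  0.80  (higher in group 1)
# A-C: -0.4 -  0.45  = -0.85  (higher in group 2)
print(round(signed[0][1], 2), round(signed[0][2], 2))
```

The absolute values (0.80 and 0.85) are similar, which is what the edge weights show, while the signs of the differences point in opposite directions.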

TimFaro commented 1 year ago

Ah okay great, that explains everything, thanks!

TimFaro commented 1 year ago

Sorry to reopen this, but I have one more question. How are cases like this handled?

In control, the correlation between OTU1 and OTU2 is 0.1.
In disease, the correlation between OTU1 and OTU2 is 0.8.
This would give a green edge (both are positive) with an edge weight/association value of 0.7 (|0.1 - 0.8| = 0.7).

And now for example:

In control, the correlation between OTU1 and OTU2 is 0.8.
In disease, the correlation between OTU1 and OTU2 is 0.1.
This would give a green edge with an edge weight/association value of 0.7 (|0.8 - 0.1| = 0.7) as well?

So these edges would be exactly the same, even though they are opposite with regard to how the disease influences the relationship between the OTUs (in the first case the disease leads to a stronger association, in the second case to a weaker one)?

stefpeschel commented 1 year ago

Hey Tim,

Yes, you're right that both cases would lead to a green edge. One should consider that the purpose of a differential network is to visualize whether and to what extent the associations differ between the two groups. Thus, a differential network generally contains less information than the two association networks themselves.

When writing the function, I could think of two main options for using the edge coloring:

Option 1 with three cases:

Option 2 with nine cases:

If none of the associations are equal to zero, the "=0" cases are not used.

So I decided to go with the second option, which is more informative than the first one. A second (and actually more important) reason is that the Discordant method (an alternative to Fisher's z-test) is based on a mixture model with exactly these nine cases.
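A minimal Python sketch of such a nine-case scheme follows. It assumes the nine cases are the 3 x 3 combinations of a negative, zero, or positive association in each group, which is one reading of the "=0" remark above. The function names are hypothetical and the code is not taken from NetCoMi.

```python
from itertools import product


def sign_label(x):
    """Return '-', '0', or '+' depending on the sign of x."""
    if x > 0:
        return "+"
    if x < 0:
        return "-"
    return "0"


def sign_case(asso1, asso2):
    """Pair of sign labels for the associations in group 1 and group 2."""
    return (sign_label(asso1), sign_label(asso2))


# All nine combinations of the three sign labels.
ALL_CASES = list(product("-0+", repeat=2))

# Both examples from the question above land in the same case.
print(sign_case(0.1, 0.8), sign_case(0.8, 0.1))
```

Under this reading, a coloring based only on the sign pair cannot distinguish 0.1 vs. 0.8 from 0.8 vs. 0.1, which matches the behavior discussed above.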

If one is interested in the strength of the respective associations, one should present the two association networks together with the differential network. I would suggest plotting the two association networks with the same layout to see differences at first glance.

I hope my explanations help to understand the idea behind the coloring. But I would also be open to suggestions on how to improve the function.

Best, Stefanie

TimFaro commented 1 year ago

Hi Stefanie,

thanks for the detailed explanation! The tip of plotting both individual networks together with the differential network is great, I will do that.

In my use case (benchmarking against other tools), I'm more interested in how the associations change between conditions, so I will try to create 8 classes myself by extracting the information from the tables manually and creating the output that way. The classes would be almost the same as the ones used by NetCoMi, except that I would not handle the 0 case individually but count it as either + or -, depending on the second association value. The 8 classes will then be:

asso1 >= 0, asso2 >= 0, asso1 > asso2
asso1 >= 0, asso2 >= 0, asso1 < asso2
asso1 <= 0, asso2 <= 0, asso1 < asso2
asso1 <= 0, asso2 <= 0, asso1 > asso2
asso1 <= 0, asso2 >= 0, asso1 < asso2
asso1 >= 0, asso2 <= 0, asso1 > asso2
asso1 >= 0, asso2 >= 0, asso1 = asso2
asso1 <= 0, asso2 <= 0, asso1 = asso2

Best

Tim

stefpeschel commented 1 year ago

That's an interesting assignment. I am just wondering if you really have cases where both associations are exactly equal. Or do you use some threshold so that they are seen as equal if the difference is smaller than the threshold?

TimFaro commented 1 year ago

Yes exactly, I assume they are equal with a tolerance of 0.01 (e.g. 0.50 and 0.51). Although this might be subject to change, depending on how well it works.
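One possible way to write down the eight classes with such a tolerance is sketched below in Python. Several details are assumptions made for the example rather than anything stated in the thread: a zero takes the sign of the other value, two zeros count as non-negative, and pairs with strictly opposite signs always go to one of the two mixed-sign classes, even if they differ by less than the tolerance. The function name `change_class` is hypothetical.

```python
def change_class(asso1, asso2, tol=0.01):
    """Assign a pair of associations to one of eight change classes.

    Assumptions (not from NetCoMi): zero adopts the sign of the other
    value, (0, 0) counts as non-negative, and equality within `tol` is
    only checked when both values fall on the same side of zero.
    """
    if asso1 >= 0 and asso2 >= 0:
        signs = "asso1 >= 0, asso2 >= 0"
    elif asso1 <= 0 and asso2 <= 0:
        signs = "asso1 <= 0, asso2 <= 0"
    elif asso1 < 0:
        # Strictly opposite signs: negative in group 1, positive in group 2.
        return "asso1 <= 0, asso2 >= 0, asso1 < asso2"
    else:
        # Strictly opposite signs: positive in group 1, negative in group 2.
        return "asso1 >= 0, asso2 <= 0, asso1 > asso2"

    if abs(asso1 - asso2) <= tol:
        relation = "asso1 = asso2"
    elif asso1 > asso2:
        relation = "asso1 > asso2"
    else:
        relation = "asso1 < asso2"
    return f"{signs}, {relation}"


print(change_class(0.1, 0.8))
print(change_class(0.8, 0.1))
print(change_class(0.5, 0.505))
```

With this assignment, the two example pairs (0.1, 0.8) and (0.8, 0.1) end up in different classes. Pairs whose difference sits right at the tolerance can go either way because of floating-point rounding, so a boundary case is best checked explicitly.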