stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

In a few cases, class assignment by DISCORDANT seems to contradict the associations from the two input networks #85

Closed antonkratz closed 1 year ago

antonkratz commented 1 year ago

My workflow involves SPRING followed by DISCORDANT to identify differentially associated OTUs in a network measured in two states. I observe a few edges where the class assignment by DISCORDANT seems to contradict the associations from the two input networks. That is, I have an OTU which is completely disconnected in state A. The same OTU has connections to other OTUs in state B, and these connections all happen to be negative. My expectation is that in the resulting differential network, the class should always be 4, i.e. from 0 to -. Yet, I sometimes observe class 7 as well, i.e. from 0 to +. For example:

In the resulting diff. network, the edge between OTU1 and OTU2 is assigned class 7 (0 to +) in the classMat, which I think is a contradiction. Shouldn't it rather be class 4 (0 to -)?

For a different connection (OTU1 to OTU3, not existent in state A, with a score of -0.21 in state B) I observe class 4 (0 to -) as expected.

Shouldn't "from zero to some negative score" (if it makes it into the diff. network) always get class 4 (0 to -), and never class 7 (0 to +)?

stefpeschel commented 1 year ago

Hey Anton,

The class assignment should be as you describe: If the association is close to zero in group A and negative in group B, it should be assigned to class 4. It could also fall into class 0 if the association in group B is negative but close to zero.

Could you please provide me with a working example so I can take a look at this issue?

I've tested the function with the datasets used in my GitHub tutorials and found no inconsistencies.

Best, Stefanie

antonkratz commented 1 year ago

Could you please provide me with a working example so I can take a look at this issue?

Hi Stefanie, I am unable to share the underlying data. However, here is the code, as minimal as I could make it:

library("NetCoMi")

# Rows: OTUs. Columns: samples.
df_stateA <- read.table("stateA.tsv", header = TRUE, sep = "\t")
df_stateB <- read.table("stateB.tsv", header = TRUE, sep = "\t")

# Transpose so that rows are samples and columns are OTUs, as netConstruct expects
stateA <- t(df_stateA)
stateB <- t(df_stateB)

# Filter the 58 samples (sample size of the smaller group) with highest
# frequency to make the sample sizes equal and thus ensure comparability.
n_yes <- 58

# SPRING
duo_net <- netConstruct(data = stateA,
                        data2 = stateB,
                        filtTax = "highestFreq",
                        filtTaxPar = list(highestFreq = n_yes),
                        filtSamp = "totalReads",
                        filtSampPar = list(totalReads = 1000),
                        measure = "spring",
                        measurePar = list(nlambda = 10, rep.num = 10),
                        normMethod = "mclr",
                        zeroMethod = "none",
                        sparsMethod = "none",
                        dissFunc = "signed",
                        verbose = 2,
                        seed = 123456)

# Differential network via DISCORDANT
duo_diff <- diffnet(duo_net,
                    diffMethod = "discordant",
                    discordThresh = .9,
                    adjust = "lfdr")
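For orientation, a hedged sketch of what `discordThresh = .9` plausibly filters on. It assumes (this is not taken from the NetCoMi or discordant source) that each taxa pair gets posterior probabilities for 9 classes, that the discordance score is the summed probability of the classes where the two groups fall into different mixture components, and that the reported class is the most probable one. The probabilities and the 3 × 3 class layout below are hypothetical.

```r
# Minimal sketch (NOT NetCoMi's or discordant's code): deriving a discordance
# score and a class label for ONE taxa pair from hypothetical posteriors.

# Assumed layout: class = 3 * row + col + 1 (0-based indices), rows = component
# in group 2, columns = component in group 1, both ordered (0, -, +).
concordant <- c(1, 5, 9)  # 0/0, -/-, +/+ : same component in both groups

# Hypothetical posterior probabilities for classes 1..9 (they sum to 1)
post <- c(0.02, 0.01, 0.00,  # group 2 in the "0" component
          0.85, 0.05, 0.02,  # group 2 in the "-" component
          0.03, 0.01, 0.01)  # group 2 in the "+" component

discord_score <- sum(post[-concordant])  # P(pair is in a discordant class)
keep <- discord_score > 0.9              # analogous to discordThresh = .9
best_class <- which.max(post)            # most probable class

cat(sprintf("score = %.2f, kept = %s, class = %d\n",
            discord_score, keep, best_class))
```

With these made-up numbers the pair passes the threshold (score 0.92) and is labeled class 4; the label depends entirely on which posterior is largest.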
antonkratz commented 1 year ago

P.S.: I will also try to create synthetic data to reproduce the error that I am definitely observing, but that can take a while, as it is unclear to me right now how to construct such a data set.

stefpeschel commented 1 year ago

Hey Anton,

I tested the function with a dataset I have at hand and the class assignments are indeed wrong.

Since I couldn't find any inconsistencies in my code, I tried the examples in the discordant package and the same problem occurs there. So, I'll write an email to the package owners.

There were already some problems with the discordant package when I first used it. So, I only used the discordantRun function and replaced the other functions with my own corrected versions. However, the class assignments were correct at that time.

I saw that the latest discordant version was uploaded a few days ago, but updating the package didn't change anything. So, let's see what the authors say...

Best, Stefanie

antonkratz commented 1 year ago

Thank you very much for looking into this! I am happy you see this error too. Yes, looking forward to input from the DISCORDANT team. I think it is worthwhile to fix this!

stefpeschel commented 1 year ago

Hey Anton,

While preparing some examples for the discordant authors, I had a closer look at what happens inside the functions and how the class assignments work. I realized that the problem is not the implementation but the estimated parameters of the mixture model. In other words: the discordant implementation is correct, but the estimated parameters lead to unintuitive class assignments in some cases, as you will see in my examples.

Here you'll find a script with two examples. The first one is an example from the discordant help page, where the assignments are correct. The crucial point is the estimated parameters of the mixture model. In the discordant example, the means of the negative mixture components are strongly negative, those of the zero components are close to zero, and those of the positive components are strongly positive. For the differential network, however, the means of the negative components are positive, leading to contradictory class assignments. The fact that the negative correlations are only slightly negative may be the reason for the positive mean values.
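To illustrate why the fitted parameters, not the sign of a score, drive the assignment, here is a simplified one-group sketch. It is not discordant's EM code; it only applies Bayes' rule to a three-component Gaussian mixture whose parameters are hypothetical, picked so that the "-" component has a positive mean (as described above) and the "+" component is wide.

```r
# Minimal sketch, assuming a 3-component Gaussian mixture for the scores of
# ONE group. All parameter values are hypothetical, NOT estimates from data.
comp  <- c("-", "0", "+")
mu    <- c(0.10, 0.00, 0.30)  # means: the "-" component's mean is positive
sigma <- c(0.05, 0.02, 0.30)  # standard deviations: "+" is wide
w     <- c(0.30, 0.40, 0.30)  # mixing proportions

# Posterior probability of each component for a score x (Bayes' rule)
posterior <- function(x) {
  dens <- w * dnorm(x, mean = mu, sd = sigma)
  dens / sum(dens)
}

x <- -0.15                    # a slightly negative association score
p <- posterior(x)
names(p) <- comp
print(round(p, 3))
comp[which.max(p)]            # label of the most probable component
```

Here the slightly negative score lands in the component labeled "+", simply because that component has by far the highest density at x. The labels are names attached to fitted components, so the assignment is only as intuitive as the fitted means and variances.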

These examples also imply that the class matrix defined in the discordant manuscript does not match the output of the function: The rows represent the components of group 2, not group 1 as stated in the paper. That's why we defined the matrix differently in the supplement of the NetCoMi paper.
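A small sketch of the class matrix as described here, assuming the component order (0, -, +) for both groups. With rows for group 2 and columns for group 1, this order reproduces the two classes discussed in this issue (4 = "0 to -", 7 = "0 to +"). The helper `class_label` is hypothetical, not part of NetCoMi or discordant.

```r
# Rows = component in group 2, columns = component in group 1.
# The order (0, -, +) is an assumption consistent with classes 4 and 7 above.
comps <- c("0", "-", "+")
class_layout <- matrix(1:9, nrow = 3, byrow = TRUE,
                       dimnames = list(group2 = comps, group1 = comps))
print(class_layout)

# Hypothetical helper: translate a class number into "group 1 to group 2"
class_label <- function(k) {
  idx <- which(class_layout == k, arr.ind = TRUE)
  paste(comps[idx[1, "col"]], "to", comps[idx[1, "row"]])
}

class_label(4)  # "0 to -"
class_label(7)  # "0 to +"
```

Reading the matrix row-wise as if rows were group 1 (the layout stated in the paper) would swap the meaning of these labels, which is the mismatch described above.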

Considering the estimated parameters, the class assignments make sense from my point of view, even if they are not intuitive. Hope you agree.

Best, Stefanie