stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

Importance of good ARI values #78

Closed noelzach closed 1 year ago

noelzach commented 1 year ago

Hi there, wonderful package! Thank you for developing it. This may be more of a question than an issue...

I am comparing cross-domain networks (fungal ITS and bacterial 16S) from three groups (1, 2, and 3). I followed the tutorial on constructing cross-domain networks with SPIEC-EASI and passed the association matrices to NetCoMi for comparison. However, the ARI values are very low for some group comparisons, and whether the p-value is significant depends on which groups are compared (group 1 vs 2: ARI = 0.009, p-value = 0.4; group 2 vs 3: ARI = 0.02, p-value = 0.04; group 1 vs 3: ARI = 0.02, p-value = 0.01). My interpretation is that if the ARI is not significantly different from zero, the two networks are essentially random, meaning we cannot trust the results. Am I correct? If not, could you give a more concrete explanation of how to interpret the ARI when comparing two networks?

Thanks!

stefpeschel commented 1 year ago

Hey,

I already answered you via email, but I'm also posting my answer here in case it is useful to others:

The ARI relates to the clusters only. An ARI close to zero means that the two clusterings agree no more than random cluster assignments would. In other words, the nodes could just as well have been assigned to the clusters at random, so the clusterings of the two networks are not very similar. The higher the ARI, the more similar the two clusterings are.
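For reference, the standard (Hubert and Arabie) definition of the ARI can be written as follows. The symbols are introduced here for illustration and do not come from the thread: $n$ is the number of nodes, $n_{ij}$ the number of nodes in cluster $i$ of the first network and cluster $j$ of the second, and $a_i$, $b_j$ the corresponding cluster sizes.

```latex
\mathrm{ARI} =
\frac{\sum_{i,j} \binom{n_{ij}}{2} - E}
     {\frac{1}{2}\left[\sum_{i} \binom{a_i}{2} + \sum_{j} \binom{b_j}{2}\right] - E},
\qquad
E = \frac{\sum_{i} \binom{a_i}{2} \, \sum_{j} \binom{b_j}{2}}{\binom{n}{2}}
```

Here $E$ is the number of co-clustered node pairs expected under random assignment with the same cluster sizes, which is why random clusterings give an ARI near zero and identical clusterings give an ARI of 1.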

The p-values correspond to the null hypothesis ARI = 0, so a significant p-value means that the ARI is significantly different from zero. You can get significant p-values even for small ARI values. This happens for large networks with many possible cluster assignments: in a large network, it is very unlikely that random assignment places nodes in the same cluster in both groups. So even a few similarities between the two clusterings lead to an ARI that is significantly different from zero.
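To make the effect of network size concrete, here is a minimal Python sketch. It is not NetCoMi's implementation: the one-sided permutation test, the helper names (`adjusted_rand_index`, `permutation_p_value`, `noisy_copy`, `demo`), and the simulated clusterings (20 clusters, about 15% of labels preserved) are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two cluster labelings of the same nodes."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two labelings of the same >= 2 nodes")
    # Node pairs that share a cluster in both labelings.
    both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    # Node pairs that share a cluster within each labeling separately.
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (both - expected) / (maximum - expected)


def permutation_p_value(a, b, n_perm=200, seed=0):
    """One-sided permutation test of H0: ARI = 0 (illustrative assumption)."""
    rng = random.Random(seed)
    observed = adjusted_rand_index(a, b)
    shuffled = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between a and b
        if adjusted_rand_index(a, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def noisy_copy(labels, keep, k, rng):
    """Keep each label with probability `keep`, otherwise draw a random one."""
    return [lab if rng.random() < keep else rng.randrange(k) for lab in labels]


def demo(n, k=20, keep=0.15, seed=1):
    """Hypothetical setup: n nodes, k clusters, weakly related clusterings."""
    rng = random.Random(seed)
    a = [i % k for i in range(n)]
    b = noisy_copy(a, keep, k, rng)
    return permutation_p_value(a, b, seed=seed)


if __name__ == "__main__":
    for n in (60, 2000):
        ari, p = demo(n)
        print(f"n={n}: ARI={ari:.3f}, p-value={p:.3f}")
```

With 2000 nodes, the weak overlap built into the simulation should give a small ARI together with a small p-value, because shuffled labelings almost never reach even that level of agreement. The 60-node run is included only for contrast, and its result depends on the seed.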

noelzach commented 1 year ago

That makes way more sense in my context. Thank you! I think I had the wrong interpretation.