stefpeschel / NetCoMi

Network construction, analysis, and comparison for microbial compositional data
GNU General Public License v3.0

SPRING network with no edges - "The data does not contain zeros" warning #95

Closed. sayalaruano closed this issue 9 months ago

sayalaruano commented 9 months ago

Hello! I'm trying to create a SPRING network with my metagenomics data, but after running the algorithm it says that the network has no edges.

I really want to know why I'm getting this error and how I could solve it.

This is the reference code:

```r
# Load data
bact_BR <- read.csv("../Data/Bacteria/BukitOilpalm_bacteria.csv", header = TRUE, row.names = 1)

# Transpose the matrix
bact_BR_t <- t(bact_BR)

# Create network
net_spring3 <- netConstruct(bact_BR_t,
                            filtTax = "highestFreq",
                            filtTaxPar = list(highestFreq = 50),
                            filtSamp = "totalReads",
                            filtSampPar = list(totalReads = 100),
                            measure = "spring",
                            measurePar = list(nlambda = 10, rep.num = 10),
                            normMethod = "none",
                            zeroMethod = "none",
                            sparsMethod = "none",
                            dissFunc = "signed",
                            verbose = 3,
                            seed = 123)
```

This is the output:

```
Checking input arguments ... Done.
Data filtering ...
2638 taxa removed.
50 taxa and 4 samples remaining.

Calculate 'spring' associations ...
The data does not contain zeros. Consider changing the type to "continuous".
The data does not contain zeros. Consider changing the type to "continuous".
The input is identified as the covariance matrix.
Conducting Meinshausen & Buhlmann graph estimation (mb)....done
The data does not contain zeros. Consider changing the type to "continuous".
The input is identified as the covariance matrix.
Conducting Meinshausen & Buhlmann graph estimation (mb)....done

Done.
Network has no edges.
Warning messages:
1: In Matrix::nearPD(R, corr = TRUE) :
  'nearPD()' did not converge in 100 iterations
2: 11 jobs had warning: "'nearPD()' did not converge in 100 iterations"
3: In pulsar::pulsar(qdat, fun = fun, fargs = list(lambda = lambdaseq, :
  Optimal lambda may be larger than the supplied values
```

I'm working with a subset of my original dataset for a specific condition, in which I have 4560 OTUs and 4 samples. Could it be that the algorithm cannot handle a dataset with many OTUs and only a few samples?

Thanks in advance for your help.

muellsen commented 9 months ago

By quickly looking at the output, it seems that the filtering procedure you apply removes most of the samples. This potentially indicates that your dataset is even sparser (many more zeros) than most 16S data sets. Given that only four samples are left, it is statistically very likely that you cannot detect any edges (partial correlations) faithfully between your 50 taxa. @stefpeschel please chime in here when you have time!

stefpeschel commented 9 months ago

If I'm understanding you correctly, your dataset had only four samples even before NetCoMi's internal filtering. We would generally recommend using higher sample sizes, because you won't get reliable results with only four samples.

The output you get is not an error. It says that the estimated network has no edges, i.e., all entries in the adjacency matrix are zero. The SPRING approach includes StARS model selection, which results in a sparsity level that is stable under random subsampling of the data. It seems that with only four samples you don't get any stable edges.
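To illustrate why four samples are too few, here is a minimal base-R sketch. It is not part of NetCoMi or SPRING: the simulated data and all variable names are assumptions made for illustration, and plain Pearson correlation stands in for SPRING's rank-based estimator. The sketch draws 50 mutually independent variables with only 4 observations each and inspects the pairwise correlations.

```r
# Illustration only: simulated, independent data (hypothetical, not the dataset from this issue).
set.seed(123)

n_samples <- 4   # number of samples, as in the issue
n_taxa    <- 50  # number of taxa kept after filtering

# Independent standard-normal columns: the true network has no edges.
x <- matrix(rnorm(n_samples * n_taxa), nrow = n_samples, ncol = n_taxa)

# Pairwise Pearson correlations between the 50 columns (one value per pair).
r <- cor(x)
r_pairs <- r[upper.tri(r)]

# Largest spurious correlation and share of pairs with |r| > 0.8.
max_abs_r   <- max(abs(r_pairs))
share_large <- mean(abs(r_pairs) > 0.8)

print(max_abs_r)
print(share_large)
```

For independent normal data with n = 4, the sample correlation is uniformly distributed on (-1, 1), so roughly 20% of the pairs are expected to exceed 0.8 in absolute value even though no true association exists. Which pairs look strong changes from subsample to subsample, which is consistent with a stability-based selection keeping no edges.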

sayalaruano commented 9 months ago

Hi @muellsen and @stefpeschel. Thank you very much for your quick answer. Great! Now, I understand why the network ended up with no edges. Is there a recommended number of samples to create the association networks?

stefpeschel commented 9 months ago

Kurtz et al. 2015 give a rule of thumb that the relationship between the sample size n, the number of OTUs p, and the maximum node degree d of the network is n=O(d²log(p)). However, as their simulation studies show, the performance of edge recovery depends on the estimation method and the structure of the network (see Figure 4 in the paper). Depending on the network structure, there can still be a large difference in performance for sample sizes above 100. Unfortunately, there is no general rule to get reliable results.

Reference: Kurtz, Z. D., Müller, C. L., Miraldi, E. R., Littman, D. R., Blaser, M. J., & Bonneau, R. A. (2015). Sparse and Compositionally Robust Inference of Microbial Ecological Networks. PLoS Computational Biology, 11(5), 1–25. https://doi.org/10.1371/journal.pcbi.1004226
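As a rough worked example of the rule of thumb above, the following base-R sketch evaluates d² log(p) for the 50 taxa from this issue. The maximum node degrees in `d` are hypothetical values chosen for illustration, and the natural logarithm is assumed. Because the O(·) notation hides an unknown constant, the results indicate an order of magnitude only, not a minimum sample size.

```r
# Scale d^2 * log(p) from the rule of thumb n = O(d^2 log(p)).
# The O(.) notation hides an unknown constant, so treat these as orders of magnitude.
p <- 50              # number of taxa after filtering (from the issue)
d <- c(1, 2, 5, 10)  # hypothetical maximum node degrees

scale_n <- d^2 * log(p)  # log() is the natural logarithm in R

print(data.frame(max_degree = d, n_scale = round(scale_n, 1)))
```

Even for a very sparse network with d = 2, the scale is about 16, roughly four times the 4 samples available here.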

sayalaruano commented 9 months ago

I see, thank you very much for the detailed answer @stefpeschel. I'm using NetCoMi for a project in a NetBio course about microbial association networks built from metagenomic samples of rainforest and deforested areas in the Colombian Amazon. This package has been a lifesaver: it has many options, and everything is well explained and easy to implement. Thanks a lot for the effort you put into creating it. I hope to contribute to this project in the future. It would be nice to have a CONTRIBUTING.md file so that potential contributors know how they can help.