zdk123 / SpiecEasi

Sparse InversE Covariance estimation for Ecological Association and Statistical Inference
GNU General Public License v3.0

Issue with Sparse Network and Missing Nodes in Spiec-Easi #268

Closed: 07Chengran closed this issue 1 month ago.

07Chengran commented 1 month ago

Dear Zach,

Thank you very much for your packages!

I’m currently working on constructing a gut microbiota ecological network for a specific species using Spiec-Easi. I have 10 samples, and I retained only those taxa present in at least 5 samples, resulting in 109 taxa.
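
For reference, a prevalence filter like the one described above can be written in base R. This is a minimal sketch on simulated counts. The object names are hypothetical, and the table is laid out samples (rows) by taxa (columns), the orientation SPIEC-EASI expects.

```r
# Hypothetical data: 10 samples (rows) x 200 taxa (columns) of simulated counts
set.seed(1)
counts <- matrix(rpois(10 * 200, lambda = 0.6), nrow = 10,
                 dimnames = list(paste0("S", 1:10), paste0("taxon", 1:200)))

# Number of samples in which each taxon is observed (count > 0)
prevalence <- colSums(counts > 0)

# Keep taxa present in at least 5 of the 10 samples
counts.f <- counts[, prevalence >= 5, drop = FALSE]

# Rows are still the 10 samples; columns are the retained taxa
dim(counts.f)
```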

Here’s the problem I’m encountering: I tried several ranges for λ by setting lambda.min.ratio to 0.001, 0.01, and 0.1, with rep.num set to 20 and nlambda set to 100. However, I ran into a problem with the resulting adjacency matrix.
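
The two path settings interact roughly as sketched below. This assumes SpiecEasi's default of a log-spaced λ path running from a data-derived maximum (lambda.max, a made-up value here) down to lambda.max × lambda.min.ratio in nlambda steps. A smaller lambda.min.ratio extends the path toward smaller penalties, that is, denser candidate networks. It does not change the number of nodes, which is fixed by the number of columns in the input.

```r
# Hypothetical maximum penalty; SpiecEasi derives the real value from the data
lambda.max <- 0.8
lambda.min.ratio <- 0.01   # smallest lambda is lambda.max * lambda.min.ratio
nlambda <- 100             # number of lambda values along the path

# Log-spaced path from lambda.max down to lambda.max * lambda.min.ratio
lambda.path <- exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
                       length.out = nlambda))

head(lambda.path, 3)  # largest penalties (sparsest networks) come first
tail(lambda.path, 3)  # smallest penalties (densest networks) come last
```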

Detailed Observations:

  1. When lambda.min.ratio was set to 0.1 and 0.01, the $select$stars$summary in both cases looked similar to the following:

    [1] 0.03644444 0.07155556 0.09622222 0.13555556 0.19355556 0.19533333
    [7] 0.21355556 0.24022222 0.24955556 0.24955556 0.25866667 0.25866667
    [13] 0.25866667 0.25866667 0.25866667 0.26600000 0.27666667 0.28688889
    [19] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [25] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [31] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [37] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [43] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [49] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [55] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [61] 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000 0.29200000
    [67] 0.29200000 0.29200000 0.29200000 0.29266667 0.29466667 0.29666667
    [73] 0.29666667 0.29711111 0.29711111 0.29733333 0.30333333 0.31533333
    [79] 0.31955556 0.32577778 0.32622222 0.32622222 0.32622222 0.32755556
    [85] 0.33266667 0.33577778 0.33577778 0.33577778 0.33577778 0.33577778
    [91] 0.33577778 0.33911111 0.34488889 0.34488889 0.35066667 0.35933333
    [97] 0.37000000 0.37311111 0.37311111 0.37933333

    Despite these settings, the final adjacency matrix (spiec.gl.out$refit$stars) contained only 10 taxa. I suspected the network might be overly sparse because λ was too large, so I lowered lambda.min.ratio.

  2. When lambda.min.ratio was set to 0.001, I received the following message:

   Optimal lambda may be larger than the supplied values
   $select$stars$summary:
   [1] 0.04511111 0.09044444 0.10600000 0.20066667 0.22866667 0.25644444
   [7] 0.27666667 0.27666667 0.27666667 0.27666667 0.27666667 0.27666667
   [13] 0.27666667 0.27666667 0.27666667 0.27666667 0.27666667 0.27666667
   [19] 0.27666667 0.27666667 0.27666667 0.27666667 0.27666667 0.27666667
   [25] 0.28577778 0.29311111 0.31955556 0.32911111 0.33422222 0.34755556
   [31] 0.36933333 0.39044444 0.39044444 0.39044444 0.40311111 0.41133333
   [37] 0.41133333 0.41133333 0.41133333 0.41133333 0.41133333 0.43333333
   [43] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [49] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [55] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [61] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [67] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [73] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [79] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [85] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [91] 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889 0.43688889
   [97] 0.43688889 0.43688889 0.43688889 0.43688889

The summary looked broadly similar, and the problem persisted: only 10 taxa were retained in the adjacency matrix.

  3. I also tried setting thresh to 0.1 with lambda.min.ratio set to 0.01. The $select$stars$summary was:
    [1] 0.03800000 0.04533333 0.13444444 0.14600000 0.20222222 0.23400000
    [7] 0.25066667 0.26288889 0.26644444 0.27133333 0.27155556 0.27155556
    [13] 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556
    [19] 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556
    [25] 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556
    [31] 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556 0.27155556
    [37] 0.27422222 0.28966667 0.28966667 0.28966667 0.29177778 0.30044444
    [43] 0.30044444 0.30466667 0.31844444 0.32622222 0.32833333 0.32833333
    [49] 0.35066667 0.35911111 0.36177778 0.37800000 0.38222222 0.39044444
    [55] 0.39666667 0.39755556 0.40422222 0.40422222 0.40422222 0.40422222
    [61] 0.40422222 0.40422222 0.40422222 0.41311111 0.41466667 0.41466667
    [67] 0.42177778 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889
    [73] 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889
    [79] 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889
    [85] 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889
    [91] 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889 0.42888889
    [97] 0.42888889 0.42888889 0.42888889 0.42888889

    The final adjacency matrix still contained only 10 taxa.

My Question:

What could be causing this large reduction in the number of taxa in the adjacency matrix? Could the data itself be the problem, or is something in the Spiec-Easi model parameters causing nodes to be lost? Could the choice of λ or lambda.min.ratio be responsible, and if so, how should I adjust them to retain more taxa in the network?

Any insights or suggestions would be greatly appreciated.

Thank you for your assistance.

zdk123 commented 1 month ago

It sounds like the OTU matrix needs to be transposed. SPIEC-EASI assumes a samples (rows) by features (columns) format, which is more standard in the statistics field.
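
In practice the fix is a single call to base R's t(). The sketch below uses a simulated table with the dimensions described in this thread (109 taxa, 10 samples); the object names are hypothetical. Because SPIEC-EASI estimates one node per column, passing the untransposed table would yield a 10-node network whose "nodes" are actually samples, which is consistent with the 10-taxa adjacency matrices reported above.

```r
# Simulated OTU table stored taxa (rows) x samples (columns): 109 x 10
set.seed(42)
otu.taxa.by.sample <- matrix(rpois(109 * 10, lambda = 5), nrow = 109,
                             dimnames = list(paste0("taxon", 1:109),
                                             paste0("S", 1:10)))

# SPIEC-EASI expects samples (rows) x features (columns), so transpose
otu.sample.by.taxa <- t(otu.taxa.by.sample)

dim(otu.taxa.by.sample)  # 109 10: ncol() is 10, so only 10 nodes would be estimated
dim(otu.sample.by.taxa)  # 10 109: ncol() is 109, one node per taxon

# Then, for example: spiec.easi(otu.sample.by.taxa, method = "glasso", ...)
```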

On Sat, Aug 10, 2024, 9:33 AM 刘澄蔚然 @.***> wrote:

Closed #268 https://github.com/zdk123/SpiecEasi/issues/268 as completed.


07Chengran commented 1 month ago

Thank you very much for your reply! I have transposed the matrix and the results look good now.

07Chengran commented 1 month ago

Dear Zach,

I'm sorry to trouble you again, but I've run into some new problems. I'm currently constructing a cross-domain network from metagenomic sequencing data that includes bacteria and fungi. I've encountered a couple of issues and would greatly appreciate your help.

  1. Covariance Values: I've observed that the covariance values in my data are consistently very small, ranging from 10^-11 to 10^-3. Here’s the code I’m using with Spiec-Easi:

    spiec.gl.out = spiec.easi(
      list(otus.f.bac, otus.vir), 
      method = "glasso",
      icov.select.params = list(rep.num = 20),
      lambda.min.ratio = 0.01,
      nlambda = 100,
      pulsar.params = list(thresh = 0.1)
    )

    Is it normal for covariance values to be this small? If not, what might be causing this issue?

  2. Warning Message: Additionally, I receive the following warning message:

    Warning message:
    In spiec.easi.list(list(otus.f.bac, otus.vir), method = "glasso",  :
      input list contains data of mixed classes.

    Could you help me understand what this warning means and how I can resolve it?
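
One way to rule out the mixed-classes warning is sketched below. It assumes the warning is raised because the two list elements have different R classes (for example, one data.frame and one matrix). The sketch coerces both tables to plain matrices and checks that they share the same samples in the same row order before building the list. The data are simulated, and the object names only mirror the call above.

```r
# Simulated stand-ins for the two domain tables (samples x taxa):
# one is a data.frame and the other a matrix, so their class() differs
set.seed(7)
otus.f.bac <- as.data.frame(matrix(rpois(10 * 30, 5), nrow = 10,
                                   dimnames = list(paste0("S", 1:10),
                                                   paste0("bac", 1:30))))
otus.vir <- matrix(rpois(10 * 12, 5), nrow = 10,
                   dimnames = list(paste0("S", 10:1), paste0("vir", 1:12)))

# Coerce both to matrices so the list elements share one class
bac <- as.matrix(otus.f.bac)
vir <- as.matrix(otus.vir)

# Both domains must describe the same samples in the same row order
vir <- vir[rownames(bac), , drop = FALSE]
stopifnot(identical(rownames(bac), rownames(vir)))

sapply(list(bac, vir), function(x) class(x)[1])  # both "matrix"

# Then, for example: spiec.easi(list(bac, vir), method = "glasso", ...)
```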

Thank you very much for your assistance!