zdk123 / SpiecEasi

Sparse InversE Covariance estimation for Ecological Association and Statistical Inference
GNU General Public License v3.0
194 stars 68 forks source link

Lower bound accuracy, edge weights, and node sizes #59

Closed ChrisMcMullen closed 6 years ago

ChrisMcMullen commented 6 years ago

Hi,

I have a few issues I was hoping someone could help me with.

First Issue

Regarding the spiec.easi function, I used the following code:

networkBRD <- spiec.easi(ps1pfBRD, method="glasso", lambda.min.ratio=1e-2, nlambda=10, sel.criterion="bstars", pulsar.select=TRUE, pulsar.params=list(rep.num=20, ncores=56))

and received the following warning:

Warning message: In pulsar(data = X, fun = match.fun(estFun), fargs = args, rep.num = 20, : Accurate lower bound could not be determined with the first 2 subsamples

Is this something I need to be concerned with? Please note that I am analyzing a fairly large 16S data set.

Second Issue

As per the online tutorial, I adjusted my matrix to an igraph object for visualization. I want to either cluster or search for communities in my data. Most methods I have found require an edge weight. How can I obtain edge weights and use them in the adj2igraph function? In another Issue I noticed that you indicated the getOptCov function can be used to assigned the colors of the edges based on +/- relationship. Are the co-variances equivalent to my edge weights? Or do I need to use the cov2cor function? Can I simply insert the resulting matrix into the adj2igraph function in edge.attr=list() to add edge weights?

Conversely, if you know of a better way to cluster my data that doesn't require edge weights, I can pursue that option.

Third Issue

In your benchmarking paper, you indicate in one of your figures that you scale the node sizes of your visualizations based on the geometric means of the relative abundances. How did you accomplish this? Can you do this using the plot_network phyloseq wrapper?

Thank you in advance for any clarifications and suggestions you may provide.

zdk123 commented 6 years ago
  1. I would try lowering the value for lambda.min.ratio. The warning likely means that you are not exploring the whole lambda path. Since you are using the bounded StARS method, we need to be able to compute a variability statistic that's even more sensitive to noise than for "complete" StARS. The good news is that once that lower bound is found, we can cut off the low/slow end of the lambda path for the rest of subsamples, ultimately saving compute time (probably)

  2. I would indeed recommend using cov2cor to scale your weights. Another option is to use the output of getOptMerge. This is the basis of the StARS variability statistic, giving you something like a confidence score for a particular edge.

  3. To compute the geometric mean try something like exp(colMeans(log(X))), where X is the samples by OTUs relative abundance matrix (with a pseudocount or filter zeros). I don't know the plot_network API off the top of my head, but I'm sure there's an option for scaling the vertex size.

ChrisMcMullen commented 6 years ago

Thank you for your response.

Is there a sensible way to detect communities in the network returned by Spiec Easi? I was thinking of doing a graphical analysis using an igraph object, but there may be an easier way that you have tried before.

zdk123 commented 6 years ago

I would typically use igraph for that kind of downstream analysis

ChrisMcMullen commented 6 years ago

Makes sense to me. Thank you for answering all of my questions.