zdk123 / SpiecEasi

Sparse InversE Covariance estimation for Ecological Association and Statistical Inference
GNU General Public License v3.0

Environmental Metadata #150

Open jlw-ecoevo opened 3 years ago

jlw-ecoevo commented 3 years ago

Hi! I was wondering if it is possible to incorporate environmental metadata into an analysis with SpiecEasi. It seems this would be quite important for getting an accurate network when dealing with datasets where we expect abiotic factors to be shaping the community (otherwise we might get spurious linkages between microbes that simply share a similar environmental preference). It seems like it would be unwise to CLR-transform data that is not compositional (although maybe the math works out so that this is OK when estimating the precision matrix).

I can see how this would be easy to do using your multi-domain framework if there was an option to skip the clr transform on the second "domain" (i.e., the metadata). I could piece together a pipeline similar to yours that does this (using your clr function and other R packages for running glasso), but I have a number of collaborators/students who I think would benefit greatly from having this as an option in your much more user-friendly pipeline.
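To make the proposal concrete, here is a minimal sketch of the preprocessing step being described: CLR-transform the count "domain", standardize the metadata "domain" without CLR, and join the two per sample before any sparse inverse covariance estimation. It is written in plain Python for illustration and is not SpiecEasi's API; the function names, the pseudocount of 1, and the toy data are all assumptions.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one sample (a list of taxon counts).
    The pseudocount (assumed to be 1 here) avoids log(0)."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def zscore_columns(rows):
    """Standardize each column (covariate) to mean 0 and unit sample variance.
    Assumes no covariate is constant across samples."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*scaled_cols)]

def combine_domains(counts, metadata):
    """CLR the count domain, z-score the metadata domain, join per sample."""
    clr_rows = [clr(sample) for sample in counts]
    meta_rows = zscore_columns(metadata)
    return [c + m for c, m in zip(clr_rows, meta_rows)]

# Hypothetical toy data: 3 samples, 3 taxa, 2 environmental covariates.
counts = [[10, 0, 5], [3, 8, 1], [7, 2, 9]]
metadata = [[7.1, 12.0], [6.8, 15.5], [7.4, 9.0]]
combined = combine_domains(counts, metadata)
```

The combined matrix (samples by taxa plus covariates) is what would then be passed to a graphical lasso or neighborhood selection routine. Whether mixing CLR-transformed and standardized columns in this way is statistically sound is exactly the open question raised in this thread.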

shu251 commented 3 years ago

Agreed with above, this feature would be incredibly useful!

zdk123 commented 3 years ago

I think this is a reasonably good idea, though I haven't done any benchmarking with (non-compositional) environmental covariates in the network. I would be interested in obtaining experimentally verified interactions to make sure it's handled correctly.

You may also be interested in our new method for removing latent covariates from networks, which could be from environmental or other sources (compositional, batch artifacts, etc). It's already included in this package: https://github.com/zdk123/SpiecEasi#learning-latent-variable-graphical-models

jlw-ecoevo commented 3 years ago

Agreed that it's tough to validate many of these interactions for most networks since we don't have a good baseline set of confirmed interactions for most environments - but I think it is easy enough to simulate scenarios where exclusion of the environmental covariate leads to spurious interactions.
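A minimal sketch of such a simulation, under simplifying assumptions: two hypothetical taxa respond to one Gaussian environmental variable and do not interact with each other. The variables are plain Gaussians rather than compositional counts, and all names and parameters are illustrative. For three variables, the partial correlation computed here matches the sign and sparsity pattern of the corresponding off-diagonal entry of the precision matrix.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation between x and y after conditioning on z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(n=2000, seed=1):
    """Two taxa driven by a shared environment, with no direct interaction."""
    rng = random.Random(seed)
    env = [rng.gauss(0, 1) for _ in range(n)]
    taxon_a = [e + rng.gauss(0, 0.5) for e in env]
    taxon_b = [e + rng.gauss(0, 0.5) for e in env]
    return env, taxon_a, taxon_b

env, taxon_a, taxon_b = simulate()
marginal = pearson(taxon_a, taxon_b)                 # environment left out
conditional = partial_corr(taxon_a, taxon_b, env)    # environment included
print(f"marginal: {marginal:.2f}, conditional on env: {conditional:.2f}")
```

With the environment omitted, the two taxa look strongly associated; once it is conditioned on, the association largely disappears. A realistic benchmark would additionally need count data, compositional sampling, and a sparse estimator.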

The latent variable approach is very nice - but in the case where we have an extensive set of environmental metadata (this is actually the norm for many of us) it seems like it would be better to directly incorporate these variables into the analysis than to try to infer them indirectly?

zdk123 commented 3 years ago

@jlw-ecoevo Empirically, we found that many latent factors were not correlated with any metadata and, conversely, there were plenty of redundant covariates. Not sure which approach is better, but we'd like to follow up on that.

lkoest12 commented 2 years ago

Hello everyone,

I was wondering if there has been any headway made here. I am deciding between SpiecEasi and CoNet and feel the inverse covariance approach seems a bit more strict towards direct interactions.

I have a dataset that consists of 16S rRNA gene amplicons from fecal and vaginal microbial communities that I would like to connect. Each animal has both fecal and vaginal samples that were taken at the same time, so hopefully this will reduce other confounding variables.

We could potentially use this dataset as a way to experimentally validate a new approach.

zdk123 commented 2 years ago

Yes I have some code for this now but it's not public yet. Please send me an email ;-)

olar785 commented 1 year ago

Hi @zdk123, Any update on this?

yang-nina commented 7 months ago

Hello, I wanted to also follow up on this. Would love to make use of this feature if available!

LuziaThea commented 5 months ago

I also want to follow up on this, I would be extremely interested in this feature :) Could you let us know what the current status is, @zdk123? Thank you so much!