igraph backbone wrapper

zpneal / backbone

R backbone package - Extract the backbone from weighted and unweighted networks

https://www.rbackbone.net

40 stars 8 forks source link

igraph backbone wrapper #25

Closed Gian77 closed 3 years ago

Gian77 commented 3 years ago

Hello Rachel,

do you have wrapper function I can use to extract the backbone form an igraph object?

For example form this network:

> net_pop
IGRAPH 2942f33 UNW- 220 664 -- 
+ attr: name (v/c), weight (e/n)
+ edges from 2942f33 (vertex names):
 [1] FOTU_6   --FOTU_32   FOTU_6   --FOTU_71   FOTU_6   --FOTU_511  FOTU_529 --FOTU_261  FOTU_54  --FOTU_40  
 [6] FOTU_54  --FOTU_8    FOTU_54  --FOTU_889  FOTU_54  --FOTU_138  FOTU_54  --FOTU_191  FOTU_54  --FOTU_170 
[11] FOTU_54  --FOTU_337  FOTU_54  --FOTU_358  FOTU_54  --POTU_9    FOTU_54  --POTU_93   FOTU_54  --POTU_128 
[16] FOTU_2542--FOTU_1646 FOTU_2542--FOTU_121  FOTU_2542--POTU_9    FOTU_2542--POTU_128  FOTU_1   --FOTU_61  
[21] FOTU_1   --POTU_40   FOTU_1   --POTU_5    FOTU_1   --POTU_16   FOTU_1   --POTU_38   FOTU_14  --FOTU_247 
[26] FOTU_14  --FOTU_102  FOTU_14  --POTU_39   FOTU_14  --POTU_97   FOTU_32  --FOTU_1646 FOTU_32  --FOTU_56  
[31] FOTU_32  --POTU_35   FOTU_32  --POTU_24   FOTU_32  --POTU_12   FOTU_46  --FOTU_125  FOTU_46  --FOTU_263 
[36] FOTU_46  --FOTU_249  FOTU_46  --POTU_16   FOTU_46  --POTU_35   FOTU_439 --FOTU_84   FOTU_17  --FOTU_24  
+ ... omitted several edges

Thanks a lot in advance,

Go green!

Gian

zpneal commented 3 years ago

Hi Gian,

From your output, it looks like your igraph object is a weighted graph (since the first line shows "UNW"). If it is the product of a bipartite projection, then you can use the current version of backbone, but would need to use the original bipartite network (in igraph the code would be "UN-B"). If it is not the product of a bipartite projection (i.e. the edge weights were directly measured somehow), then the current version of backbone won't work...but we're working on this for v2 later this year.

Go white!

Best, Zak

––– Zachary Neal, PhD Associate Professor, Michigan State University Web: http://www.zacharyneal.com Twitter: @zpnealhttps://twitter.com/zpneal Zoom: Click herehttps://msu.zoom.us/j/2913501222

On Feb 17, 2021, at 5:06 PM, GMNB notifications@github.com<mailto:notifications@github.com> wrote:

Hello Rachel,

do you have wrapper function I can use to extract the backbone form an igraph object?

For example form this network:

net_pop IGRAPH 2942f33 UNW- 220 664 --

attr: name (v/c), weight (e/n)

edges from 2942f33 (vertex names): [1] FOTU_6 --FOTU_32 FOTU_6 --FOTU_71 FOTU_6 --FOTU_511 FOTU_529 --FOTU_261 FOTU_54 --FOTU_40 [6] FOTU_54 --FOTU_8 FOTU_54 --FOTU_889 FOTU_54 --FOTU_138 FOTU_54 --FOTU_191 FOTU_54 --FOTU_170 [11] FOTU_54 --FOTU_337 FOTU_54 --FOTU_358 FOTU_54 --POTU_9 FOTU_54 --POTU_93 FOTU_54 --POTU_128 [16] FOTU_2542--FOTU_1646 FOTU_2542--FOTU_121 FOTU_2542--POTU_9 FOTU_2542--POTU_128 FOTU_1 --FOTU_61 [21] FOTU_1 --POTU_40 FOTU_1 --POTU_5 FOTU_1 --POTU_16 FOTU_1 --POTU_38 FOTU_14 --FOTU_247 [26] FOTU_14 --FOTU_102 FOTU_14 --POTU_39 FOTU_14 --POTU_97 FOTU_32 --FOTU_1646 FOTU_32 --FOTU_56 [31] FOTU_32 --POTU_35 FOTU_32 --POTU_24 FOTU_32 --POTU_12 FOTU_46 --FOTU_125 FOTU_46 --FOTU_263 [36] FOTU_46 --FOTU_249 FOTU_46 --POTU_16 FOTU_46 --POTU_35 FOTU_439 --FOTU_84 FOTU_17 --FOTU_24

... omitted several edges

Thanks a lot in advance,

Go green!

Gian

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://urldefense.com/v3/__https://github.com/domagal9/backbone/issues/25__;!!HXCxUKc!lurz4XIyla7f8prpMz3dApS2t6FiAkPj5fA_ykCofJWNq9mxBWJsWtnomiN33Q$, or unsubscribehttps://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AQNXJBTO52KPSUETMJCE7HTS7Q4UTANCNFSM4XZE3RIQ__;!!HXCxUKc!lurz4XIyla7f8prpMz3dApS2t6FiAkPj5fA_ykCofJWNq9mxBWJsWtnb5wCs5A$.

Gian77 commented 3 years ago

Hello @zpneal,

Sorry for this late reply. Yes, the weights were added after the network was calculated. It is a biological networks generated with the R package SpiecEasi. For dense networks is hard to find a good strategy to how filter edges becasue how these edges are calculated - it is not as straightforward as a correlation value. I am happy to provide data if you need/are interested to test the next version of your package. Having the possibility to get the backbone out of these networks would be super useful.

Thanks a lot,

zpneal commented 3 years ago

Hi Gian,

Thanks for the additional detail. Knowing a bit more about your data, I think you have a couple different options for extracting a backbone:

(1) I worked with David Dillon to apply the SDSM backbone to data on operational taxonomic units in zebrafish that, based on what SpiecEasi is usually used for, sounds like it might be similar to your data (https://www.sciencedirect.com/science/article/abs/pii/S0269749119345737). If you're able to organize your raw data (i.e. the data from which the biological networks are generated) into a presence/absence matrix, then you'd be able to extract the backbone using the existing backbone package. This data might take the form, for example, of a binary matrix with OTMs as rows, and species or organisms as columns.

(2) I think SpiecEasi estimates biological networks using a variety of Gaussian Graphical Model. Although I'm not sure about this package in particular, usually GGM estimation allows the specification of a significance threshold for the edges. You could set a more stringent threshold when estimating the network using SpiecEasi, which should also reduce the density of your resulting network.

(3) Finally, if neither of those work, you might consider a backbone method that is designed for weighted networks, such as the Disparity Filter: https://www.pnas.org/content/106/16/6483.short

Hopefully that helps, Zak

Gian77 commented 3 years ago

Thanks a lot @zpneal,

My input data to spiec.easi() consists of a list of two data frames (one is the fungal communities = fungi_core, and one is the bacterial communities = prok_core). This because from understanding the algorithm in the function does its own normalization between the two datasets that comes form different sequencing runs. So, I think it may better than merging the data outside the function. The two data frames have rows as species (OTUs) and columns as samples. I can generate binary data frames for the original easily - I will look into backbone package more deeply and see if I find a way to do what you suggested. I will let you know If I have trouble understanding.

Regarding a significance threshold for the edges I am not sure. The edges are calculated by the symBeta function (please see below) and I am not sure what it does in details. I went over the spiec.easi paper million times and it is not well explained, or, it is probably me - I would have benefited some more pure math during my PhD.

Just if you are curious. This is how I run it

param <- list(rep.num=100, seed=10010, ncores=12, thresh=0.05)
SpiecEasi_object <- SpiecEasi::spiec.easi(list(fungi_core, prok_core),
                                     method="mb",
                                     lambda.min.ratio=1e-1, 
                                     nlambda=100, 
                                     sel.criterion ="stars",
                                     pulsar.select = TRUE,
                                     pulsar.params=param)

The result is this is this object below

> SpiecEasi_object
Pulsar-selected refit of sparseiCov 
Path length: 100 
Graph dim:   220 
Criterion:
  stars... sparsity 0.0274

To which I attach edges weight and I generate an igraph network this way

GetNetwork <- function(physeq_f, physeq_b, SpiecEasi_object){
  names_taxa <- c(taxa_names(physeq_f), taxa_names(physeq_b))
  network <- adj2igraph(Matrix::drop0(getRefit(SpiecEasi_object)),
                        vertex.attr = list(name=names_taxa),
                        rmEmptyNodes = FALSE)
  symbeta_net = symBeta(getOptBeta(SpiecEasi_object), mode='maxabs')
  weight <- Matrix::summary(t(symbeta_net))[,3]
  E(network)$weight <- weight
  return(network)
}

Thanks much!

Gian

zpneal commented 3 years ago

This is just a guess, since I'm not familiar with library(spiec.easi) or the models it uses, but I suspect the thresh=0.05 argument is setting a significance threshold. You could try repeating your code with something like thresh=0.01 to see if this yields a sparser network. But, again, that's just a guess.

If you're able to create a binary OTU-by-Sample matrix, then library(backbone) would be able to handle this with no problem. If the OTUs are rows, then the backbone will be an OTU network in which two OTUs are connected if they are both expressed in a significant number of samples, where "significant" depends on the type of backbone model. For example, if your binary matrix is called B, then could run something like: library(backbone) sdsm <- sdsm(B) backbone <- backbone.extract(sdsm)

Gian77 commented 3 years ago

Wow, that was fast. Thanks. I tried your suggestion and I got indeed a sparser network. This is the original

> spiec_net
Pulsar-selected refit of sparseiCov 
Path length: 100 
Graph dim:   220 
Criterion:
  stars... sparsity 0.0274

This is with thresh=0.01 indeed I get a sparser network.

> spiec_net_new
Pulsar-selected refit of sparseiCov 
Path length: 100 
Graph dim:   220 
Criterion:
  stars... sparsity 0.00554

spiec.easi network have a stability parameter that, according the Authors, has to be close to 0.05 - potentially related to that thresh parameter? If I look at the stability of the original network I get this value for the original network

> getStability(spiec_net) [1] 0.04813465

And this value when using thresh = 0.01

> getStability(spiec_net_new) [1] 0.009547763

which indeed is pretty close to 0.01. The original network has 356 edges

IGRAPH 4d5954b UNW- 220 356 --

the new network has 134

IGRAPH ddb5b6a UNW- 220 134 --

I generated the binary otu_table

> dim(binary_core) [1] 220 119

> head(binary_core)
          SAM147 SAM148 SAM149 SAM150 SAM151 SAM152 SAM153 SAM154 SAM155 SAM156 SAM157 SAM158 SAM159 SAM160 SAM161 SAM162 SAM163 SAM164 SAM165 SAM166 SAM167 SAM168
FOTU_6         0      1      1      1      1      1      1      0      0      1      1      1      1      1      1      1      0      1      0      1      1      1
FOTU_529       0      0      0      0      1      1      0      0      0      0      0      0      1      1      0      0      0      0      0      0      1      0
FOTU_54        0      0      0      0      1      1      0      0      0      0      0      1      1      1      0      0      0      0      0      0      1      1
FOTU_2542      0      1      1      1      1      1      1      1      0      1      1      1      1      1      1      1      0      1      1      1      0      1
FOTU_1         0      1      1      0      1      1      1      1      0      1      1      1      1      1      1      1      0      1      1      1      1      1
FOTU_14        0      0      0      0      0      0      0      0      0      1      0      0      1      0      0      0      0      0      0      0      0      0

Then I run the extract.backbone function

> backbone_core_Pop_10 <-
+   backbone.extract(sdsm(as.matrix(binary_core_Pop_10)), 
+                    alpha = 0.05)
Finding the distribution using SDSM with BiCM probabilities.
> head(backbone_core_Pop_10)
          FOTU_6 FOTU_529 FOTU_54 FOTU_2542 FOTU_1 FOTU_14 FOTU_32 FOTU_46 FOTU_439 FOTU_17 FOTU_40 FOTU_7 FOTU_24 FOTU_5 FOTU_8 FOTU_87 FOTU_27 FOTU_20 FOTU_164 FOTU_9
FOTU_6         0        0       0         0      0       0       0       0        0       0       0      0       0      0      0       0       0       0        0      0
FOTU_529       0        0       0         0      0       0       0       0        0       0       0      0       0      0      0      -1       0       0        0      0
FOTU_54        0        0       0         0      0       0       0       0        0       0       1      0       0      0      0      -1       0       0        0      0
FOTU_2542      0        0       0         0      0       0       0       0        0       0       0      0       0      0      0       0       0       0        0      0
FOTU_1         0        0       0         0      0       0       0       0        0       0       0      0       0      0      0       0       0       0        0      0
FOTU_14        0        0       0         0      0       0       0       0       -1       0       0      0       0     -1      0       0       0       0        0      0
> dim(backbone_core)
[1] 220 220

Now what do I do with this matrix? Are those with -1 the edges I will have to remove?

Thanks a lot,

Gian

zpneal commented 3 years ago

I did see the spiec.easi authors' suggestions for specific thresh and stability values, but I didn't understand where their recommendations were coming from. It does look like decreasing thresh does indeed make the model more stringent, keeping fewer edges, so that approach might work.

The library(backbone) models by default extracts a signed backbone where the 1s indicate positive edges (in your case, OTUs co-expressed in significantly MORE samples than expected at random) and the -1s indicate negative edges (co-expressed in significantly FEWER samples). You can ignore the negative edges, or use backbone.extract(signed = FALSE) to simply not detect negative edges in the first place. Otherwise, the output matrix is a binary adjacency matrix that represents the backbone network, so it's ready to visualize/analyze as any binary network.

Gian77 commented 3 years ago

Got it, thanks a lot. I will play with it a little more and see how different the two network approaches are.

Gian