How to calculate the effective distance between nodes in a graph?

joheli commented 4 years ago

Folks I would like to understand how to correctly apply the function eff_dist of package NetOrigin and there is one step I am not sure of (see below the paragraph in bold).

First, let's create a few nodes with random edges and weights:

# set seed using R version 4.0.2 (2020-06-22)
set.seed(321)

# helper function to exclude non-unique links, see below
sl <- function(a, b) {
  v <- c(a, b)
  s <- sort(v)
  paste(s, collapse = "")
}

# create an undirected graph with random edges and weights
nodes <- letters[1:6]
edges0 <- data.frame(from = sample(nodes, 30, TRUE), to = sample(nodes, 30, TRUE)) %>%
  rowwise() %>%
  mutate(link = sl(from, to)) %>% # exclude non-unique links using function 'sl' (see above)
  filter(to != from) %>% # exclude links to self
  mutate(weight = runif(n = 1))

edges1 <- edges0
edges1$dupllink <- duplicated(edges1$link)

edges <- filter(edges1, !dupllink) %>%
  select(from, to, weight) %>%
  arrange(from, to, weight)

edges now looks like this:

> edges
# A tibble: 14 x 3
# Rowwise: 
   from  to    weight
   <chr> <chr>  <dbl>
 1 a     e     0.596 
 2 b     a     0.979 
 3 b     c     0.0852
 4 b     e     0.162 
 5 b     f     0.792 
 6 c     a     0.914 
 7 c     f     0.165 
 8 d     a     0.634 
 9 d     b     0.187 
10 d     c     0.957 
11 d     f     0.997 
12 e     c     0.599 
13 e     f     0.311 
14 f     a     0.531

Now, let's proceed to create a beautiful graph:

# gr is an undirected graph
gr <- igraph::graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

# for plotting use tidygraph
gr.tidy <- tidygraph::as_tbl_graph(gr)

# create a ggraph object
p.gr <- ggraph(gr.tidy, layout = "graphopt") +
  # geom_edge_link(aes(width = weight), alpha = .2, lineend = "round") + 
  geom_edge_link(aes(width = weight, alpha = weight), lineend = "round", color = "red") + 
  scale_edge_width(range = c(0.1, 12)) +
  scale_edge_alpha(range = c(.05, .4)) +
  geom_node_point(size = 5) + 
  geom_node_text(aes(label = name), repel = T, size = 10, color = "darkgreen") +
  theme_graph()

# print on screen
print(p.gr)

The result (p.gr) looks like this:

Rplot

The final step(s) are where I am unsure:

# Prepare calculation of effective distance
amx <- igraph::as_adjacency_matrix(gr, attr = "weight", sparse = T) %>% as.matrix
# Calculate transition probability matrix - this is where I am lost, because a direction is introduced:
p <- amx/rowSums(amx)
# Calculate effective distances
ed <- NetOrigin::eff_dist(p)

ed looks like this:

> ed
         a        b        c        d        e        f
a 0.000000 2.317359 2.385861 2.752019 2.812498 2.928399
b 1.812437 0.000000 4.198298 3.467534 3.607921 2.024182
c 2.090858 4.408217 0.000000 2.044702 2.513411 3.800440
d 2.476735 3.697172 2.064421 0.000000 4.577832 2.023613
e 2.028844 3.329189 2.024760 4.069462 0.000000 2.680275
f 2.660908 2.261614 3.827953 2.031407 3.196439 0.000000

My question revolves around the creation of p: up to this point the matrices have been 'symmetrical' (i.e. the graph has been 'undirected') but now e.g. 'a to b' is not equivalent to 'b to a'. E.g. if you look at the first column (heading 'a') the ranking of the distances (b < e < c) do not correspond to image p.gr or table edges (where b < c < e).

Is this a feature of the function or am I interpreting the output in wrong way?

Thank you for any hints.

jmanitz commented 4 years ago

@jmanitz https://github.com/jmanitz/NetOrigin

jmanitz commented 4 years ago

Dear Johannes,

Thanks a lot for your comment and careful considerations. You are correct, the effective distance is not a metric in the mathematical sense as it does not fullfill symmetry, other conditions are fullfilled:

[x] non-negative,
[x] identity of indiscernibles
[ ] symmetry
[x] triangle in equality

Some more theory on the derivation can be found here:

Manitz, J., Kneib, T., Schlather, M., Helbing, D. and Brockmann, D. (2014) Origin detection during food-borne disease outbreaks - a case study of the 2011 EHEC/HUS outbreak in Germany. PLoS Currents Outbreaks, 1. <DOI: 10.1371/currents.outbreaks.f3fdeb08c5b9de7c09ed9cbcef5f01f2> url: http://currents.plos.org/outbreaks/index.html%3Fp=36515.html
Manitz, J., Harbering, J., Schmidt, M., Kneib, T. and Schöbel, A., 2017. Source estimation for propagation processes on complex networks with an application to delays in public transportation systems. Royal Statistical Society. Journal. Series C: Applied Statistics, 66(3), pp.521-536. url: https://core.ac.uk/download/pdf/154416059.pdf

Does that clarify your question?

Best regards from Boston Juliane

joheli commented 4 years ago

Dear Juliane,

thank you so much for your comment and the references - I obviously have to delve deeper into the matter.

Could you maybe comment on the following to further send me on the right track?

If effective distances are non-symmetric by nature, does that mean that I can only estimate them on a directed graph?
Could you also give an example (or use the one above) - how do I estimate effective distances with NetOrigin::eff_dist given a directed, weighted graph?

Also, how do I interpret the output, i.e. in a matrix of effective distances generated by NetOrigin::eff_dist

	A	B
A	X₁	X₃
B	X₂	X₄

Is X₂ B to A or A to B?

Kind regards,

Johannes

jmanitz commented 4 years ago

Hi Johannes,

no worries at all, I am happy to help!

The effective distance works for directed and undirected graphs. The implementation is able to accommodate that too. Here a little toy example:

# adjacency matrix with random weights
net <- matrix(nrow=6, ncol=6, data=runif(36))
isSymmetric(net)

# transition probabilties
p <- net/rowSums(net)
image(p)

# effective distance
eff <- eff_dist(p)
image(eff)

joheli / yea13

How to calculate the effective distance between nodes in a graph? #3