joheli / yea13

Applies algorithm by Ypma et al. (2013) to detect clusters in surveillance data
GNU Affero General Public License v3.0
2 stars 2 forks source link

How to calculate the effective distance between nodes in a graph? #3

Open joheli opened 4 years ago

joheli commented 4 years ago

Folks I would like to understand how to correctly apply the function eff_dist of package NetOrigin and there is one step I am not sure of (see below the paragraph in bold).

First, let's create a few nodes with random edges and weights:

# set seed using R version 4.0.2 (2020-06-22)
set.seed(321)

# helper function to exclude non-unique links, see below
sl <- function(a, b) {
  v <- c(a, b)
  s <- sort(v)
  paste(s, collapse = "")
}

# create an undirected graph with random edges and weights
nodes <- letters[1:6]
edges0 <- data.frame(from = sample(nodes, 30, TRUE), to = sample(nodes, 30, TRUE)) %>%
  rowwise() %>%
  mutate(link = sl(from, to)) %>% # exclude non-unique links using function 'sl' (see above)
  filter(to != from) %>% # exclude links to self
  mutate(weight = runif(n = 1))

edges1 <- edges0
edges1$dupllink <- duplicated(edges1$link)

edges <- filter(edges1, !dupllink) %>%
  select(from, to, weight) %>%
  arrange(from, to, weight)

edges now looks like this:

> edges
# A tibble: 14 x 3
# Rowwise: 
   from  to    weight
   <chr> <chr>  <dbl>
 1 a     e     0.596 
 2 b     a     0.979 
 3 b     c     0.0852
 4 b     e     0.162 
 5 b     f     0.792 
 6 c     a     0.914 
 7 c     f     0.165 
 8 d     a     0.634 
 9 d     b     0.187 
10 d     c     0.957 
11 d     f     0.997 
12 e     c     0.599 
13 e     f     0.311 
14 f     a     0.531 

Now, let's proceed to create a beautiful graph:

# gr is an undirected graph
gr <- igraph::graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

# for plotting use tidygraph
gr.tidy <- tidygraph::as_tbl_graph(gr)

# create a ggraph object
p.gr <- ggraph(gr.tidy, layout = "graphopt") +
  # geom_edge_link(aes(width = weight), alpha = .2, lineend = "round") + 
  geom_edge_link(aes(width = weight, alpha = weight), lineend = "round", color = "red") + 
  scale_edge_width(range = c(0.1, 12)) +
  scale_edge_alpha(range = c(.05, .4)) +
  geom_node_point(size = 5) + 
  geom_node_text(aes(label = name), repel = T, size = 10, color = "darkgreen") +
  theme_graph()

# print on screen
print(p.gr)

The result (p.gr) looks like this:

Rplot

The final step(s) are where I am unsure:

# Prepare calculation of effective distance
amx <- igraph::as_adjacency_matrix(gr, attr = "weight", sparse = T) %>% as.matrix
# Calculate transition probability matrix - this is where I am lost, because a direction is introduced:
p <- amx/rowSums(amx)
# Calculate effective distances
ed <- NetOrigin::eff_dist(p)

ed looks like this:

> ed
         a        b        c        d        e        f
a 0.000000 2.317359 2.385861 2.752019 2.812498 2.928399
b 1.812437 0.000000 4.198298 3.467534 3.607921 2.024182
c 2.090858 4.408217 0.000000 2.044702 2.513411 3.800440
d 2.476735 3.697172 2.064421 0.000000 4.577832 2.023613
e 2.028844 3.329189 2.024760 4.069462 0.000000 2.680275
f 2.660908 2.261614 3.827953 2.031407 3.196439 0.000000

My question revolves around the creation of p: up to this point the matrices have been 'symmetrical' (i.e. the graph has been 'undirected') but now e.g. 'a to b' is not equivalent to 'b to a'. E.g. if you look at the first column (heading 'a') the ranking of the distances (b < e < c) do not correspond to image p.gr or table edges (where b < c < e).

Is this a feature of the function or am I interpreting the output in wrong way?

Thank you for any hints.

jmanitz commented 4 years ago

@jmanitz https://github.com/jmanitz/NetOrigin

jmanitz commented 4 years ago

Dear Johannes,

Thanks a lot for your comment and careful considerations. You are correct, the effective distance is not a metric in the mathematical sense as it does not fullfill symmetry, other conditions are fullfilled:

Some more theory on the derivation can be found here:

Does that clarify your question?

Best regards from Boston Juliane

joheli commented 4 years ago

Dear Juliane,

thank you so much for your comment and the references - I obviously have to delve deeper into the matter.

Could you maybe comment on the following to further send me on the right track?

Also, how do I interpret the output, i.e. in a matrix of effective distances generated by NetOrigin::eff_dist

A B
A X1 X3
B X2 X4

Is X2 B to A or A to B?

Kind regards,

Johannes

jmanitz commented 4 years ago

Hi Johannes,

no worries at all, I am happy to help!

The effective distance works for directed and undirected graphs. The implementation is able to accommodate that too. Here a little toy example:

# adjacency matrix with random weights
net <- matrix(nrow=6, ncol=6, data=runif(36))
isSymmetric(net)

# transition probabilties
p <- net/rowSums(net)
image(p)

# effective distance
eff <- eff_dist(p)
image(eff)

image