Generating an incidence matrix from dataframe

Jigyasa3 commented 1 year ago

Dear @mschubert , @saezrodriguez , @deeenes , @luzgaral

Thank you for a great package! How do I convert a data frame to the incidence matrix required as input for the package? I tried multiple versions of generating a bipartite graph from the data frame (using igraph) but none works for a bipartite matrix.

Could you please share a version of how to generate an incidence matrix from a data frame? My data frame looks like this:

v1<-c( 1 , 1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  3,  4, 5,  6,  7,  8,  9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13, 14, 15, 16, 17, 17, 17, 17, 17, 18, 19, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 33, 34, 35, 36, 37, 37, 37, 37, 37, 37, 37, 37, 38, 39, 40, 40, 41, 41, 42, 43, 44, 45, 45, 46, 46, 47, 48, 48, 49, 50, 51, 51, 51, 51, 51, 51, 52, 53, 54, 54, 54, 54, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 57, 57)
v2<-c( 1 ,  2,   3,  4,   5,   6,   7,   8,   9,  10,  11,  12,  13,  14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,  24,  19,  28,  29,  30,  31, 32,  33,  34,  35,  36,  37,  38,  39,  31,  40,  41,  42,  42,  42,  43,  44,  45,  46,  47,  48,  49,  50,  51,  51,  52,  53,  54,  55,  56,  56,  57,  58,  59, 60,  61,  62, 63,   8,  10,  64,  63,  65,  66,  67,  65,  68,  69,  70,  71,  72,  73,  22,  74,  75,  76,  75,  30,  77,  78,  79,  80,  81,  82,  83,  84,  24, 85,  86,  87,  88,  89,  90,  91,  92,  93,  94,  95,  96,  97,  94,  79,  98,  99, 100, 101, 102, 103,  34, 104, 105, 106, 107, 108, 109, 110,  50, 104)

df<-data.frame(v1,v2)

g <- make_bipartite_graph(df$v1,df$v2)
as_incidence_matrix(g)

deeenes commented 1 year ago

Hi,

I assume the two vectors represent the source and target nodes of directed edges. If you have +/- signs, you can provide them as a third vector (or column); without them, we consider all edges positive and leave the negative incidence matrix empty. Unfortunately, BiRewire has a small bug in handling this empty slot, which is why we replace it with NULL below:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("BiRewire")

library(BiRewire)

v1 <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 10, 10, 10, 10, 11, 12, 12, 12, 13,
14, 15, 16, 17, 17, 17, 17, 17, 18, 19, 20, 21, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 33, 34, 35, 36, 37, 37, 37, 37, 37, 37, 37, 37, 38, 39, 40,
40, 41, 41, 42, 43, 44, 45, 45, 46, 46, 47, 48, 48, 49, 50, 51, 51, 51, 51, 51,
51, 52, 53, 54, 54, 54, 54, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56,
56, 56, 57, 57)

v2 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 24, 19, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 31, 40, 41, 42, 42, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 51, 52, 53, 54,
55, 56, 56, 57, 58, 59, 60, 61, 62, 63,  8, 10, 64, 63, 65, 66, 67, 65, 68, 69,
70, 71, 72, 73, 22, 74, 75, 76, 75, 30, 77, 78, 79, 80, 81, 82, 83, 84, 24, 85,
86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 94, 79, 98, 99, 100, 101, 102,
103, 34, 104, 105, 106, 107, 108, 109, 110, 50, 104)

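# Edge list with columns source, sign, target; all edges are positive here
edges <- data.frame(
    source = as.character(v1),
    sign = rep('+', length.out = length(v1)),
    target = as.character(v2)
)
# Convert the edge list to positive and negative incidence matrices
dsg <- birewire.induced.bipartite(edges)
# Workaround for the empty negative matrix mentioned above
dsg$negative <- NULL
# Randomize the network by rewiring
random_dsg <- birewire.rewire.dsg(dsg)
# Convert the randomized positive incidence matrix to a directed igraph object
random_g <- graph_from_incidence_matrix(
    random_dsg$positive,
    directed = TRUE,
    mode = 'OUT'
)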

Above, we finish by converting the randomized incidence matrix to an igraph object. Let's also convert the original graph to igraph and visualize both:

g <- graph(c(t(matrix(c(v1, v2), ncol = 2L))))
a <- as_adjacency_matrix(g)

plot(g, vertex.size = 5L)
plot(random_g, vertex.size = 5L)

Original: issue1-original

Randomized: issue1-random
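
For the original question of getting an incidence matrix straight from a data frame, a minimal base R sketch is below. It assumes that the first column holds source nodes (rows) and the second holds target nodes (columns), and that sources and targets are treated as two separate node sets even when they share IDs. The `toy` data frame and the `s`/`t` name prefixes are illustrative only and do not come from BiRewire.

```r
# Toy edge list (hypothetical data, not the vectors from this thread).
# The edge 3 -> 1 appears twice on purpose.
toy <- data.frame(
    v1 = c(1, 1, 2, 3, 3),
    v2 = c(1, 2, 2, 1, 1)
)

# Cross-tabulate sources against targets; each cell counts the edges
# between one source (row) and one target (column).
inc <- unclass(table(source = toy$v1, target = toy$v2))

# Collapse repeated edges so the matrix is binary (0 = no edge, 1 = edge).
inc[inc > 0] <- 1L

# Prefix the names so that source "1" and target "1" stay distinguishable.
rownames(inc) <- paste0("s", rownames(inc))
colnames(inc) <- paste0("t", colnames(inc))

print(inc)
```

Whether this matrix can be passed on directly depends on the function you call next; the `birewire.induced.bipartite` route shown above starts from the edge list instead.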

Jigyasa3 commented 1 year ago

Hi @deeenes,

Thank you for replying! I wanted to also ask if there is a way to extract the interactions that are significantly different from random ones?

deeenes commented 1 year ago

Network topologies are different from random only in the context of some specific problem. For this reason, BiRewire doesn't provide statistical tests or frameworks to compare observed networks with null models. You should generate a large number of randomized networks, not only one as in the example above; the set of these randomized networks approximates a null model for your specific problem. Then you can calculate your metrics on the observed network and on the randomized networks, and perform an appropriate statistical test to address your hypothesis. The exact steps really depend on your actual problem.
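
The procedure described above could be sketched roughly as follows. Everything specific here is an assumption for illustration: the toy edge list, the metric `shared_target_pairs` (number of source pairs sharing at least one target), the number of randomizations `n_random`, and the one-sided empirical p-value. Only `birewire.induced.bipartite` and `birewire.rewire.dsg` are taken from the example earlier in this thread; replace the metric and the test with whatever fits your hypothesis.

```r
# Hypothetical metric: number of source pairs that share at least one target.
# `inc` is an incidence matrix with sources in rows and targets in columns.
shared_target_pairs <- function(inc) {
    m <- (as.matrix(inc) > 0) * 1
    co <- tcrossprod(m)  # co[i, j] = number of targets shared by sources i and j
    sum(co[upper.tri(co)] > 0)
}

# One-sided empirical p-value with the usual +1 correction:
# the share of null values at least as large as the observed one.
empirical_p <- function(observed, null_values) {
    (1 + sum(null_values >= observed)) / (1 + length(null_values))
}

# Toy edge list in the same source/sign/target layout as above (made-up data).
edges <- data.frame(
    source = c("s1", "s1", "s1", "s2", "s2", "s3", "s3",
               "s4", "s4", "s5", "s5", "s6", "s6", "s6"),
    sign = "+",
    target = c("t1", "t2", "t3", "t1", "t4", "t2", "t5",
               "t3", "t6", "t4", "t7", "t5", "t8", "t1"),
    stringsAsFactors = FALSE
)

n_random <- 100  # assumed value; use more randomizations for real analyses

# The randomization step needs BiRewire, so it is skipped if not installed.
if (requireNamespace("BiRewire", quietly = TRUE)) {
    suppressPackageStartupMessages(library(BiRewire))

    dsg <- birewire.induced.bipartite(edges)
    dsg$negative <- NULL  # same workaround as in the example above

    observed <- shared_target_pairs(dsg$positive)

    # Metric on each randomized network; together these approximate the null.
    null_values <- vapply(
        seq_len(n_random),
        function(i) shared_target_pairs(birewire.rewire.dsg(dsg)$positive),
        numeric(1)
    )

    p_value <- empirical_p(observed, null_values)
    print(c(observed = observed, p_value = p_value))
}
```

This tests a single network-level metric. Asking which individual interactions differ from random would need a per-edge statistic and multiple-testing correction, which is beyond this sketch.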

saezlab / BiRewire, issue #1