igraph / rigraph

igraph R package
https://r.igraph.org
543 stars 201 forks source link

Identical graphs not considered identical depending on argument `vertices` in graph_from_dataframe #349

Open mkoohafkan opened 4 years ago

mkoohafkan commented 4 years ago

Graphs with the same vertices and edgelist are not considered identical when the vertex order of the argument vertices does not match the order that they occur in the edgelist:

# sample data
set.seed(50)
library(igraph)

edges = unique(cbind(
  sample(100:200, 10),
  sample(100:200, 10)
))

vertices = unique(as.vector(edges))

# identical
g1 = graph_from_data_frame(edges, TRUE)
g2 = graph_from_data_frame(edges, TRUE, vertices)
identical_graphs(g1, g2)
## TRUE

# not identical
g3 = graph_from_data_frame(edges, TRUE, sort(vertices))
identical_graphs(g1, g3)
## FALSE

This also occurs if the vertices are named:

char.edges = cbind(
  paste0("OID", edges[,1]),
  paste0("OID", edges[,2])
)
char.vertices = paste0("OID", vertices)

cg1 = graph_from_data_frame(char.edges, TRUE)
cg2 = graph_from_data_frame(char.edges, TRUE, char.vertices)
identical_graphs(cg1, cg2)
## TRUE
cg3 = graph_from_data_frame(char.edges, TRUE, sort(char.vertices))
identical_graphs(cg1, cg3)
## FALSE
mbojan commented 4 years ago

I also found some problems as a by-product of #223 .

jmarshallnz commented 3 years ago

Just a note that this also happens when adding or subtracting edges via an adjacency matrix or via the built in add/delete edge functions:

g <- make_ring(10)
g_int = g + edge(7,9) + edge(8,10) - edge("7|8") - edge("9|10")

A = as_adj(g, sparse=FALSE, names=FALSE)
A[7,8] = A[8,7] = A[9,10] = A[10,9] = 0
A[7,9] = A[9,7] = A[8,10] = A[10,8] = 1
g_adj = graph_from_adjacency_matrix(A, mode="undirected")
identical_graphs(g_int, g_adj)

This returns false, even though they're exactly the same. By the looks the reason is that identical_graphs drops down to this code:

https://github.com/igraph/rigraph/blob/13e3853449090209305d6108b51e6933cd4d36f3/tools/stimulus/rinterface_extra.c#L256

which by the looks is assuming vertex and edge order are the same. To be honest it looks like the equivalent function within igraph itself:

https://github.com/igraph/igraph/blob/cf684b5ce44ac41732a7f61884b390e24172047a/src/graph/type_indexededgelist.c#L1826

may do the same thing, though it's less clear to me whether igraph itself maintains order when adding or removing vertices or edges.

vtraag commented 3 years ago

The function presumably checks the internal integer representation of the graph. Two graphs are considered identical if and only if the integer representations match exactly. Two graphs can also be the same, but have a different integer representation, in which case they are isomorphic, but not identical.

However, when vertices are named, it might make sense to account for the names. Hence, two named graphs would be considered identical if the named representation of the graph is identical, not the integer representation. At the moment this is not yet possible. @szhorvat, in the igraph core you worked on an is_same_graph function. Might it make sense to also have a similar function to test whether a particular permutation (i.e. bijection) constitutes an isomorphism? The identical names in two graphs are ordered differently, but their permutation would be straightforward to identify.

jmarshallnz commented 3 years ago

Yes - in my example above the vertices are in the same order in both graphs, but the edge list is not in the same order, so they're not identical in representation, but are isomorphic.

> V(g_int)
+ 10/10 vertices, from c793b85:
 [1]  1  2  3  4  5  6  7  8  9 10
> V(g_adj)
+ 10/10 vertices, from 9466d09:
 [1]  1  2  3  4  5  6  7  8  9 10
> E(g_int)
+ 10/10 edges from c793b85:
 [1] 1-- 2 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 8-- 9 1--10 7-- 9 8--10
> E(g_adj)
+ 10/10 edges from 9466d09:
 [1] 1-- 2 1--10 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 7-- 9 8-- 9 8--10

I wonder if all that's really needed is a bit of documentation added to identical_graphs() that makes it clear it's testing that the internal data representation is identical, rather than the structure of the graph, with a suggestion that isomorphic() explicitly tests structure instead?

ntamas commented 3 years ago

Makes sense to me. @vtraag ?

vtraag commented 3 years ago

Yes, the documentation should be clarified regardless.

There are several distinct notions of "identical" or "same" graph. The identical_graphs is very strict, in the sense that the edge ordering also needs to be identical. The igraph_is_same_graph that we recently implemented in the C core abstracts away from that and simply checks whether the edge set is identical (as labelled graphs). Since the nodes in R can have names, it might make sense to have a similar is_same_graph that does take into account this labeling (but which might correspond to a different integer ordering). Finally, there is the isormorphic function which checks whether two graphs are the same up to a permutation of the node labels. This is of course much more computationally intensive.

My suggestion was to create something similar to igraph_is_same_graph in R, but which takes into account the labels of the nodes. This requires a C core function like

igraph_is_isomorphic(igraph_t* g1, igraph_t* g2, igraph_vector_t* mapping, igraph_bool_t* res)

where mapping would be a permutation of nodes from g1 to g2, which in this case would be provided by the R interface.

On the R side, this could be implemented like

identical_graphs(g1, g2, ignore_edge_ordering=c(FALSE, TRUE), ignore_vertex_names=c(FALSE, TRUE))

allowing a user to check whether two graphs are identical in various ways.

szhorvat commented 3 years ago

My 2 cents: In mathematics, there are two useful notions that I've seen:

  1. Two graphs are the same is their vertex sets and edge sets are the same. Note sets, so ordering does not matter.
  2. Two graph are isomorphic if the vertices of one can be re-labelled so as to make their edge sets be the same.

Sometimes we say that they are identical as labelled graphs (point 1) or as unlabelled ones (point 2, i.e. isomorphism).

Requiring ordered edge lists to be the same is not particularly useful in math, but may occasionally be useful when programming ... maybe ... does anyone actually have a use case for this?

vtraag commented 3 years ago

Requiring ordered edge lists to be the same is not particularly useful in math, but may occasionally be useful when programming.

Typically this is not useful indeed. However, when you are passing around edge weights for example, it might be useful to know that the two graphs are really exactly identical, including all edge and vertex orderings.

szhorvat commented 3 years ago

If we go down that route, we need to consider all the possible "sameness" concepts brought about by having both vertex names and vertex indices. There will be many possibilities.

E.g., there may be two graphs where the edge lists, when written in terms of vertex names, look identical. However, the vertex lists are ordered differently. If I understand correctly what you meant by the ignore_edge_ordering and ignore_vertex_names arguments, this case wouldn't be covered by them. Yet it is relevant for transferring edge weights ...

(To be clear, I do not propose actually supporting all the cases. I think we shouldn't. But it's good to think them through, and think about which are/aren't useful.)

vtraag commented 3 years ago

E.g., there may be two graphs where the edge lists, when written in terms of vertex names, look identical. However, the vertex lists are ordered differently. If I understand correctly what you meant by the ignore_edge_ordering and ignore_vertex_names arguments, this case wouldn't be covered by them. Yet it is relevant for transferring edge weights ...

Using vertex names essentially comes down to ignoring vertex ordering in my proposal. If two graphs would have identical edge lists when written in terms of vertex names, they should be considered identical even if ignore_edge_ordering=FALSE.

ntamas commented 1 year ago

Requiring ordered edge lists to be the same is not particularly useful in math, but may occasionally be useful when programming ... maybe ... does anyone actually have a use case for this?

Many unit tests rely on storing an expected representation of the graph, and the test itself checks whether the actual representation obtained from some igraph function is identical to the expected representation, also w.r.t. vertex and edge ordering.

szhorvat commented 1 year ago

In my experience, a more common requirement in tests is to ignore the edge ordering when comparing graphs. But of course, both variants have their uses.

However, isn't checking that the edge lists are identical quite trivial, all(as_edgelist(g1, names=F) == as_edgelist(g2, names=F))? This can be a helper function for testing if necessary.

ntamas commented 1 year ago

It doesn't compare the internal indexes (which are represented on the R side, contrary to other higher-level interfaces like the Python interface, which only holds a reference to an igraph_t that lives entirely on the C side), or the attributes. identical_graphs compares almost all components of the R representation. This is really intended for unit tests and nothing else.

maelle commented 7 months ago

The documentation part of this issue (clarifying what identical_graphs() does in its manual page) seems tackled.

What about the idea of is_same_graph @ntamas @szhorvat @vtraag? is this still wanted? is it work to do here or in the C core?