Open mkoohafkan opened 4 years ago
I also found some problems as a by-product of #223 .
Just a note that this also happens when adding or subtracting edges via an adjacency matrix or via the built in add/delete edge functions:
g <- make_ring(10)
g_int = g + edge(7,9) + edge(8,10) - edge("7|8") - edge("9|10")
A = as_adj(g, sparse=FALSE, names=FALSE)
A[7,8] = A[8,7] = A[9,10] = A[10,9] = 0
A[7,9] = A[9,7] = A[8,10] = A[10,8] = 1
g_adj = graph_from_adjacency_matrix(A, mode="undirected")
identical_graphs(g_int, g_adj)
This returns false, even though they're exactly the same. By the looks the reason is that identical_graphs drops down to this code:
which by the looks is assuming vertex and edge order are the same. To be honest it looks like the equivalent function within igraph itself:
may do the same thing, though it's less clear to me whether igraph itself maintains order when adding or removing vertices or edges.
The function presumably checks the internal integer representation of the graph. Two graphs are considered identical if and only if the integer representations match exactly. Two graphs can also be the same, but have a different integer representation, in which case they are isomorphic, but not identical.
However, when vertices are named, it might make sense to account for the names. Hence, two named graphs would be considered identical if the named representation of the graph is identical, not the integer representation. At the moment this is not yet possible. @szhorvat, in the igraph
core you worked on an is_same_graph
function. Might it make sense to also have a similar function to test whether a particular permutation (i.e. bijection) constitutes an isomorphism? The identical names in two graphs are ordered differently, but their permutation would be straightforward to identify.
Yes - in my example above the vertices are in the same order in both graphs, but the edge list is not in the same order, so they're not identical in representation, but are isomorphic.
> V(g_int)
+ 10/10 vertices, from c793b85:
[1] 1 2 3 4 5 6 7 8 9 10
> V(g_adj)
+ 10/10 vertices, from 9466d09:
[1] 1 2 3 4 5 6 7 8 9 10
> E(g_int)
+ 10/10 edges from c793b85:
[1] 1-- 2 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 8-- 9 1--10 7-- 9 8--10
> E(g_adj)
+ 10/10 edges from 9466d09:
[1] 1-- 2 1--10 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 7-- 9 8-- 9 8--10
I wonder if all that's really needed is a bit of documentation added to identical_graphs() that makes it clear it's testing that the internal data representation is identical, rather than the structure of the graph, with a suggestion that isomorphic() explicitly tests structure instead?
Makes sense to me. @vtraag ?
Yes, the documentation should be clarified regardless.
There are several distinct notions of "identical" or "same" graph. The identical_graphs
is very strict, in the sense that the edge ordering also needs to be identical. The igraph_is_same_graph
that we recently implemented in the C core abstracts away from that and simply checks whether the edge set is identical (as labelled graphs). Since the nodes in R can have names, it might make sense to have a similar is_same_graph
that does take into account this labeling (but which might correspond to a different integer ordering). Finally, there is the isormorphic
function which checks whether two graphs are the same up to a permutation of the node labels. This is of course much more computationally intensive.
My suggestion was to create something similar to igraph_is_same_graph
in R, but which takes into account the labels of the nodes. This requires a C core function like
igraph_is_isomorphic(igraph_t* g1, igraph_t* g2, igraph_vector_t* mapping, igraph_bool_t* res)
where mapping
would be a permutation of nodes from g1
to g2
, which in this case would be provided by the R interface.
On the R side, this could be implemented like
identical_graphs(g1, g2, ignore_edge_ordering=c(FALSE, TRUE), ignore_vertex_names=c(FALSE, TRUE))
allowing a user to check whether two graphs are identical in various ways.
My 2 cents: In mathematics, there are two useful notions that I've seen:
Sometimes we say that they are identical as labelled graphs (point 1) or as unlabelled ones (point 2, i.e. isomorphism).
Requiring ordered edge lists to be the same is not particularly useful in math, but may occasionally be useful when programming ... maybe ... does anyone actually have a use case for this?
Requiring ordered edge lists to be the same is not particularly useful in math, but may occasionally be useful when programming.
Typically this is not useful indeed. However, when you are passing around edge weights for example, it might be useful to know that the two graphs are really exactly identical, including all edge and vertex orderings.
If we go down that route, we need to consider all the possible "sameness" concepts brought about by having both vertex names and vertex indices. There will be many possibilities.
E.g., there may be two graphs where the edge lists, when written in terms of vertex names, look identical. However, the vertex lists are ordered differently. If I understand correctly what you meant by the ignore_edge_ordering
and ignore_vertex_names
arguments, this case wouldn't be covered by them. Yet it is relevant for transferring edge weights ...
(To be clear, I do not propose actually supporting all the cases. I think we shouldn't. But it's good to think them through, and think about which are/aren't useful.)
E.g., there may be two graphs where the edge lists, when written in terms of vertex names, look identical. However, the vertex lists are ordered differently. If I understand correctly what you meant by the
ignore_edge_ordering
andignore_vertex_names
arguments, this case wouldn't be covered by them. Yet it is relevant for transferring edge weights ...
Using vertex names essentially comes down to ignoring vertex ordering in my proposal. If two graphs would have identical edge lists when written in terms of vertex names, they should be considered identical even if ignore_edge_ordering=FALSE
.
Requiring ordered edge lists to be the same is not particularly useful in math, but may occasionally be useful when programming ... maybe ... does anyone actually have a use case for this?
Many unit tests rely on storing an expected representation of the graph, and the test itself checks whether the actual representation obtained from some igraph function is identical to the expected representation, also w.r.t. vertex and edge ordering.
In my experience, a more common requirement in tests is to ignore the edge ordering when comparing graphs. But of course, both variants have their uses.
However, isn't checking that the edge lists are identical quite trivial, all(as_edgelist(g1, names=F) == as_edgelist(g2, names=F))
? This can be a helper function for testing if necessary.
It doesn't compare the internal indexes (which are represented on the R side, contrary to other higher-level interfaces like the Python interface, which only holds a reference to an igraph_t
that lives entirely on the C side), or the attributes. identical_graphs
compares almost all components of the R representation. This is really intended for unit tests and nothing else.
The documentation part of this issue (clarifying what identical_graphs()
does in its manual page) seems tackled.
What about the idea of is_same_graph
@ntamas @szhorvat @vtraag? is this still wanted? is it work to do here or in the C core?
Graphs with the same vertices and edgelist are not considered identical when the vertex order of the argument
vertices
does not match the order that they occur in the edgelist:This also occurs if the vertices are named: