Handling edgelist matrices for 'sparse6'

mbojan / rgraph6

Encoding graphs in graph6, sparse6, and digraph6 formats

https://mbojan.github.io/rgraph6/

GNU General Public License v3.0

12 stars 3 forks source link

Handling edgelist matrices for 'sparse6' #27

Closed mbojan closed 2 years ago

mbojan commented 2 years ago

This fails because the matrix method additionally expects n - the network size

l <- list(
  matrix(c(1,2, 2,3, 3,4), ncol=2, byrow=TRUE),
  matrix(c(1,2, 2,3), ncol=2, byrow=TRUE)
)

as_sparse6(l)

Possible solutions:

Assume that max(object) is the graph order? That was initial implementation which enforces using a sequence of integers as node IDs.
Remap the entries in the matrix to a sequence of integers and as item above.
Refuse to handle lists of edgelist matrices.
Set default n=max(object).

mbojan commented 2 years ago

Hmm... It turns out that the current version which has n does not work as I expected:

elm1 <- matrix(c(1,2, 2,3, 3,4), ncol=2, byrow=TRUE)
elm2 <- elm1 * 20

s1 <- as_sparse6(elm1, n = 4)
s2 <- as_sparse6(elm2, n = 4)
## Error in as_sparse6.matrix(elm2, n = 4) : n >= max(object) is not TRUE

Should we then follow (1) which was the initial implementation? I'm somewhat concerned how to catch unintentional situations when the user tries to process an edgelist matrix of some externally-determined node IDs such as:

m <- structure(c(1001, 2002, 3003, 2002, 3003, 4004), .Dim = 3:2)
m
##     [,1] [,2]
## [1,] 1001 2002
## [2,] 2002 3003
## [3,] 3003 4004

On the other hand e.g. igraph is not concerned at all:

z <- igraph::graph_from_edgelist(m)
z
## IGRAPH a7bdb08 D--- 4004 3 -- 
## + edges from a7bdb08:
## [1] 1001->2002 2002->3003 3003->4004
igraph::vcount(z)
## [1] 4004

which again suggests (1).

CC @schochastics .

schochastics commented 2 years ago

hmm there is also a problem with character inputs

elm3 <- cbind(LETTERS[elm1[,1]],LETTERS[elm1[,2]])
s3 <- rgraph6::as_sparse6(elm3, n = 4)
#> Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)]: non-numeric argument to binary operator

(2) might be an option too but I am afraid there might be too many special cases to handle. In the original documentation, they also assume: "The vertices of the graph are 0..n-1.". So I'd suggest going back to (1), but also add an error handler for cases where is.numeric(object) is FALSE and say in the description something along the line: "The function expects nodes to be a sequence of integers. Otherwise, unwanted side-effects may occur".

I can implement these changes later if you agree.

schochastics commented 2 years ago

Playing around with this, one sideffect we will have with (1) is that graphs with n isolates will always be mapped to the empty graph.

sapply(0:10,function(i) rgraph6::as_sparse6(igraph::graph.empty(n=i,directed = FALSE)))
#>  [1] ":?" ":?" ":?" ":?" ":?" ":?" ":?" ":?" ":?" ":?" ":?"

I personally think that this is an acceptable behaviour though.

mbojan commented 2 years ago

Perhaps it still makes sense for the matrix method to have the argument n=max(object) (solution (4))? It is then possible to generate proper symbols with something like:

sapply(0:10, function(vc) {
   elm <- matrix(0, ncol=2, nrow=0)
   as_sparse6(elm, n = vc)
})
## [1] ":?" ":@" ":A" ":B" ":C" ":D" ":E" ":F" ":G" ":H" ":I"

Which is consistent with current:

sapply(0:10,function(i) rgraph6::as_sparse6(igraph::graph.empty(n=i,directed = FALSE)))
##  [1] ":?" ":@" ":A" ":B" ":C" ":D" ":E" ":F" ":G" ":H" ":I"

mbojan commented 2 years ago

Actually it seems even better to have as_sparse6.matrix(object, n = max(object, 0))...

mbojan commented 2 years ago

Tackling this on the branch mbojan/rgraph6@i27-as-sparse6-matrix.

mbojan commented 2 years ago

For one, the initial issue is gone

l <- list(
  matrix(c(1,2, 2,3, 3,4), ncol=2, byrow=TRUE),
  matrix(c(1,2, 2,3), ncol=2, byrow=TRUE)
)
as_sparse6(l)
## [1] ":Cdv" ":Bd"

mbojan commented 2 years ago

Seems to work and no other tests fail. Closing this one at this time.