Write function(s) to read g6, s6, and d6 files directly

mbojan commented 3 years ago

Should be a simple application of readLines().

schochastics commented 2 years ago

What do you have in mind here? I could think of the following:

read_graph6 <- function(file, type = "adjacency"){
  type <- match.arg(type,c("adjacency","edgelist","igraph","network"))
  txt <- readLines(file)
  switch(type,
    "adjacency" = adjacency_from_graph6(txt),
    "edgelist"  = edgelist_from_text(txt),
    "igraph"    = igraph_from_graph6(txt),
    "network"   = network_from_graph6(txt)
  )
}

read_digraph6 <- function(file, type = "adjacency"){
  type <- match.arg(type,c("adjacency","edgelist","igraph","network"))
  txt <- readLines(file)
  switch(type,
         "adjacency" = adjacency_from_digraph6(txt),
         "edgelist"  = edgelist_from_text(txt),
         "igraph"    = igraph_from_digraph6(txt),
         "network"   = network_from_digraph6(txt)
  )
}

read_sparse6 <- function(file, type = "edgelist"){
  type <- match.arg(type,c("adjacency","edgelist","igraph","network"))
  txt <- readLines(file)
  switch(type,
         "adjacency" = adjacency_from_text(txt),
         "edgelist"  = edgelist_from_sparse6(txt),
         "igraph"    = igraph_from_sparse6(txt),
         "network"   = network_from_sparse6(txt)
  )
}

But there should probably also be a general function that reads mixed files (I had trouble coming up with a good name):

read_file6 <- function(file, type="adjacency"){
  type <- match.arg(type,c("adjacency","edgelist","igraph","network"))
  txt <- readLines(file)
  switch(type,
         "adjacency" = adjacency_from_text(txt),
         "edgelist" = edgelist_from_text(txt),
         "igraph"   = igraph_from_text(txt),
         "network"  = network_from_text(txt)
  )
}

What do you think?

mbojan commented 2 years ago

Thanks! Looks good. Some comments:

- We need to pay attention to the optional headers such as >>graph6<< etc. (see http://users.cecs.anu.edu.au/~bdm/data/formats.txt). I've met users who use our package to read files generated with other tools (such as nauty) that include those headers. Reading mixed files will be the cherry on top.
- Am I right that the functions only differ in the default for the type argument? Perhaps there should be only one function with a format argument? I'll think about it.

schochastics commented 2 years ago

~~I think nauty doesn't produce headers (at least not the geng routine) but yeah,~~ (found the parameter for that...) headers should be automatically skipped. Though it should probably only be the standard headers (>>graph6<<, >>sparse6<<, and >>digraph6<<); otherwise it might get too complicated. So some suggestions:
- just parse out standard headers
- add a skip argument to let users handle any more complicated header
- use scan() with comment.char = ">". This way we would enforce a standard comment character for such files.
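
To make the first two suggestions concrete, here is a minimal base-R sketch. The helper name `strip_standard_header()` and its `skip` argument are hypothetical (nothing like this exists in rgraph6 yet). It only assumes that a standard header, if present, starts the first line that is kept.

```r
# Hypothetical helper, not part of rgraph6.
# Drops the first `skip` lines, then removes one of the three standard
# headers from the start of the first remaining line (if it is there).
strip_standard_header <- function(txt, skip = 0) {
  if (skip > 0) {
    txt <- txt[-seq_len(skip)]
  }
  if (length(txt) > 0) {
    txt[1] <- sub("^>>(graph6|sparse6|digraph6)<<", "", txt[1])
  }
  txt
}

# Header fused to the first symbol, as in geng output with a header
with_header <- strip_standard_header(c(">>graph6<<D?{", "D@s", "D@{"))

# A non-standard first line is left to the user via `skip`
with_skip <- strip_standard_header(
  c("written by some other tool", ">>graph6<<D?{", "D@s"),
  skip = 1
)
```

Anything that is not one of the three standard headers is left untouched, so a malformed first line would still surface as an error in the downstream *_from_text() call rather than being silently eaten.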
There was a copy-and-paste error in read_sparse6(), which is now corrected (it had from_graph6 in the switch rather than from_sparse6). So the functions do differ, but for some options I used the generic *_from_text because the specialized version doesn't exist (yet; e.g. there is no edgelist_from_graph6() or edgelist_from_digraph6()).

I wrote three separate functions just for compatibility with the rest of the package, but technically only read_file6() is needed. And thanks to the *_from_text functions, it reads mixed files without any problem.

library(rgraph6)
library(igraph)

read_file6 <- function(file, type="adjacency"){
  type <- match.arg(type,c("adjacency","edgelist","igraph","network"))
  txt <- readLines(file)
  switch(type,
         "adjacency" = adjacency_from_text(txt),
         "edgelist" = edgelist_from_text(txt),
         "igraph"   = igraph_from_text(txt),
         "network"  = network_from_text(txt)
  )
}

# make_star() is the current igraph name for the older graph.star()
write(
  c(as_graph6(make_star(5, mode = "undirected")),
    as_sparse6(make_star(5, mode = "undirected")),
    as_digraph6(make_star(5, mode = "in"))),
  "test6.txt"
)

read_file6("test6.txt")
#> [[1]]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    1    1    1
#> [2,]    1    0    0    0    0
#> [3,]    1    0    0    0    0
#> [4,]    1    0    0    0    0
#> [5,]    1    0    0    0    0
#> 
#> [[2]]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    1    1    1
#> [2,]    1    0    0    0    0
#> [3,]    1    0    0    0    0
#> [4,]    1    0    0    0    0
#> [5,]    1    0    0    0    0
#> 
#> [[3]]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    1    0    0    0    0
#> [3,]    1    0    0    0    0
#> [4,]    1    0    0    0    0
#> [5,]    1    0    0    0    0

Your call, but probably one generic function is enough.

mbojan commented 2 years ago

Yeah, I guess one function is enough. I'll use yours above as a starting point and add handling of the optional headers.

mbojan commented 2 years ago

Added.

I'm wondering whether we should

- just drop a potential header and let *_from_text() handle the type of each symbol, or
- implement some logic that reads the header and errors if any of the symbols does not look like the format implied by the header?

Opinions @schochastics ?
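
For the second option, a rough base-R sketch of what such a check could look like. Both helper names are hypothetical, and the format guess relies only on the leading character as I read the format description linked earlier in the thread (":" for sparse6, "&" for digraph6, anything else treated as graph6). It does not validate the rest of the symbol.

```r
# Hypothetical helpers, not part of rgraph6.

# Guess the format of each symbol from its first character only.
guess_format6 <- function(txt) {
  first <- substr(txt, 1, 1)
  ifelse(first == ":", "sparse6",
         ifelse(first == "&", "digraph6", "graph6"))
}

# If the first line starts with a standard header, strip it and stop with
# an error when any symbol does not look like the declared format.
check_header6 <- function(txt) {
  if (length(txt) == 0) return(txt)
  pattern <- "^>>(graph6|sparse6|digraph6)<<"
  m <- regmatches(txt[1], regexec(pattern, txt[1]))[[1]]
  if (length(m) == 0) return(txt)          # no header: nothing to check
  declared <- m[2]
  txt[1] <- substring(txt[1], nchar(m[1]) + 1)
  txt <- txt[txt != ""]                    # header may sit on its own line
  bad <- which(guess_format6(txt) != declared)
  if (length(bad) > 0) {
    stop("symbol(s) ", paste(bad, collapse = ", "),
         " do not look like ", declared, call. = FALSE)
  }
  txt
}

ok <- check_header6(c(">>graph6<<D?{", "D@s", "D@{"))
mismatch <- try(check_header6(c(">>graph6<<D?{", ":Dxyz")), silent = TRUE)
```

The trade-off is that a strict check like this would reject mixed files whenever a header is present, so it probably only makes sense as an opt-in.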

mbojan commented 2 years ago

> ~~I think nauty doesn't produce headers (at least not the geng routine) but yeah,~~ (found the parameter for that...) headers should be automatically skipped. Though it should probably only be the standard headers (>>graph6<<, >>sparse6<<, and >>digraph6<<); otherwise it might get too complicated. So some suggestions:
>
> - just parse out standard headers
> - add a skip argument to let users handle any more complicated header
> - use scan() with comment.char = ">". This way we would enforce a standard comment character for such files.

Can there be more info in that header? Is that documented somewhere?

If the header has the form >>graph6 blahblah<<, we could do, e.g.:

txt <- readLines("file")
txt[1] <- gsub("^>>[^<]+<<", "", txt[1])

and proceed as in your prototype above

schochastics commented 2 years ago

I just looked at the nauty documentation again, and there should actually be no newline after the "header" (see line three):

./geng 5 -c -l -h
>A ./geng -cld1D4 n=5 e=4-10
>>graph6<<D?{
D@s
D@{
...

Other than that, I can't find any other header indicators anywhere in the documentation. So I think the easiest option is to implement it like you said

txt <- readLines("file")
txt[1] <- gsub("^>>[^<]+<<", "", txt[1])

and explain in the documentation the way headers are expected to be. Just to be safe, we could add

txt <- txt[txt != ""]

for cases where there is an accidental newline after the header.
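
Putting the two snippets together, a small self-contained sketch with a temporary file. The helper name `read_symbols6()` is hypothetical. It only returns the cleaned symbols, which could then be passed to one of the *_from_text() functions as in the prototype above.

```r
# Hypothetical helper, not part of rgraph6.
# Reads a file, strips a leading >>...<< header from the first line and
# drops empty lines (e.g. when the header sits on a line of its own).
read_symbols6 <- function(file) {
  txt <- readLines(file)
  if (length(txt) > 0) {
    txt[1] <- gsub("^>>[^<]+<<", "", txt[1])
  }
  txt[txt != ""]
}

# Case 1: header fused to the first symbol
fused <- tempfile(fileext = ".g6")
writeLines(c(">>graph6<<D?{", "D@s", "D@{"), fused)
symbols_fused <- read_symbols6(fused)

# Case 2: accidental newline after the header
split <- tempfile(fileext = ".g6")
writeLines(c(">>graph6<<", "D?{", "D@s", "D@{"), split)
symbols_split <- read_symbols6(split)

unlink(c(fused, split))
```

Both cases should yield the same three symbols, so the blank-line filter makes the reader indifferent to whether the producing tool put a newline after the header.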

mbojan / rgraph6

Write function(s) to read g6, s6, and d6 files directly #24