mbojan / rgraph6

Encoding graphs in graph6, sparse6, and digraph6 formats
https://mbojan.github.io/rgraph6/
GNU General Public License v3.0
12 stars 3 forks source link

added sparse6 format (closes #10) #18

Closed schochastics closed 4 years ago

schochastics commented 4 years ago

sparse6 format added
works slightly different, since we are assuming sparse networks. I decided to use edgelists instead of the adjacency matrix. Naming scheme is the same though. so as_adjacency() returns an edgelist. One could opt for sparse matrices but that would require a dependency on Matrix.

There is a bit of a runtime bottleneck in as_sparse6.R lines35+36 where the edgelist is sorted. This makes as_sparse6() slightly slower than as_graph6() for large graphs.

Edit: This can be significantly improved when using sparse matrices (did some benchmarks). So if you are not happy with the edgelists and fine with a Matrix dependency, I could rewrite that bit again to make it faster.

misc

TODO
as_igraph() and as_network() dont work with sparse6 and dgraph6 yet.

mbojan commented 4 years ago

sparse6 format added

Great!

works slightly different, since we are assuming sparse networks. I decided to use edgelists instead of the adjacency matrix. Naming scheme is the same though. so as_adjacency() returns an edgelist.

  1. Since edgelist is more natural for sparse graphs, let's create as_edgelist() and not use as_adjecency() in this context. It will be confusing.

One could opt for sparse matrices but that would require a dependency on Matrix.

  1. If it makes sense, let's add Matrix as a dependency. Perhaps as Suggests not Imports.

There is a bit of a runtime bottleneck in as_sparse6.R lines35+36 where the edgelist is sorted. This makes as_sparse6() slightly slower than as_graph6() for large graphs.

If sorting is necessary, then oh well....

Edit: This can be significantly improved when using sparse matrices (did some benchmarks). So if you are not happy with the edgelists and fine with a Matrix dependency, I could rewrite that bit again to make it faster.

  1. I think we could have both of these, especially since you already tried that. What do you think? We could then pick the more efficient for as_igraph and as_network. In the network package Carter even implemented some heuristics when to pick edgelists over the adjacency matrices which could be an inspiration.
  • Added tests (note that the tests run ~7s which is too long for CRAN; should run in 5s)

I think I will also trim-down some other tests.

TODO as_igraph() and as_network() dont work with sparse6 and dgraph6 yet.

Just add new issues please.

Let's address 1, 2, and 3 before I merge

schochastics commented 4 years ago
  1. All right I'll add as_edgelist(). should that be done for all three formats then? I'll have a look what Carter is doing in the network package.

  2. and 3. I'll add Matrix as suggest and choose the more efficient edgelist sorting if the package is present.

Not so familiar with pull requests: Can I just comit to my fork and it updates the request automatically or do I have to make a new request?

mbojan commented 4 years ago
  1. All right I'll add as_edgelist(). should that be done for all three formats then? I'll have a look what Carter is doing in the network package.

I think we can add it just for the sparse6 for now.

  1. and 3. I'll add Matrix as suggest and choose the more efficient edgelist sorting if the package is present.

:+1:

Not so familiar with pull requests: Can I just comit to my fork and it updates the request automatically or do I have to make a new request?

Yes. Although ideally you should have it on a branch separate from master so things will not get mixed up if some unrelated commit will have to be made. I think it will do for now. Just keep adding commits and they should appear here.

schochastics commented 4 years ago

All right I'll be careful with the commits since my branching skills are non existent ;-)

schochastics commented 4 years ago

added as_elist

small benchmark with a gnp(n=2000,p=0.015)

# A tibble: 2 x 13
  expression     min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
  <bch:expr>   <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
1 sparseMatrix 1.29s  1.36s     0.704     282MB     5.35     5    38      7.11s
2 edgelist     2.22s  2.35s     0.419     280MB     5.45     5    65     11.93s
# … with 4 more variables: result <list>, memory <list>, time <list>, gc <list>
mbojan commented 4 years ago

Thanks @schochastics !