oscoin / osrank-rs

A pre-alpha osrank implementation in Rust.
http://oscoin.io/
3 stars 3 forks source link

[Research] Investigate petgraph & alternatives #29

Open MeBrei opened 5 years ago

MeBrei commented 5 years ago

So far we've been using petgraph, but with changing requirements we should check if this is still the best choice.

Generally assess if there are alternatives for petgraph. In particular we want to determine if there are graph libraries that support multi-edges between nodes (as suggested in Graph API discussions) and/or if petgraph can or can be tweaked to do this. Also keep in mind the limitations of the libraries regarding scale and performance.

As an outcome, we want to have a clear picture of how we could implement multi-edges or have clearly defined limitations that we can bring to the Graph API meeting.

adinapoli-mndc commented 5 years ago

Before picking petgraph, I have spent a certain number of man hours researching the "best" graph library, and this one seemed to be the best available in terms of features and algorithm, despite it seems like it's not very actively maintained and some people have questioned some API decisions (see for example here). Having said that, the reason why I went with petgraph in the first place was:

  1. It's from a highly-regarded member of the Rust community;
  2. It has be biggest number of downloads;
  3. It has support for CSR representation.

If our requirement changes and we have to support multiple edges no matter what, obviously we can forget the CSR compact representation, so 3. might be relaxed. petgraph itself supports a number of concrete graph implementations. From the README:

petgraph is a graph data structure library.

Graph which is an adjacency list graph with arbitrary associated data.

StableGraph is similar to Graph, but it keeps indices stable across removals.

GraphMap is an adjacency list graph which is backed by a hash table and the node identifiers are the keys into the table.

CSR is a sparse adjacency matrix graph with arbitrary associated data.

(We are currently using Graph , which uses adjacency lists). Incidentally, this representation allows for duplicated edges (i.e. parallel edges) so we are good there. However, there is an upper bound in the number of elements the graph can have. From the README:

Graph is parameterized over:

Associated data N for nodes and E for edges, called weights. The associated data can be of arbitrary type. Edge type Ty that determines whether the graph edges are directed or undirected. Index type Ix, which determines the maximum size of the graph.

I would bet that none of the other implementations suit us: they will either disallow parallel edges or put a limit on the number of elements.

Perhaps we should start by writing down all the operations we need to support from a graph (or at least the ones we can think of) and see if any other graph library fits our needs or think about writing a small, in-house one.

adinapoli-mndc commented 5 years ago

Just for curiosity, I did take a brief look at graphlib, one of the libraries linked in the Reddit discussion.

Pros:

Cons:

// Adding an edge is idempotent
graph.add_edge(&v1, &v2);
graph.add_edge(&v1, &v2);
graph.add_edge(&v1, &v2);

Perhaps we could look into forking this library, if we think it's easier to understand and less intimidating than petgraph.

adinapoli-mndc commented 5 years ago

@MeBrei Small update on this: I have reached out to the graphlib author and, while he is willing to accept pull requests adding multi-edge support to the library, he seems to be contrary in having the Graph type support anything which is not a f32 as a weight, which is fairly limiting for us.

I will experiment a bit more with this library to see how it is implemented internally and if it does put a limit on the number of nodes/edges one might have, and report back. If this shows potential, we can fork it and maintain our own version, I guess.