Refactor algorithm.rs - Githubissues

adinapoli-mndc commented 5 years ago

Once #33 will be completed, I think this will be a good time for us to take stock and trying to see if we can refactor algorithm.rs. There are a number of things which are currently a bit clunky in that module:

We are not using (not even attempting to) the GraphAlgorithm trait we spec-ed out in the end2end document. I am not saying we necessarily should, but trying to actually implement a concrete instance for it (which uses osrank_naive under the hood) would be a good testbed to understand whether or not the GraphAlgorithm trait has a leg to stand on. Otherwise, that's just a design exercise in a vacuum without any practical application, and it would be even misleading for people trying to implement the storage-specific parts of it (like @kim);
We are using that slightly weird get_weight and from_osrank closures that I do wonder how long will be before they will bite us in the back;
We are not trying to validate the ideas of the GraphAPI Working Group by trying to have a GraphDataWriter trait that modifies the graph metadata at the end of the whole process;
Our Graph trait is currently quite heavy (in includes 11 methods and counting) which means we cannot define other implementations for it. If the Graph trait was modelled similar to the one sketched in the working group by @cloudhead , we would be able to implement a impl Graph for GraphView where GraphView could just be a set of references to some nodes, we could later to traverse the graph efficiently (with no copy needed).

My tentative proposal would be the following:

We give it a shot to bring our current graph.rs traits closer to what the working group have created so far; as there is a bit of uncertainty on the entire business of layers and/or the GraphObject, we don't need to get it right immediately; probably starting by having a Graph, GraphWriter and GraphDataWriter should be enough;
Introduce the concept of a GraphView which is just a set of references to some node ids, plus an immutable reference to the underlying graph, so that we can easily implements things like subgraph_by_nodes without cloning;
Try to bring into the mix the concept of GraphAlgorithm and MutableGraphAlgorithm. Ideally the seed set phase would yield a GraphView and can be implemented as a GraphAlgorithm, whereas the actual osrank_naive would need to be a MutableGraphAlgorithm that also depends on a GraphDataWriter of some sort;

A note on RandomWalks

I have realised at some point this week, while I was working on the RandomWalks new abstraction, that for this data structure we have to copy the node ids. This is because we cannot rely on the lifetime of the input Graph G, because in the real world(TM) we would be given some form of serialised-from-disk RandomWalks type, and therefore this needs to own its content: not just from an ownership perspective, but also because by the time we receive this data structure as input, the network topology might have changed: new nodes might have been added, and stale nodes (still referenced by the RandomWalks) might be gone, therefore we cannot rely on the Rust lifetime system here.

Not that I am writing this, I am also thinking that perhaps RandomWalks should be somehow abstracted over as a trait, because it's unlikely that the storage & protocol teams will depend on our own type: we will probably be given some form of iterator or some more specific data structure and we will have to either perform the conversion on-the-fly, or use RandomWalks as it is now only for tests.

@MeBrei thoughts?

MeBrei commented 5 years ago

:+1: A good list! Though some of this might be heavily influenced by decisions of the Graph API. Additionally you could list switching to a multi-edge graph as outlined in the data model document.

adinapoli-mndc commented 5 years ago

Fixed by https://github.com/oscoin/osrank-rs/pull/59 .

oscoin / osrank-rs

Refactor algorithm.rs #42

A note on RandomWalks