Minimum Spanning Tree for labelled graphs

TheChouzanOne commented 5 years ago

I want to start thinking how to generate a Minimum Spanning Tree. However, a big problem is stopping me.

Both Kruskall and Prim assume that the graph is connected, however, it doesn't seem to be a way to know if that condition holds, or a type safe representation of connected graphs.

I believe that a simple typesafe representation could be implemented for connected labelled graphs by tweaking the way connect zero x y works, as if vertexSet x∩vertexSet y=Ø, then the resulting graph WILL be disconnected, but if x and y share at least 1 vertex (assuming x and y are connected), then the graph will still be connected.

On the other hand, MST algorithm could just try to add as much edges as possible to connect the vertices, but could generate a disconnected graph (a Forest).

What is the optimal path to tackle this problem?

TheChouzanOne commented 5 years ago

After thinking about the simple typesafe representation. Even given that property, we need to stop the user of the API to try and overlay two graphs where their vertexSets intersect in at least one vertex. Implementing a new type (ConnectedAdjacencyMap for example) wouldn't stop that, and using Maybe ConnectedAdjacencyMap is not an acceptable solution.

Maybe just trying to compute a MST with as much edges as possible would be best then.

snowleopard commented 5 years ago

@TheChouzanOne I think it's useful to have both the MST algorithm returning Maybe (Tree a) as well as minimum spanning forest algorithm returning Forest a. The former will probably be implemented via the latter as follows:

mst :: (Ord a, Ord e) => AdjacencyMap e a -> Maybe (Tree a)
mst x = case msf x of
    [t] -> Just t
    _   -> Nothing

msf :: (Ord a, Ord e) => AdjacencyMap e a -> Forest a
msf = ...

One aspect which is not clear to me is how to capture edge labels in the resulting trees and forests. Perhaps, we need custom data types for labelled trees and forests, i.e. Tree e a and Forest e a.

TheChouzanOne commented 5 years ago

@snowleopard I totally forgot about the lack of labels for Data.Tree and Data.Forest.

The first idea that comes to my mind is to slightly change the structure of Tree's subForest like so:

Node
    rootLabel :: a
    subForest :: [(e, Forest a)]

type Forest e a = Tree e a

where subForest would stop being a Forest, so its name might be changed. It now represents a list of tuples, where the first element is the label of the connection between rootLabel and the Forest's rootLabel.

I am not sure how difficult it would be to mimic Data.Trees module for this new type, but if you think the structure makes enough sense, I could start working on it. Or, use another existent data structure to represent the result of mst like a normal AdjancencyMap e a.

TheChouzanOne commented 5 years ago

Update: I implemented a simple starting kruskall algorithm which can handle connected and unconnected graphs. The thing is that it just returns an AdjacencyMap e a, as I haven't thought about how to represent a tree or forest with edge labels.

The current API for the function is

kruskall :: (Monoid e, Ord e, Ord a, Eq a) => (e -> e -> Ordering) -> AdjacencyMap e a -> AdjacencyMap e a

It takes a function to determine how to choose edges and an AdjacencyMap e a to apply the algorithm to.

If you want to take a closer look, here's the code.

snowleopard commented 5 years ago

@TheChouzanOne Yes, I think we'll need to add a module like Data.Tree.Labelled with the definitions for labelled trees and forests. The msf algorithm will then need to return a Forest e a.

I don't understand why you ask both for dictionary Ord e and a comparison function e -> e -> Ordering. A dictionary alone should be sufficient.

TheChouzanOne commented 5 years ago

@TheChouzanOne Yes, I think we'll need to add a module like Data.Tree.Labelled with the definitions for labelled trees and forests. The msf algorithm will then need to return a Forest e a.

I will start thinking about that concern then.

I don't understand why you ask both for dictionary Ord e and a comparison function e -> e -> Ordering. A dictionary alone should be sufficient.

This function is just a starting point and might help to build the actual mst and msf functions. About the comparison function, it wast an idea to leave it kind of general to implement more functions on top of it. For example, maybe for some reason, a user would want a "maximum spanning tree" even though he is using distances (I don't really know if this is pausible), so he could call a more general generalMsf which takes a sorting function of his choice to decide which edges should be taken, but I wasn't sure if it was a good idea.

snowleopard / alga

Minimum Spanning Tree for labelled graphs #187