Performance improvements in edge_disjoint_spanning_trees for undirected graphs

shivensinha4 commented 2 years ago

Concern method sage.graphs.spanning_tree.edge_disjoint_spaaning_trees for undirected graphs introduced in #32911 as part of #32169.

edge_label is currently a stored as a dictionary with frozensets of edge endpoints as keys. Since our structure is a collection of forests, we can utilise the fact that each edge can be uniquely represented as the parent edge of some node. We can attempt to replace this dictionary with an integer array (even a C array for better performance, we don’t even need resizes or appends). Then, we can structure the code to perform lookups using node indices and trace back the augmenting paths using the same idea. The containers for storing parents (p), which is currently a list of dicts, can be similarly restructured.
We can also speed up the initialisation of the labels before each labelling step. Instead of initialising a new label container on each edge-independence check iteration, we can keep track of the last iteration that changed the label. When we need to access the labels anytime during our processes, we first check if the last edit time for this label was during this iteration, and lazily perform updates. The same array is reused throughout all edge-independence checks instead of new dictionaries, so this will also make it more memory efficient. Even using another array to store the edit time would be a nice idea, but we can go further and just hash the pairs of (time, label) into a single integer of (time * maximum_label_bound) + label

As @dcoudert pointed out, the cost of this approach is that the vertex labels can’t internally be of any hashable type - only integers. We’ll first need to relabel the vertices if they are of other types. We can then also go ahead and use a specific data structure to speed up iterations (e.g. static sparse graph).

CC: @dcoudert

Component: graph theory

Issue created by migration from https://trac.sagemath.org/ticket/33634

shivensinha4 commented 2 years ago

comment:1

If it sounds good, I'd love to work on implementing these ideas in a separate function.

dcoudert commented 2 years ago

comment:2

Fill free to give it a try.

I don't know if better algorithms have been proposed for undirected graphs. I have chosen to implement the Roskind-Tarjan algorithm without extra optimization to get something easy to maintain. Now that we have it, we can have other implementations and we are able to compare results and running times.

dcoudert commented 2 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,5 @@
+Concern method `sage.graphs.spanning_tree.edge_disjoint_spaaning_trees` for undirected graphs introduced in #32911 as part of #32169.
+
 - `edge_label` is currently a stored as a dictionary with frozensets of edge endpoints as keys. Since our structure is a collection of forests, we can utilise the fact that each edge can be uniquely represented as the parent edge of some node. We can attempt to replace this dictionary with an integer array (even a C array for better performance, we don’t even need resizes or appends). Then, we can structure the code to perform lookups using node indices and trace back the augmenting paths using the same idea.  The containers for storing parents (`p`), which is currently a list of dicts, can be similarly restructured.

 - We can also speed up the initialisation of the labels before each labelling step. Instead of initialising a new label container on each edge-independence check iteration, we can keep track of the last iteration that changed the label. When we need to access the labels anytime during our processes, we first check if the last *edit time* for this label was during this iteration, and lazily perform updates. The same array is reused throughout all edge-independence checks instead of new dictionaries, so this will also make it more memory efficient. Even using another array to store the *edit time* would be a nice idea, but we can go further and just hash the pairs of `(time, label)` into a single integer of `(time * maximum_label_bound) + label`

264c63e9-79e4-426c-ab2d-7ce38c7c4a27 commented 2 years ago

comment:4

This seems like something that could be ripe for a medium-size GSoC idea.

sagemath / sage

Performance improvements in edge_disjoint_spanning_trees for undirected graphs #33634