python-graphblas / graphblas-algorithms

Graph algorithms written in GraphBLAS
Apache License 2.0
76 stars 4 forks source link

Algorithm Request: MST (5.6 Algebraic Prim's) #97

Open rtbs-dev opened 2 months ago

rtbs-dev commented 2 months ago

Hi all! Still getting used to the graphblas bindings and writing efficient enough algorithms to contribute effectively, but I thought I'd put a placeholder issue up in case someone else already has progress on this.

I don't see any of the graphblas python bindings implementing Algebraic Prim's from ch. 5.2 in the original Graph Algorithms in the Language of Linear Algebra book. MST is really quite useful to me, but in general as an approximation to the Steiner Tree for a given set of nodes and their metric closure.

I'm fairly certain the text states we cannot take advantage of the priority queue/heap speedup in linalg method, but perhaps someone has an idea (since Prim's is theoretically O(1) for sufficiently dense graphs, i.e. complete graphs of the metric closure! yay!)

Let me know if there's other info desired here for the feature request. I'm excited for an alternative to the old scipy minimum_spanning_tree method, since it's spending a lot of time on nested graph validation that isn't opt-out.

Thanks!

DrTimothyAldenDavis commented 2 months ago

Thanks for the note.

We do have a minimum spanning forest algorithm in the LAGraph package, https://github.com/GraphBLAS/LAGraph/blob/stable/experimental/algorithm/LAGraph_msf.c . It doesn't require a priority queue.

It does have an unusual usage of some user-defined operators, which access a global pointer. That should be fixed in the C code so those pointers can at least be accessed via the 'thunk' scalar. Even that is a bit of a stretch; it's not something the GraphBLAS API talks about. It is safe though, as long as those pointers do not change until the matrix is materialized by either the operation using those pointers, or some later operation that materializes the matrix (like a GrB_wait). So it works for now.

I'm not sure how that would port to python though.

What we really need is a kind of "Cartesian select". See this issue in the API: https://github.com/GraphBLAS/graphblas-api-c/issues/67 . If that method were added then the msf algorithm would not need to create an operator that accesses a global pointer.