It's not clear to me that you are doing anything wrong, per se. I can suggest some improvements for you and observe some limitations of our implementation.
Suggestion for you: there is a dense_hungarian method in our cugraph API that might be a bit more efficient. It takes a flattened 2-D matrix representation as input along with the number of rows and columns. Ultimately our implementation is going to take the sparse matrix you constructed in the cugraph.Graph object and convert it back into a dense matrix... so by converting your 2D matrix into a cugraph.Graph you're paying an unnecessary cost to convert between data structures. The API you are calling is designed for the case where you already have a cugraph.Graph object for some other algorithm. Using dense_hungarian might save you a bit of time and memory.
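For example, something along these lines (assuming the cost matrix is already on the GPU as a CuPy array; the exact input type and return values of dense_hungarian here are my best guess, so treat this as a sketch rather than a definitive call):

```python
import cupy as cp
import cudf
import cugraph

n = 1024
# (n x n) cost matrix already on the GPU, e.g. pairwise distances.
costs = cp.random.random((n, n)).astype(cp.float32)

# dense_hungarian takes the flattened matrix plus the row/column counts,
# so no sparse cugraph.Graph has to be built first.
# Assumption: it returns the total matching cost and the column assigned to each row.
cost, assignment = cugraph.dense_hungarian(cudf.Series(costs.ravel()), n, n)
```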
I'm not sure of the exact representation of your data. The Date/Nagi implementation of the Hungarian method that we have incorporated into cugraph is more efficient on integer weights. If you either have integer weights or can accept the loss of precision from converting to integer weights (multiply by some value and truncate/round to an integer), you might get better efficiency. There are multiple algorithms for implementing the Hungarian method; I believe the scipy version is based on Jonker-Volgenant, which will have different strengths.
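The scaling could look something like this (the scale factor is an arbitrary choice and trades lost precision against the integer fast path):

```python
import cupy as cp

# Example float cost matrix (e.g. pairwise distances).
float_costs = cp.random.random((1024, 1024)).astype(cp.float32)

# Scale and round to integers; a larger scale factor keeps more precision,
# but watch for int32 overflow on large costs.
SCALE = 10_000
int_costs = cp.rint(float_costs * SCALE).astype(cp.int32)
```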
Regarding limitations in the cugraph implementation, we haven't done any performance tuning of our python wrapper for this algorithm. In a simple test trying to replicate your issue, I was able to observe some inefficiencies in our python layer (mostly extra data copying) that add some overhead to the call. Our C++ implementation supports a feature for doing multiple assignment computations at once (pass multiple dense matrices in and the algorithm will compute multiple Hungarian assignments concurrently). This would potentially benefit you: on a 1K by 1K matrix, several of the Hungarian steps are going to leave the GPU starving for parallel work. However, we don't expose this through our python API.
Unfortunately, we don't currently have plans to improve the Hungarian algorithm. After implementing it, we observed that cugraph is really focused on sparse graph algorithms, and the Hungarian algorithm is really a dense graph algorithm. We haven't deprecated the functionality at this point... but we have discussed that in the past.
@ChuckHastings
Thanks so much for the detailed response. I will try batches of graphs, the integer weights, and the other things you mentioned and share the code here. Other than that, my hacky solution so far is to order the point clouds by z, then do "windowed" Hungarian matchings on the windows with scipy. That is fast enough for my use case, albeit not optimal, but that's another story. The issue can be closed.
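Roughly, the windowed workaround looks like this (the window size is arbitrary, and the result is only an approximation of a globally optimal matching):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def windowed_matching(a, b, window=128):
    """Approximate matching: sort both clouds by z, then match window by window."""
    order_a = np.argsort(a[:, 2])
    order_b = np.argsort(b[:, 2])
    match = np.empty(len(a), dtype=np.int64)
    for start in range(0, len(a), window):
        ia = order_a[start:start + window]
        ib = order_b[start:start + window]
        rows, cols = linear_sum_assignment(cdist(a[ia], b[ib]))
        match[ia[rows]] = ib[cols]  # index into b matched to each point of a
    return match

a = np.random.rand(1024, 3)
b = np.random.rand(1024, 3)
match = windowed_matching(a, b)
```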
I have to compute a lot of Hungarian matchings between sets of points, using their distance as the matching criterion. So far I have tried this hybrid Scipy (CPU) and PyTorch method:
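A minimal sketch of that approach, assuming torch.cdist for the GPU distance computation and scipy.optimize.linear_sum_assignment for the matching:

```python
import torch
from scipy.optimize import linear_sum_assignment

def hungarian_torch_scipy(a, b):
    """a, b: (N, 3) point clouds as CUDA tensors."""
    # Pairwise Euclidean distances on the GPU, then solve the assignment on the CPU.
    cost = torch.cdist(a, b)  # (N, N)
    rows, cols = linear_sum_assignment(cost.cpu().numpy())
    return rows, cols

a = torch.rand(1024, 3, device="cuda")
b = torch.rand(1024, 3, device="cuda")
rows, cols = hungarian_torch_scipy(a, b)
```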
As well as using cugraph and pylibraft:
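Roughly along these lines (assuming pylibraft.distance.pairwise_distance for the cost matrix and cugraph.hungarian on a bipartite edge list; the hungarian return values and the output-argument handling are my assumptions):

```python
import cupy as cp
import cudf
import cugraph
from pylibraft.distance import pairwise_distance

n = 1024
a = cp.random.rand(n, 3).astype(cp.float32)
b = cp.random.rand(n, 3).astype(cp.float32)

# Pairwise Euclidean distances on the GPU (the n x n cost matrix).
dists = cp.empty((n, n), dtype=cp.float32)
pairwise_distance(a, b, dists, metric="euclidean")

# Bipartite edge list: points of `a` are vertices 0..n-1, points of `b` are n..2n-1.
src = cp.repeat(cp.arange(n, dtype=cp.int32), n)
dst = cp.tile(cp.arange(n, 2 * n, dtype=cp.int32), n)
edges = cudf.DataFrame({"src": src, "dst": dst, "weight": dists.ravel()})

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst", edge_attr="weight")

# Assumption: hungarian takes the set of "worker" vertices and returns
# the matching cost plus an assignment DataFrame.
workers = cudf.Series(cp.arange(n, dtype=cp.int32))
cost, assignment = cugraph.hungarian(G, workers)
```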
I was expecting the cugraph approach to be much faster, but it is actually significantly slower. For instance, for two sets of points of shape (1024, 3), the scipy approach takes about 0.03 seconds, but the cugraph approach takes about 1.3 seconds on my RTX 4090. I'm wondering if there is anything I am doing wrong that would account for this?