samuelpmish opened 5 months ago
Oh, also: when I ran `pytest test_cholesky.py`, only a small fraction of the tests were not skipped. Those tests that ran passed, but I can't verify that the remaining tests are working correctly.
A first question: enabling a supernodal decomposition on the CPU makes sense, but what does this mean for the CUDA implementation? I suppose that the kernels would need to be updated to deal with this case. Does it even make sense to use such a strategy on the GPU?
> Enabling a supernodal decomposition on the CPU makes sense, but what does this mean for the CUDA implementation? ... Does it even make sense to use such a strategy on the GPU?
From the original talk about GPU-accelerated CHOLMOD, it sounds like supernodal was the only supported factorization strategy. I'm not sure whether that's still true today. If so, I wonder what CHOLMOD does if you tell it to use the GPU and also specify the simplicial strategy.
> I suppose that the kernels would need to be updated to deal with this [supernodal option].
Which kernels need updating?
When I run `pytest test_cholesky.py` on a machine with an NVIDIA GPU (where 17/20 of the tests are not skipped, and they run successfully), `nvprof` reveals that none of the linear algebra in `cholespy` uses the GPU, except for the triangular solves.
CHOLMOD's recent user guide says that only `cholmod_l_factorize` (for 64-bit integers) supports GPU acceleration, but `cholespy` uses the 32-bit integer variant, which doesn't seem to take advantage of GPU hardware. Am I goofing something up?
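For illustration, the constraint described in the user guide can be sketched as a lookup over CHOLMOD's two factorization entry points. The routine names are real CHOLMOD functions, but the table and helper functions below are a hypothetical sketch of the documented behavior, not CHOLMOD code:

```python
# Per CHOLMOD's user guide (as read above), GPU acceleration is only wired
# up for the 64-bit-integer ("long") routines such as cholmod_l_factorize,
# not the 32-bit `int` variant that cholespy calls.
GPU_CAPABLE = {
    "cholmod_factorize": False,    # int (32-bit) indices: CPU only
    "cholmod_l_factorize": True,   # long (64-bit) indices: can use the GPU
}

def factorize_routine(index_bits: int) -> str:
    """Pick the CHOLMOD factorization entry point for a given index width."""
    return "cholmod_l_factorize" if index_bits == 64 else "cholmod_factorize"

def can_use_gpu(index_bits: int) -> bool:
    """Whether the chosen entry point can run on the GPU (per the guide)."""
    return GPU_CAPABLE[factorize_routine(index_bits)]
```

If this reading is right, switching `cholespy` to the 64-bit-integer variant would be a prerequisite for GPU factorization.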
Hi,
After reading about the significant performance difference between simplicial and supernodal factorization (https://github.com/rgl-epfl/cholespy/issues/25), I thought it would be beneficial to expose that as an option to the user. This PR adds a new argument to the `CholeskySolver` class to specify which option to use. CHOLMOD also apparently has an "AUTO" option which uses a heuristic to decide which strategy is appropriate for the provided matrix, so I use that as the default argument.
Since the new argument has a default value, this change shouldn't break existing users' code. It does, however, potentially change which strategy CHOLMOD uses by default, which could change the runtime of existing code.
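For reference, the three strategies map onto CHOLMOD's `Common->supernodal` setting. A minimal Python sketch of the mapping follows; the `FactorMode` name and `mode` keyword are my assumptions (not necessarily this PR's spelling), but the constant values match CHOLMOD's headers:

```python
from enum import IntEnum

class FactorMode(IntEnum):
    """Mirrors CHOLMOD's Common->supernodal constants."""
    SIMPLICIAL = 0  # CHOLMOD_SIMPLICIAL: always simplicial
    AUTO = 1        # CHOLMOD_AUTO: heuristic pick (this PR's default)
    SUPERNODAL = 2  # CHOLMOD_SUPERNODAL: always supernodal

def supernodal_setting(mode: FactorMode = FactorMode.AUTO) -> int:
    """Value that would be stored in cholmod_common.supernodal."""
    return int(mode)
```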
The performance of two test problems (as measured on an M2 MacBook) is summarized below:

- `get_icosphere(7)` (2D connectivity): supernodal was a 1.7x speedup over simplicial

In both cases, the default `CHOLMOD_AUTO` option picked the faster (supernodal) approach.
The small performance test script (and relevant data for the tetrahedron mesh) is available here.