reed-foster closed this issue 7 months ago
Hi Reed,
Thanks for reporting this. SciPy sparse arrays and sparse matrices should be interchangeable in this context, so it makes sense that your fix worked. I chose the sparse array type in pyTDGL because that's what SciPy now recommends (see note here). The type check in PyPardiso is unnecessarily restrictive, so I opened an issue in PyPardiso to address it: https://github.com/haasad/PyPardiso/issues/68.
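As a quick standalone illustration of that interchangeability (not pyTDGL code): SciPy's SuperLU factorization accepts either container, and a `csc_array` can be converted to a `csc_matrix` for APIs that still insist on the legacy type.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

# Build the same tridiagonal sparse system as both container types.
A_arr = sparse.csc_array(sparse.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(5, 5)))
A_mat = sparse.csc_matrix(A_arr)  # convert array -> matrix for APIs that require spmatrix

b = np.ones(5)
x_arr = splu(A_arr).solve(b)  # SuperLU accepts the new sparse-array type...
x_mat = splu(A_mat).solve(b)  # ...and the legacy sparse-matrix type
assert np.allclose(x_arr, x_mat)  # same factorization, same solution
```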
Other people have also reported that the MKL Pardiso solver is not any faster than SuperLU despite being multithreaded, so you're probably better off just using SuperLU. If you have access to an NVIDIA GPU, GPU + SuperLU is the fastest combination I have found. If you're interested, my testing is here: https://github.com/loganbvh/py-tdgl/issues/34#issuecomment-1732427524
Hi Logan,
Thanks, that makes sense. After doing some further testing with the quickstart example, SuperLU definitely seems like the best choice for this geometry/mesh. Interestingly enough, using my NVIDIA GPU actually slows things down for the quickstart example (4,671 mesh sites). When I monitor the GPU with `nvidia-smi`, GPU utilization is rather low (typically around 10%, not much more than when loading a webpage, except when everything is solved on the GPU with CuPy, where it reaches >90% utilization). I guess this is because the mesh is relatively small?
Here's the mesh information:
```python
{
    'num_sites': 4671,
    'num_elements': 8748,
    'min_edge_length': 0.037649046290826015,
    'max_edge_length': 0.251608033926911,
    'mean_edge_length': 0.14058280371468113,
    'min_area': 0.0008161429472105218,
    'max_area': 0.035097437653613554,
    'mean_area': 0.016480442066212443,
    'coherence_length': 0.5,
    'length_units': 'um',
}
```
For such a small mesh, you're almost certainly limited by overheads related to CPU/GPU synchronization and data transfer. In the all-CPU case, each call to `TDGLSolver.update()` takes only ~1 ms, so even if CPU/GPU sync and data transfer take just a fraction of a millisecond, using the GPU ends up not being worth it. My other tests (https://github.com/loganbvh/py-tdgl/issues/34#issuecomment-1732427524) showed a ~30% speedup with GPU + SuperLU for meshes of size 27,000 and 78,000. The exact speedup (or slowdown) probably also depends on the specific CPU, GPU, and memory hardware.
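To make the break-even point concrete, here's a toy estimate. The ~1 ms CPU step time is from the measurement above; the GPU compute and transfer numbers are made up purely for illustration:

```python
# Toy break-even estimate for GPU offload on a small mesh.
cpu_step = 1.0e-3           # ~1 ms per TDGLSolver.update() on CPU (measured above)
gpu_compute = 0.4e-3        # hypothetical GPU compute time per step
sync_and_transfer = 0.8e-3  # hypothetical per-step host<->device sync + copy overhead

gpu_step = gpu_compute + sync_and_transfer
print(f"speedup: {cpu_step / gpu_step:.2f}x")  # < 1x means the GPU is a net slowdown
```

The per-step transfer cost is roughly fixed, while the compute cost shrinks with mesh size, which is why the same GPU that slows down a 4,671-site mesh can speed up a 78,000-site one.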
Ah that makes sense. Thanks!
Closing via https://github.com/loganbvh/py-tdgl/pull/75
I followed the instructions for installation (installing through PyPI) and ran through the quickstart. I ran the testing suite and all of the tests passed. However, I noticed a TypeError when I tried to change the solver type to use PyPardiso (which I installed using `conda install -c conda-forge pypardiso`). Here's the stack trace:
It looks like when `mu_laplacian` is generated, it gets generated as a `csc_array`, but PyPardiso requires a `csc_matrix`. If the following section is modified: https://github.com/loganbvh/py-tdgl/blob/ac8b2d9e07b9c681fe6c72fb04fd0dbbbd856840/tdgl/finite_volume/operators.py#L300-L301

then the simulation runs fine (although it's not any faster than the default SuperLU solver, but perhaps that's just because of the structure of the example simulation geometry).