william-dawson / NTPoly

A massively parallel library for computing the functions of sparse matrices.
https://william-dawson.github.io/NTPoly/
MIT License

MPICH + OMP_NUM_THREADS=1 hangs #143

Closed by william-dawson 3 years ago

william-dawson commented 3 years ago

Idea: maybe setting OMP_NUM_THREADS=1 also caps the maximum number of threads at 1 in MPICH. If so, maybe we should just raise an error in this case.
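
A minimal sketch of what "raise an error in this case" could look like, written in Python purely for illustration. The helper name `check_thread_env` is hypothetical and is not part of NTPoly or MPICH; the only real name used is the standard OpenMP environment variable `OMP_NUM_THREADS`. It assumes the single-thread setting is the trigger for the hang, which is only a guess at this point in the thread.

```python
import os


def check_thread_env(environ=None):
    """Hypothetical guard: fail fast instead of hanging on a single-thread setting.

    Returns the requested thread count, or None if OMP_NUM_THREADS is unset.
    """
    env = os.environ if environ is None else environ
    raw = env.get("OMP_NUM_THREADS")
    if raw is None:
        return None  # unset: the OpenMP runtime picks its own default

    # The variable may hold a comma-separated list (nested parallelism);
    # only the first entry applies to the outermost parallel region.
    first = raw.split(",")[0].strip()
    try:
        threads = int(first)
    except ValueError:
        raise ValueError(f"OMP_NUM_THREADS={raw!r} is not an integer") from None

    if threads == 1:
        # Assumption from the comment above: this combination may hang,
        # so report it clearly rather than freezing.
        raise RuntimeError(
            "OMP_NUM_THREADS=1 detected; this configuration is suspected "
            "to hang, so aborting early"
        )
    return threads
```

Calling `check_thread_env()` once at start-up would turn a silent freeze into an immediate, readable error. Whether rejecting the setting is preferable to supporting it is a separate design question.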

Dankomaister commented 3 years ago

Any progress on solving this? This is a real issue when running DFTB+ with ELSI (NTPoly), since we are then forced to set OMP_NUM_THREADS=1, which causes NTPoly to freeze.

/Daniel

william-dawson commented 3 years ago

Thank you for your feedback @Dankomaister. I agree that it is an issue, and when using DFTB+ with NTPoly I personally resorted to modifying DFTB+'s source code (see: https://william-dawson.github.io/NTPoly/blog/update/2021/05/14/dftb.html). I will use your comments as motivation, take a look at it this week, and hopefully get it sorted out!

Edit: I now know at least one thing that should fix it, but the time-consuming part will be making sure it doesn't hurt performance.

bhourahine commented 3 years ago

@william-dawson could you feed the threading issue back upstream? From memory, the reason for disabling it was a claim that PEXSI(?) should only be run with one thread.

william-dawson commented 3 years ago

This should be fixed by #170 and the newly released version 2.7.0, so I am going to close this issue. The next steps are to update ELSI and then to update DFTB+.

william-dawson commented 3 years ago

DFTB+ has now addressed this issue: https://github.com/dftbplus/dftbplus/pull/854 On the ELSI side, I've submitted a pull request with the fix implemented in #170: https://gitlab.com/elsi_project/elsi_interface/-/merge_requests/281