trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

[MueLu] Restrict communicator when a rank is empty and coarser levels are needed #1466

Closed lucbv closed 3 years ago

lucbv commented 7 years ago

@trilinos/muelu As discussed during the stand-up meeting: at the moment, when a rank becomes empty it is treated as having reached its coarsest level, and it waits until the rest of the Hierarchy has been built. As a result, any collective called while building a coarser level will fail unless rebalancing is requested, since the empty rank does not take part in it. Geometric coarsening cannot currently use rebalancing, because rebalancing could scramble the ordering of the coordinates and equations. So when a rank becomes empty under geometric coarsening, MueLu hangs without any explanation.
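Restricting the communicator, as the issue title proposes, is usually done with `MPI_Comm_split`. Ranks that still own rows pass a common color (say 0), and empty ranks pass `MPI_UNDEFINED` and receive `MPI_COMM_NULL`, so they are no longer expected to join collectives on coarser levels. The sketch below simulates only that rank-assignment rule in standard C++ (no MPI needed), assuming key = old rank. The names `restrictedRanks`, `restrictedSize` and `kExcluded` are hypothetical and are not MueLu's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel standing in for MPI_UNDEFINED / MPI_COMM_NULL.
constexpr int kExcluded = -1;

// Number of ranks that would still participate in collectives on the
// coarser level: those that own at least one row.
int restrictedSize(const std::vector<std::size_t>& localRows) {
  return static_cast<int>(std::count_if(localRows.begin(), localRows.end(),
                                        [](std::size_t n) { return n > 0; }));
}

// Simulates MPI_Comm_split(comm, color, key, &newcomm) with
//   color = 0 for non-empty ranks, MPI_UNDEFINED for empty ones,
//   key   = old rank.
// MPI orders the new group by key (ties broken by old rank), so with
// key = old rank the surviving ranks are renumbered in old-rank order.
// Returns each old rank's new rank, or kExcluded if it was dropped.
std::vector<int> restrictedRanks(const std::vector<std::size_t>& localRows) {
  std::vector<int> newRank(localRows.size(), kExcluded);
  int next = 0;
  for (std::size_t r = 0; r < localRows.size(); ++r) {
    if (localRows[r] > 0) newRank[r] = next++;
  }
  // Sanity check: every surviving rank received a distinct new rank.
  assert(next == restrictedSize(localRows));
  return newRank;
}
```

For local row counts `{10, 0, 5, 0, 3}`, ranks 0, 2 and 4 become ranks 0, 1 and 2 of a three-rank communicator, and ranks 1 and 3 are dropped. In real MPI code, the dropped ranks would have to skip every collective issued on the restricted communicator.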

tawiesn commented 7 years ago

@lucbv Just a note: yesterday I found a problem when calling Epetra_Map::RemoveEmptyProcessors. It seems that this routine calls MPI_Comm_split, but the resulting communicators are not freed properly. It could also be a problem in the Xpetra wrapper; I have not checked the Tpetra side. You won't notice this immediately, but if communicators are not freed, your program will crash after a while (once more than 32768 Comm objects exist). You will only see the crashes in large-scale, long-running simulations, so we should probably revisit these routines carefully.
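Every communicator returned by `MPI_Comm_split` must eventually be released with `MPI_Comm_free`. A common way to guarantee this is to tie the free to an owning object's destructor (RAII). The sketch below simulates the failure mode described above without MPI. A hypothetical `ContextPool` stands in for the MPI library's finite supply of communicators, using the 32768 figure quoted in the comment only as an example limit. A hypothetical `OwnedComm` plays the role of a wrapper that frees its communicator on destruction. None of these names come from Epetra, Xpetra, or Tpetra.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the MPI library's limited pool of
// communicator contexts; acquiring past the limit is a hard failure.
class ContextPool {
 public:
  explicit ContextPool(int limit) : limit_(limit) {}
  void acquire() {
    if (inUse_ >= limit_) throw std::runtime_error("out of communicator contexts");
    ++inUse_;
  }
  void release() {
    assert(inUse_ > 0);
    --inUse_;
  }
  int inUse() const { return inUse_; }

 private:
  int limit_;
  int inUse_ = 0;
};

// RAII owner: acquires a context on construction and releases it in the
// destructor, the way a wrapper would call MPI_Comm_free on the
// communicator it received from MPI_Comm_split.
class OwnedComm {
 public:
  explicit OwnedComm(ContextPool& pool) : pool_(pool) { pool_.acquire(); }
  ~OwnedComm() { pool_.release(); }
  OwnedComm(const OwnedComm&) = delete;             // exactly one owner
  OwnedComm& operator=(const OwnedComm&) = delete;

 private:
  ContextPool& pool_;
};

// Repeats a setup that creates one split communicator per call.
// leak = true mimics forgetting MPI_Comm_free; returns contexts in use.
int simulateSetups(ContextPool& pool, int setups, bool leak) {
  for (int i = 0; i < setups; ++i) {
    if (leak) {
      pool.acquire();          // never released: leaks one context
    } else {
      OwnedComm comm(pool);    // released when comm goes out of scope
    }
  }
  return pool.inUse();
}
```

With a limit of 32768, the leaking loop runs without complaint until the 32769th setup and then fails. This matches the "crashes only after a while" symptom. The RAII version can repeat indefinitely.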

jhux2 commented 7 years ago

> Just a note: yesterday I found a problem when calling Epetra_Map::RemoveEmptyProcessors. It seems that this routine calls MPI_Comm_split but the communicators are not freed properly. Could also be a problem in the Xpetra wrapper. Have not checked the Tpetra side.

@tawiesn Good catch. Do you have a fix available by any chance?

github-actions[bot] commented 3 years ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.

github-actions[bot] commented 3 years ago

This issue was closed due to inactivity for 395 days.