trilinos / Trilinos

Primary repository for the Trilinos Project
https://trilinos.org/

Belos: BlockGmresSolMgr significantly slower on CUDA when not using UVM #12029

Closed ddement closed 2 months ago

ddement commented 1 year ago

@srajama1

We have been running a series of experiments in Nalu-Wind on removing UVM from CUDA runs. When Trilinos is built without UVM, most regression tests run slightly faster than with it. However, one case runs approximately 30x slower without UVM. We have traced this slowdown primarily to the "BlockGmresSolMgr total solve time" and "ICGS[2]: Ortho (Norm)" timing lines from Belos. Several other regression tests exercise other Belos solvers, and none of them show a similar regression.
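For readers unfamiliar with the orthogonalization step named in that timer, here is a minimal sketch of iterated classical Gram-Schmidt, assuming "ICGS" stands for that scheme and "[2]" for two passes. It is plain Python on lists, not Belos code; the function names `dot` and `icgs_orthogonalize` are hypothetical. It only illustrates where the projections and the final norm occur in each orthogonalization.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def icgs_orthogonalize(basis, v, max_passes=2):
    """Orthogonalize v against an orthonormal basis (hypothetical helper).

    Each pass is one classical Gram-Schmidt sweep: all inner products are
    taken against the same working vector, then subtracted. Repeating the
    sweep (here twice) recovers orthogonality lost to rounding.
    Returns the normalized vector, the accumulated coefficients, and the norm.
    """
    w = [float(x) for x in v]
    coeffs = [0.0] * len(basis)
    for _ in range(max_passes):
        # Inner-product step: project w onto every basis vector.
        h = [dot(q, w) for q in basis]
        # Update step: remove those projections from w.
        for q, hj in zip(basis, h):
            w = [wi - hj * qi for wi, qi in zip(w, q)]
        coeffs = [c + hj for c, hj in zip(coeffs, h)]
    # Norm step: measure what is left, then normalize.
    norm = math.sqrt(dot(w, w))
    if norm == 0.0:
        raise ValueError("v is linearly dependent on the basis")
    return [wi / norm for wi in w], coeffs, norm


basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
q, coeffs, norm = icgs_orthogonalize(basis, [3.0, 4.0, 5.0])
print(q, coeffs, norm)
```

In a GPU build, each inner product and norm yields a scalar that the host typically needs before continuing, so these steps are natural places to look for extra synchronization or data movement. Whether that explains the slowdown reported here is not established in this thread.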

Unfortunately, the reproducer for this case is a Nalu-Wind regression test - we do not have a more minimal problem. The regression test in question is the "taylorGreenVortex_p3" test. We can assist with running and debugging as necessary. @jhux2 may also have experience with running this case.

jhux2 commented 1 year ago

@trilinos/belos

jhux2 commented 1 year ago

@ddement #11837 is in progress to address GMRES orthogonalization in Belos. I'm wondering if you are running into the case that the PR is meant to address.

cgcgcg commented 1 year ago

Maybe related: #9979?

jhux2 commented 1 year ago

@ddement Do you know if the 30x slowdown is in the velocity, continuity, or both phases? It's been quite a while since I ran this test case.

ddement commented 1 year ago

@jhux2

BlockGmresSolMgr is used for the velocity solve, so that is where the 30x slowdown appears. Continuity uses BiCGStab, which is a little slower (a couple of seconds) on this particular run, but nowhere near the 30x number.
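One way to separate a 30x regression from a small one like this is to compare timers per name across the two builds. The sketch below assumes the timer totals have already been collected into dictionaries by hand. The helper `rank_slowdowns` is hypothetical, and the numbers are placeholders chosen only to mirror the ratios described in this thread, not measurements.

```python
def rank_slowdowns(with_uvm, without_uvm):
    """Return (timer name, ratio) pairs, largest slowdown first.

    Only timers present in both runs with a positive baseline are compared.
    """
    ratios = []
    for name, base in with_uvm.items():
        if name in without_uvm and base > 0.0:
            ratios.append((name, without_uvm[name] / base))
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


# Placeholder values in seconds, NOT real measurements from this issue.
with_uvm = {
    "BlockGmresSolMgr total solve time": 1.0,
    "continuity solve (BiCGStab)": 10.0,
}
without_uvm = {
    "BlockGmresSolMgr total solve time": 30.0,
    "continuity solve (BiCGStab)": 12.0,
}

for name, ratio in rank_slowdowns(with_uvm, without_uvm):
    print(f"{ratio:6.1f}x  {name}")
```

Sorting by ratio rather than by absolute difference keeps a short timer with a large relative regression from being hidden behind a long timer with a small one.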

I'll take a look at the other PRs to see if I think they're related - I would not be surprised if they are.

github-actions[bot] commented 3 months ago

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label. If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE. If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions[bot] commented 2 months ago

This issue was closed due to inactivity for 395 days.