sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

Fix for E3SM-Mali uvm-free error #1070

Closed mcarlson801 closed 3 months ago

mcarlson801 commented 3 months ago

Tpetra host views were not going out of scope before the same data was accessed on device in the SDirichlet and SDirichletField evaluators. This fixes the error that was showing up in CUDA E3SM-MALI runs when UVM is disabled.
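To illustrate the scoping issue described above, here is a minimal sketch. It does not use Tpetra: `MockDualVector`, `hostView`, `deviceView`, `applyUnscoped`, and `applyScoped` are hypothetical names invented for this example, and the mock only imitates one rule, assumed here to resemble the check that fails without UVM: device access is refused while a host view of the same data is still alive.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a host/device vector. NOT the Tpetra API.
// It only mimics one rule: device access is refused while a host view
// of the same data is still alive.
class MockDualVector {
public:
  explicit MockDualVector(std::size_t n)
    : data_(std::make_shared<std::vector<double>>(n, 0.0)) {}

  // Host handle shares ownership, so its lifetime shows up in use_count().
  std::shared_ptr<std::vector<double>> hostView() { return data_; }

  // "Device" handle: throws if any host handle is still in scope.
  double* deviceView() {
    if (data_.use_count() > 1) {
      throw std::runtime_error("host view still alive during device access");
    }
    return data_->data();
  }

private:
  std::shared_ptr<std::vector<double>> data_;
};

// Problematic pattern: the host view is still in scope when the device
// view is requested, so the access is rejected.
bool applyUnscoped(MockDualVector& x) {
  auto h = x.hostView();
  (*h)[0] = 1.0;
  try {
    double* d = x.deviceView();  // throws: h is still alive
    d[0] += 1.0;
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}

// Fixed pattern: an inner block ends the host view's lifetime before
// the device view is requested.
bool applyScoped(MockDualVector& x) {
  {
    auto h = x.hostView();
    (*h)[0] = 1.0;
  }  // h is destroyed here
  try {
    double* d = x.deviceView();  // succeeds: no host view outstanding
    d[0] += 1.0;
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}
```

In this sketch, `applyUnscoped` returns `false` because the host handle outlives the device request, while `applyScoped` returns `true`. The only difference is the extra pair of braces around the host-side work.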

@mperego This error didn't show up in the handful of compass tests that I'm able to run on GPU. Is there a way to know when SDirichletField is used based on the Albany input or timer output?

mperego commented 3 months ago

Thanks. MALI hard-codes the Dirichlet conditions to be SDirichletField. However, this can be overridden in the albany_input.yaml file with a line like DBC on NS dirichlet ...

Thanks for fixing this!

mperego commented 3 months ago

Does this mean that we have a UVM free E3SM-MALI build on Perlmutter GPU?

jewatkins commented 3 months ago

> Does this mean that we have a UVM free E3SM-MALI build on Perlmutter GPU?

yes :)

I think we're still a bit unsure why the Albany/MALI tests didn't catch this, since we think they also use SDBC. For example, https://my.cdash.org/tests/184095086 errors out much later (not where E3SM was erroring out). Ideally, we would want to figure out how to reproduce this error in Albany.

jewatkins commented 3 months ago

The input file that doesn't seem to reproduce the error (but probably should?): https://github.com/sandialabs/Albany/blob/master/tests/landIce/FO_GIS/input_fo_humboldt_muelu.yaml

mperego commented 3 months ago

Um. I don't know. One difference might be that in E3SM we are solving for a full ice sheet, so the Dirichlet sideset should be empty, whereas most of the MALI Albany tests are on glaciers with nonempty Dirichlet sidesets.