sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

weaver/pmgpu GPU builds failing #1079

Closed jewatkins closed 1 month ago

jewatkins commented 2 months ago

The Weaver failure started on 9/12/24: https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?buildid=212851. A new error then came up around 9/19/24: https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?type=0&buildid=218620. Both seem to be related to extruded meshes.

Same issue on pmgpu, although we don't have a long history: https://my.cdash.org/viewBuildError.php?buildid=2666842. The oldest failure was on 9/4/24: https://my.cdash.org/viewBuildError.php?type=0&buildid=2649217, and it's an Intrepid2 error.

@bartgol are you looking at this? On weaver, it looks like it just has to do with extruded meshes, but pmgpu might require a bit more debugging. @mcarlson801 might be able to help.

jewatkins commented 2 months ago

Note: 9af696318b3695c5ef0b07f4777fb28dc3152283 worked on weaver: https://sems-cdash-son.sandia.gov/cdash/test/7900603

bartgol commented 2 months ago

Uhm this is not the same as #1077, right? The build error is something we saw already a few weeks ago (and I fixed it in some other class, maybe SolutionManager?). I'll take a look at the rest too.

mperego commented 2 months ago

@jewatkins Are you sure that pmgpu is linking an updated version of Trilinos? Based on the Intrepid2 error it seems that it's linking an old version.

mcarlson801 commented 1 month ago

Perlmutter-gpu started failing due to an issue with SuperLU so I disabled it temporarily as a bandaid (wasn't sure if we used it on GPU or not). I'll need to dig into the pm-gpu tests more once I'm back in the office on Monday.

mcarlson801 commented 1 month ago

Woops, the failing tests are pm-cpu. I didn't disable SuperLU there. Hmmm

jewatkins commented 1 month ago

> @jewatkins Are you sure that pmgpu is linking an updated version of Trilinos? Based on the Intrepid2 error it seems that it's linking an old version.

It looks like the Intrepid2 error isn't in the newer builds, so maybe that's no longer an issue.

jewatkins commented 1 month ago

These are fixed now. Thanks Luca!