sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

weaver/pmgpu GPU builds failing #1079

Open jewatkins opened 1 week ago

jewatkins commented 1 week ago

The Weaver failures started on 9/12/24: https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?buildid=212851. A new error then came up around 9/19/24: https://sems-cdash-son.sandia.gov/cdash/viewBuildError.php?type=0&buildid=218620. Both seem to be related to extruded meshes.

Same issue on pmgpu, although we don't have a long history there: https://my.cdash.org/viewBuildError.php?buildid=2666842. The oldest failure was on 9/4/24: https://my.cdash.org/viewBuildError.php?type=0&buildid=2649217, and it's an Intrepid2 error.

@bartgol are you looking at this? On weaver, it looks like it just has to do with extruded meshes, but pmgpu might require a bit more debugging. @mcarlson801 might be able to help.

jewatkins commented 1 week ago

Note: 9af696318b3695c5ef0b07f4777fb28dc3152283 worked on weaver: https://sems-cdash-son.sandia.gov/cdash/test/7900603

bartgol commented 1 week ago

Uhm this is not the same as #1077, right? The build error is something we saw already a few weeks ago (and I fixed it in some other class, maybe SolutionManager?). I'll take a look at the rest too.

mperego commented 1 week ago

@jewatkins Are you sure that pmgpu is linking an updated version of Trilinos? Based on the Intrepid2 error it seems that it's linking an old version.

mcarlson801 commented 1 week ago

Perlmutter-gpu started failing due to an issue with SuperLU, so I disabled SuperLU temporarily as a band-aid (I wasn't sure whether we use it on GPU or not). I'll need to dig into the pm-gpu tests more once I'm back in the office on Monday.

mcarlson801 commented 1 week ago

Whoops, the failing tests are pm-cpu. I didn't disable SuperLU there. Hmmm

jewatkins commented 1 week ago

> @jewatkins Are you sure that pmgpu is linking an updated version of Trilinos? Based on the Intrepid2 error it seems that it's linking an old version.

It looks like the Intrepid2 error isn't in the newer builds, so maybe that's no longer an issue.