sandialabs / Albany

Sandia National Laboratories' Albany multiphysics code

Omega_h not compatible with CUDA on Weaver #999

Open ikalash opened 1 year ago

ikalash commented 1 year ago

I turned on Omega_h in the weaver nightlies, and it looks like it's not compatible with the CUDA version installed there (11.2):

CMake Error at CMakeLists.txt:136 (message):
  CUDA 11.2 does not support Omega_h, use an older or newer version

-- Configuring incomplete, errors occurred!
See also "/projects/albany/nightlyCDashWeaver/build/AlbBuild/tpls/omegah/Omega_h-prefix/src/Omega_h-build/CMakeFiles/CMakeOutput.log".
gmake[2]: *** [CMakeFiles/Omega_h.dir/build.make:92: Omega_h-prefix/src/Omega_h-stamp/Omega_h-configure] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/Omega_h.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

CMake Error at cmake/GetOrInstallOmegah.cmake:115 (message):
  Die
Call Stack (most recent call first):
  CMakeLists.txt:753 (include)

https://sems-cdash-son.sandia.gov/cdash/build/53415/configure

I presume we will just punt on turning on Omega_h on weaver, or is there a different plan?

@jewatkins @mcarlson801

jewatkins commented 1 year ago

@cwsmith what versions of cuda are supported?

cwsmith commented 1 year ago

Hmmm. That check may be a bit conservative now that we have a 'pure' kokkos backend that doesn't rely on thrust; there were thrust bugs in some cuda releases. I'll run a test with the problematic cuda 11.2 and the new backend to confirm.

cwsmith commented 1 year ago

@jewatkins I'm running tests now (tracked here) and will keep you posted.

cwsmith commented 1 year ago

@jewatkins CUDA >= 11.4.4 works (with GCC 10.4.0 in this testing, newer/other versions are fine) in my testing.
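
As a rough illustration of what a tuned check could look like, here is a minimal CMake sketch that rejects CUDA toolkits older than the 11.4.4 release reported to work above. It is not the actual Omega_h check: the project name and the `OMEGAH_SKETCH_MIN_CUDA` variable are hypothetical, and it assumes a plain minimum-version guard, whereas the real check may also accept some older releases. The patch component that CMake reports comes from the toolkit's own version string and may not line up with the release's update number, so the patch-level comparison should be treated as approximate.

```cmake
# Hypothetical sketch; not taken from Omega_h's CMakeLists.txt.
cmake_minimum_required(VERSION 3.17)  # FindCUDAToolkit ships with CMake >= 3.17
project(cuda_check_sketch LANGUAGES CXX)

# Oldest CUDA release reported to work in this thread
# (assumption: treated here as a hard minimum).
set(OMEGAH_SKETCH_MIN_CUDA "11.4.4")

# Locates the toolkit and sets CUDAToolkit_VERSION.
find_package(CUDAToolkit REQUIRED)

if(CUDAToolkit_VERSION VERSION_LESS OMEGAH_SKETCH_MIN_CUDA)
  message(FATAL_ERROR
    "Found CUDA ${CUDAToolkit_VERSION}; this sketch requires >= ${OMEGAH_SKETCH_MIN_CUDA}")
endif()
```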

jewatkins commented 1 year ago

@ikalash maybe it's best just to turn off omega_h for this build for now since we'll likely transition off of weaver and onto blake. I can test omega_h + cuda there

ikalash commented 1 year ago

We can definitely turn it off in the weaver nightlies. I will do this if there are no objections.

How about PM? That one is currently using gcc 11.2.0, which it sounds like might be problematic for omega_h. I was going to tell @mcarlson801 to try turning it on there once we got the weaver ones up, but it sounds like we may have to hold off.

bartgol commented 1 year ago

Why are we turning off weaver? Does blake feature V100 as well? I thought it didn't... Since Summit's life got extended by a year, I think it's best to keep V100 tested somewhere, so if blake does not feature V100, we should probably keep weaver.

jewatkins commented 1 year ago

We're not turning off weaver yet, just disabling omega_h. There are issues with the new module set on weaver (I sank a lot of time into it last FY), and there are open tickets that have not been resolved. blake has H100. The plan is to keep weaver online for as long as summit is online, unless it takes too much work to maintain.

jewatkins commented 1 year ago

> We can definitely turn it off in the weaver nightlies. I will do this if there are no objections.
>
> How about PM? That one is currently using gcc 11.2.0, which it sounds like might be problematic for omega_h. I was going to tell @mcarlson801 to try turning it on there once we got the weaver ones up, but it sounds like we may have to hold off.

It makes sense to test omega_h on perlmutter so I'd go ahead and try to turn it on there.

ikalash commented 1 year ago

> It makes sense to test omega_h on perlmutter so I'd go ahead and try to turn it on there.

Could you please try this @mcarlson801 ?

jewatkins commented 1 year ago

FYI, he's OOO this week

ikalash commented 1 year ago

Thanks for reminding me @jewatkins . It is no rush.

bartgol commented 1 year ago

> @jewatkins CUDA >= 11.4.4 works (with GCC 10.4.0 in this testing, newer/other versions are fine) in my testing.

@cwsmith can we remove (or tune better) the check on the version then?

cwsmith commented 1 year ago

@bartgol Yeah, I'm going to add this today to cmake and spack.

ikalash commented 1 year ago

Please let me know when the fix is pushed and I can re-activate Omega_h in the Weaver nightlies.

cwsmith commented 1 year ago

Omega_h v10.8.3 has the fixed cuda check: https://github.com/SCOREC/omega_h/commit/40a2d36d0b747a7147aeed238a0323f40b227cb2 .

ikalash commented 1 year ago

Sorry @cwsmith I just saw your comment now. Should I try turning Omega_h on in the weaver builds?

mcarlson801 commented 1 year ago

Ah, I missed this while I was out. I'll try turning it on for Perlmutter as well for this week's test.

jewatkins commented 1 year ago

> Sorry @cwsmith I just saw your comment now. Should I try turning Omega_h on in the weaver builds?

That fix won't let us run with omega_h on weaver, since we're still on CUDA 11.2 there.
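
To make the version arithmetic explicit: CUDA 11.2 sorts below the 11.4.4 release reported to work earlier in this thread, so a minimum-version guard would still reject it. The script below is only a sketch. The variable names and the file name are made up, and the idea that the fixed check behaves like a simple minimum is an assumption, not something confirmed in this thread.

```cmake
# Hypothetical sketch; run in script mode with: cmake -P check_weaver_cuda.cmake
# VERSION_GREATER_EQUAL has been available since CMake 3.7.
cmake_minimum_required(VERSION 3.10)

set(weaver_cuda "11.2")          # CUDA version on weaver, per this thread
set(min_working_cuda "11.4.4")   # oldest release reported to work, per this thread

# if() compares the two strings component by component as version numbers.
if(weaver_cuda VERSION_GREATER_EQUAL min_working_cuda)
  message(STATUS "CUDA ${weaver_cuda} meets the assumed minimum of ${min_working_cuda}")
else()
  message(STATUS "CUDA ${weaver_cuda} is below the assumed minimum of ${min_working_cuda}")
endif()
```

With the values above, the script takes the `else()` branch, which matches the conclusion that Omega_h should stay off on weaver until its CUDA module is updated.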

ikalash commented 1 year ago

@jewatkins : you are right. Good call.