Open ikalash opened 1 year ago
@cwsmith what versions of cuda are supported?
Hmmm. That check may be a bit conservative now that we have a 'pure' kokkos backend that doesn't rely on thrust; there were thrust bugs in some cuda releases. I'll run a test with the problematic cuda 11.2 and the new backend to confirm.
@jewatkins I'm running tests now (tracked here) and will keep you posted.
@jewatkins In my testing, CUDA >= 11.4.4 works. I tested with GCC 10.4.0; newer/other versions are fine.
@ikalash maybe it's best just to turn off omega_h for this build for now since we'll likely transition off of weaver and onto blake. I can test omega_h + cuda there
We can definitely turn it off in the weaver nightlies. I will do this if there are no objections.
How about Perlmutter (PM)? That one is currently using gcc 11.2.0, which it sounds like might be problematic for omega_h. I was going to tell @mcarlson801 to try turning it on there once we got the weaver ones up, but it sounds like we may have to hold off.
Why are we turning off weaver? Does blake feature V100 as well? I thought it didn't... Since Summit's life got extended by a year, I think it's best to keep V100 tested somewhere, so if blake does not feature V100, we should prob keep weaver.
We're not turning off weaver yet, just disabling omega_h. There are issues with the new module set on weaver (I sank a lot of time into it last FY) and there are open tickets which have not been resolved. blake has H100. The plan is to keep weaver online for as long as summit is online, or until it takes too much work to maintain.
It makes sense to test omega_h on perlmutter so I'd go ahead and try to turn it on there.
Could you please try this @mcarlson801 ?
FYI, he's OOO this week
Thanks for reminding me @jewatkins . It is no rush.
@cwsmith can we remove (or tune better) the check on the version then?
@bartgol Yeah, I'm going to add this today to cmake and spack.
Please let me know when the fix is pushed and I can re-activate Omega_h in the Weaver nightlies.
Omega_h v10.8.3 has the fixed cuda check: https://github.com/SCOREC/omega_h/commit/40a2d36d0b747a7147aeed238a0323f40b227cb2 .
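For anyone checking whether a given machine clears the bar discussed in this thread, here is a minimal sketch of the version comparison. It is not the actual check in Omega_h's CMake or spack package. The minimum of 11.4.4 is taken from the testing reported above, and the names `MIN_CUDA`, `parse_version`, and `cuda_supported` are hypothetical, used only for illustration.

```python
# Illustrative only: mirrors the "CUDA >= 11.4.4" result reported in this thread.
# MIN_CUDA and both helper names are assumptions, not part of Omega_h.
MIN_CUDA = (11, 4, 4)


def parse_version(text):
    """Convert a dotted version string such as '11.2' into a 3-tuple of ints.

    Missing trailing fields are padded with zeros, so '11.2' becomes (11, 2, 0).
    """
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def cuda_supported(version_text, minimum=MIN_CUDA):
    """Return True when the given CUDA version is at least the minimum."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    for v in ("11.2", "11.4.4", "12.0"):
        status = "ok" if cuda_supported(v) else "too old"
        print(v, status)
```

Under these assumptions, a machine still on cuda 11.2 would be rejected while 11.4.4 and newer would pass.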
Sorry @cwsmith I just saw your comment now. Should I try turning Omega_h on in the weaver builds?
Ah, I missed this while I was out. I'll try turning it on for Perlmutter as well for this week's test.
that fix won't let us run w. omega_h on weaver since we're still on cuda 11.2
@jewatkins : you are right. Good call.
I turned on Omega_h in the weaver nightlies and it looks like it's not compatible with the CUDA library:
https://sems-cdash-son.sandia.gov/cdash/build/53415/configure
I presume we will just punt on turning on Omega_h on weaver, or is there a different plan?
@jewatkins @mcarlson801