icl-utk-edu / heffte

BSD 3-Clause "New" or "Revised" License

improved sycl sync #17

Closed mkstoyanov closed 1 year ago

mkstoyanov commented 1 year ago

@G-Ragghianti Do we have an actual problem, or has spack simply not been updated yet?

Since the spack PR was accepted, the CUDA memory problem should be gone.

G-Ragghianti commented 1 year ago

The fix that you proposed for the CUDA backend problem has not yet been merged into spack develop. I'm still working to verify that it resolves our issue. However, that particular check can be ignored for the purposes of this PR. Do you want to include a ONEAPI on gpu_intel check along with this PR?

mkstoyanov commented 1 year ago

Yes, I need to check Intel GPU for this PR. It may be good to do it a few times since the user reported a race condition that doesn't appear every time. I can manually tell the test to run again, but it has to run on the actual hardware.
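A minimal sketch of repeating a test to look for an intermittent failure, assuming a POSIX shell on the Intel GPU machine. The helper name `repeat_test` is hypothetical and is not part of the heFFTe CI scripts; the commented usage line reuses the script path and arguments from the workflow file.

```shell
# Hypothetical helper: run a command several times and report how many runs failed.
# Usage: repeat_test <number-of-runs> <command> [args...]
repeat_test() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # "$@" is the command under test; a non-zero exit counts as a failure
        if ! "$@"; then
            failures=$((failures + 1))
            echo "run $i failed"
        fi
        i=$((i + 1))
    done
    echo "$failures of $runs runs failed"
    # Succeed only if every run passed
    [ "$failures" -eq 0 ]
}

# Example (not executed here), assuming the CI script is called from the repository root:
# repeat_test 5 .github/workflows/build-cmake.sh test gpu_intel
```

A race condition may still pass every run, so a clean result from a loop like this only lowers the odds that the problem remains; it does not prove the fix.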

I'm not bothered by the CUDA+spack problem, just drop me a line when you merge the spack fix, so I know and don't ignore a legit problem.

mkstoyanov commented 1 year ago

PS: My CUDA+spack fix has been merged. The pending PR is for your MPI_HOME fix (I don't understand why they are taking so long).

G-Ragghianti commented 1 year ago

To add the ONEAPI GPU check back in, you'd modify the "heffte_gpu" section of the main.yaml file to be the following:

  heffte_gpu:
    strategy:
      matrix:
        maker: [cmake, spack]
        device: [gpu_nvidia, gpu_intel]
        exclude: # spack package doesn't support gpu_intel
          - maker: spack
            device: gpu_intel
      fail-fast: false
    runs-on: ${{matrix.device}}
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v3
      - name: Build
        run: .github/workflows/build-${{matrix.maker}}.sh build ${{matrix.device}}
      - name: Test
        run: .github/workflows/build-${{matrix.maker}}.sh test ${{matrix.device}}
      - name: SmokeTest
        run: .github/workflows/build-${{matrix.maker}}.sh smoketest ${{matrix.device}}

mkstoyanov commented 1 year ago

I just added it and I'm getting the same conflict that we had before: the OneAPI version that comes with spack is not compatible with gcc-9.5.

How do we resolve that? Can we just run it in the oneapi container? That would be the proper way, instead of relying on the spack installation.

mkstoyanov commented 1 year ago

@G-Ragghianti we can close this one for now, but I will still need access to Intel GPU, either with a manual account or as part of the CI. Both work for me.

G-Ragghianti commented 1 year ago

> Just added it and I'm getting the same conflict that we had before, the OneAPI version coming with spack is not compatible with gcc-9.5.
>
> How do we resolve that? Can we just run it in the oneapi container, that would be the proper way instead of relying on the spack installation.

Currently, the CI is loading whichever intel-oneapi-* module is installed on the container. You could in theory modify the section of build-cmake.sh to use the system-installed oneapi tools, but I would prefer to modify build-cmake.sh so that it will load a specific version that you need. Right now, it is using intel-oneapi-compilers@2023.0.0. What version do you need?
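A rough sketch of pinning the compiler version, assuming the script loads packages through spack. The variable `ONEAPI_VERSION` and the helper `oneapi_spec` are hypothetical names; the actual layout of build-cmake.sh may differ.

```shell
# Hypothetical variable: the version to pin, defaulting to the one currently in use.
ONEAPI_VERSION="${ONEAPI_VERSION:-2023.0.0}"

# Build a spack spec of the form package@version for the oneAPI compilers.
oneapi_spec() {
    echo "intel-oneapi-compilers@$1"
}

# Example (not executed here); assumes spack's shell integration has been sourced:
# spack load "$(oneapi_spec "$ONEAPI_VERSION")"
```

Keeping the version in one variable means a later upgrade is a one-line change, and the CI no longer depends on whichever module happens to be installed in the container.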

mkstoyanov commented 1 year ago

The intel-oneapi-compiler is the correct version. We have to use the most recent one, since things keep changing a lot.

However, the compiler is complaining about something in the STL library provided by gcc-9.5. The conflict has nothing to do with heFFTe; it's in the system files. My guess is that oneAPI 2023 requires a newer version of gcc.

I don't know how the cpu-only oneAPI version works. We had the error at one point, then I think it went away. I don't know what changed in the settings, but we can compare against that one to see.

G-Ragghianti commented 1 year ago

We can load a different version of gcc (which?). I'm surprised that intel-oneapi-* would have a problem with gcc-9.5, since that is the version of gcc that the intel-oneapi package was installed with (spack should have noted the incompatibility).

On 3/23/23 16:44, Miroslav Stoyanov wrote:

> The intel-oneapi-compiler is the correct version, we have to use the most recent one as things keep changing a lot.
>
> However, the compiler is complaining about something in the STL library provided by gcc-9.5. The conflict has nothing to do with heFFTe, it's in the system files. My guess is that oneAPI 2023 requires newer version of gcc.
>
> I don't know how the cpu-only oneAPI version works, we had the error at one point, then I think it went away. I don't know what changed in the setting but we can compare to that one to see.

mkstoyanov commented 1 year ago

gcc-11 should work fine.

Compatibility with STL is an issue across platforms and few people care about it.
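A small sketch of a guard that could run before configuring the build, to confirm that the active gcc is at least version 11. The minimum of 11 comes from the comment above and is not a documented oneAPI requirement. The helper name `gcc_at_least` is hypothetical.

```shell
# Hypothetical helper: succeed if a gcc version string has a major version >= the minimum.
# Usage: gcc_at_least <version-string> <minimum-major>
gcc_at_least() {
    version=$1
    minimum=$2
    # Strip everything from the first dot onward, e.g. "9.5.0" -> "9", "11" -> "11"
    major=${version%%.*}
    [ "$major" -ge "$minimum" ]
}

# Example (not executed here):
# if ! gcc_at_least "$(gcc -dumpversion)" 11; then
#     echo "gcc is older than 11; load a newer gcc before building" >&2
# fi
```

Failing early with a clear message is easier to diagnose than an error deep inside the STL headers.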