Closed mabraham closed 1 year ago
Nice CI. Who's providing the runners?
This passes all tests repeatedly, using dpcpp, MKL, and Intel MPI on a node with 4 PVC cards.
This is effectively a duplicate of #17 that I did yesterday. If that's all that's needed, we have the fix in two PRs.
The CI is provided by the University of Tennessee's Innovative Computing Laboratory (ICL-UTK), since this is an ICL project.
I'm still not sure how this fixes the issue when you say that `options.use_reorder = true;` doesn't fix the problem.

I'm working on getting access to an Intel GPU; once I can do that, I'll get to the bottom of this.
> I'm still not sure how this fixes the issue when you say that `options.use_reorder = true;` doesn't fix the problem. I'm working on getting access to an Intel GPU; once I can do that, I'll get to the bottom of this.
`use_reorder = true` did seem to be effective most of the time. I was testing on a cluster with quite a few flaky aspects, so I could well believe the failures were not due to HeFFTe.
> Nice CI. Who's providing the runners?
The runners are hosted here at ICL (UTK). We have the ability to run on NVIDIA, AMD, and Intel GPUs, but the Intel GPU test is currently disabled. If you have changes that may affect oneAPI on GPU, it would be good to include the cmake-gpu_intel test in the PR (a modification of `.github/workflows/main.yaml`).
@G-Ragghianti Thanks for fixing the gpu_intel problem!
There is more than one issue here, but at least we fixed one.
This avoids potential races on the working buffers maintained inside the plan, which is shared across all blocks.
Fixes #16