pika-org / pika

pika builds on C++ std::execution with fiber, CUDA, HIP, and MPI support.
https://pikacpp.org
Boost Software License 1.0

Move GPU CI pipelines from old daint to new daint #1239

Open msimberg opened 1 week ago

msimberg commented 1 week ago
codacy-production[bot] commented 1 week ago

Coverage summary from Codacy


| Coverage variation | Diff coverage |
| ------------- | ------------- |
| :white_check_mark: +0.03% (target: -1.00%) | :white_check_mark: ∅ (target: 90.00%) |
Coverage variation details

| | Coverable lines | Covered lines | Coverage |
| ------------- | ------------- | ------------- | ------------- |
| Common ancestor commit (17f3c6fa3d50aa4babc201f032086e4730f1a985) | 18346 | 13774 | 75.08% |
| Head commit (3d5dc6a5940b82aea1aa5a3608bf73c67a157abc) | 18346 (+0) | 13780 (+6) | 75.11% (**+0.03%**) |

**Coverage variation** is the difference between the coverage for the head and common ancestor commits of the pull request branch: `<coverage of head commit> - <coverage of common ancestor commit>`
Diff coverage details

| | Coverable lines | Covered lines | Diff coverage |
| ------------- | ------------- | ------------- | ------------- |
| Pull request (#1239) | 0 | 0 | **∅ (not applicable)** |

**Diff coverage** is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: `(<covered lines added or modified> / <coverable lines added or modified>) * 100%`


Codacy stopped sending the deprecated coverage status on June 5th, 2024.

msimberg commented 1 week ago

Exporting NVIDIA_VISIBLE_DEVICES=all and NVIDIA_DRIVER_CAPABILITIES="compute,utility" seems to be what was required to get the container images to load the correct drivers etc. and avoid

cudaErrorInsufficientDriver (CUDA driver version is insufficient for CUDA runtime version)

These are from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#constraints.

These work when testing manually, but don't seem to work in CI yet.
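A minimal sketch of the exports described above, assuming they run in the job's shell before the containerized step starts. Where exactly they belong in the CSCS CI configuration is not stated in this thread, so the placement is an assumption. The `echo` lines are only there so the values show up in the job log.

```shell
# Assumption: this runs in the job's setup script, before the container is launched.
# Expose all GPUs to the container.
export NVIDIA_VISIBLE_DEVICES=all
# Request the compute (CUDA) and utility (nvidia-smi, NVML) driver capabilities.
export NVIDIA_DRIVER_CAPABILITIES="compute,utility"

# Print the values so the CI log records what was actually exported.
echo "NVIDIA_VISIBLE_DEVICES=${NVIDIA_VISIBLE_DEVICES}"
echo "NVIDIA_DRIVER_CAPABILITIES=${NVIDIA_DRIVER_CAPABILITIES}"
```

If the variables are set but the error persists in CI, comparing these log lines against a manual run is a cheap first check that the exports reach the step that starts the container.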

msimberg commented 3 days ago

All right, we're making some progress:

I may end up disabling the test steps for the latter two in this PR and re-enabling them in separate PRs.

msimberg commented 2 days ago

The clang/cuda configuration with valgrind no longer complains about illegal instructions: good. It now reports many issues, and I don't yet know whether they're real.

I'll aim to get the GCC 12/CUDA 12 pipeline running properly (still some tweaks needed on the CSCS CI side apparently) and then I'll attempt to revive the two other CUDA configurations separately, possibly introducing another valgrind configuration on x86.