kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

.github/workflows: Add remaining spr and bdw checks #2321

Closed: e10harvey closed this 4 days ago

e10harvey commented 1 month ago

@ndellingwood: Can you please look into the "Exception: Illegal" errors in these new spr checks? Note that the spr check is testing against kokkos v4.3.01.

ndellingwood commented 1 month ago

@e10harvey I'm guessing the spr check corresponds to this nightly job? https://jenkins-son.sandia.gov/view/KokkosKernels/job/KokkosKernels_Nightly_Blake_OneAPI_2023_1_0_OpenMP_Serial_SPR-oneMKL

That job tests against the kokkos@develop branch on Blake (the "all" queue, though the H100 nodes should also have SPR for host builds). These are the most relevant details of the job's environment and configuration:

          module load git cmake intel-oneapi-compilers/2023.1.0 intel-oneapi-dpl/2022.1.0 intel-oneapi-mkl/2023.1.0 intel-oneapi-tbb/2021.9.0

          $KOKKOSKERNELS_PATH/cm_generate_makefile.bash \
            --with-openmp \
            --with-serial \
            --arch=SPR \
            --compiler=icpx \
            --cxxflags="-fp-model=precise" \
            --with-scalars=double,complex_double,float,complex_float \
            --with-ordinals=int,int64_t \
            --with-offsets=int,size_t \
            --with-tpls=mkl \
            --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF \
            --kokkos-path=$KOKKOS_PATH

The nightly job is passing, so I'm not sure what accounts for the discrepancy between the Jenkins job and the one running from the container.
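CTest reports a test that was killed by `SIGILL` (an illegal CPU instruction) as `Exception: Illegal`. Both jobs build with `--arch=SPR`, so one possibility worth ruling out is that the container runner's CPU lacks an instruction-set extension the SPR build relies on, while the nightly job runs on Blake hardware that has it. The sketch below is a hypothetical, Linux-only check of `/proc/cpuinfo`; the helper names and the four flags it looks for are illustrative assumptions, not the exact set of extensions that `--arch=SPR` enables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of a few Sapphire Rapids-era ISA
# extensions this (Linux) host advertises in /proc/cpuinfo.
# ASSUMPTION: this flag list is an illustrative subset, not the full set
# of instructions an --arch=SPR build may emit.
set -euo pipefail

spr_flags=(avx512f avx512_bf16 avx512_fp16 amx_tile)

# Succeed if word $1 occurs as a whole word in the space-separated list $2.
has_flag() {
  [[ " $2 " == *" $1 "* ]]
}

# Print present/MISSING for each flag in spr_flags; return 1 if any is missing.
check_flags() {
  local flags="$1" f missing=0
  for f in "${spr_flags[@]}"; do
    if has_flag "$f" "$flags"; then
      echo "present: $f"
    else
      echo "MISSING: $f"
      missing=1
    fi
  done
  return "$missing"
}

if [[ -r /proc/cpuinfo ]]; then
  # The first "flags" line is taken as representative of every core.
  host_flags="$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2 || true)"
  check_flags "$host_flags" || echo "=> host lacks some of the listed extensions"
fi
```

Running it on both the container runner and a Blake node would show whether the hardware differs in this respect; a clean result on both would point back toward the software differences instead.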

e10harvey commented 1 month ago

Yes, that's correct. The cm_test_all_sandia invocation is:

          ../kokkos-kernels/cm_generate_makefile.bash \
            --with-openmp \
            --with-serial \
            --arch=SPR \
            --compiler=icpx \
            --cxxflags="-fp-model=precise" \
            --with-tpls=mkl \
            --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF \
            --kokkos-path=$PWD/../kokkos
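Besides the Kokkos version (v4.3.01 in the container check versus develop in the nightly job), the two `cm_generate_makefile.bash` invocations in this thread also differ in their flags. A throwaway sketch for listing those differences follows; the helper name `only_in_first` is hypothetical, both flag lists are copied from the thread, and `--kokkos-path` is left out because its two values differ only by location.

```shell
#!/usr/bin/env bash
# Sketch: list cm_generate_makefile.bash flags that appear in one
# invocation but not the other. Both lists are copied from this thread;
# --kokkos-path is omitted on purpose.
set -euo pipefail
export LC_ALL=C  # comm requires both inputs sorted with the same collation

nightly_flags=(
  --with-openmp
  --with-serial
  --arch=SPR
  --compiler=icpx
  '--cxxflags="-fp-model=precise"'
  --with-scalars=double,complex_double,float,complex_float
  --with-ordinals=int,int64_t
  --with-offsets=int,size_t
  --with-tpls=mkl
  --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF
)

container_flags=(
  --with-openmp
  --with-serial
  --arch=SPR
  --compiler=icpx
  '--cxxflags="-fp-model=precise"'
  --with-tpls=mkl
  --kokkos-cmake-flags=-DKokkos_ENABLE_ONEDPL=OFF
)

# Hypothetical helper: print lines of newline-separated list $1 that are
# absent from list $2 (comm -23 keeps lines unique to its first input).
only_in_first() {
  comm -23 <(printf '%s\n' "$1" | sort) <(printf '%s\n' "$2" | sort)
}

nightly="$(printf '%s\n' "${nightly_flags[@]}")"
container="$(printf '%s\n' "${container_flags[@]}")"

echo "Only in nightly:"
only_in_first "$nightly" "$container"
echo "Only in container check:"
only_in_first "$container" "$nightly"
```

This should print `--with-offsets=int,size_t`, `--with-ordinals=int,int64_t` and `--with-scalars=double,complex_double,float,complex_float` under "Only in nightly:" and nothing under "Only in container check:". Whether either difference matters for the `Exception: Illegal` failures is not established here.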
e10harvey commented 2 weeks ago

@lucbv, @ndellingwood: I believe this one is ready for another review and merge. I have commented out two of the GNU1020 builds due to resource contention.