Open cz4rs opened 1 year ago
My suggestion for priority is:
1. compilers and versions used for the build (already have from core)
2. Kokkos Core SHA (already avail), Kokkos Kernels SHA (make PR in kernels)
3. Google Benchmark SHA or version (make PR in kernels)
4. OpenMP environment variables for OpenMP benchmarks
   - (potentially to be done in CORE)
   - (from querying the environment variables, or querying the OpenMP API programmatically)
   - `OMP_NUM_THREADS`, `OMP_DYNAMIC`, `OMP_PROC_BIND`, `OMP_PLACES`
5. For CUDA (from parsing `nvidia-smi`, or using the CUDA runtime API query functions)
   - (potentially to be done in CORE)
   - NVIDIA driver version (probably taken from `nvidia-smi`)
   - GPU name / model / revision (enough to unambiguously identify the SKU, this is more than just the GPU architecture)
6. For AMD / Intel GPUs: analogous to CUDA (potentially to be done in CORE)
@lucbv @cwpearson are you ok with the priority above?
Sure, that sounds reasonable. I would like to point out that some of the features listed above are already implemented (partially or fully) here: https://github.com/kokkos/kokkos-tools/blob/ecp-kpp3/profiling/kpp3-verifier/kp_ecp_kpp3.cpp (see, for example, the function `extract_gpuinfo()`).
Originally posted by @cwpearson in https://github.com/kokkos/kokkos-kernels/issues/1636#issuecomment-1405289099
Include more information in benchmark context.
Relevant part of the current JSON output.
```json
{
  "context": {
    "date": "2023-03-02T19:19:26+01:00",
    "host_name": "perrinel-MS-7C75",
    "executable": "/home/perrinel/Dev/kokkos/build_kernel_benchmark/perf_test/KokkosKernels_PerformanceTest_Benchmark",
    "num_cpus": 20,
    "mhz_per_cpu": 5300,
    "cpu_scaling_enabled": true,
    "caches": [
      { "type": "Data", "level": 1, "size": 32768, "num_sharing": 2 },
      { "type": "Instruction", "level": 1, "size": 32768, "num_sharing": 2 },
      { "type": "Unified", "level": 2, "size": 262144, "num_sharing": 2 },
      { "type": "Unified", "level": 3, "size": 20971520, "num_sharing": 20 }
    ],
    "load_avg": [3.13,1.83,0.87],
    "library_build_type": "debug",
    "CPU architecture": "none",
    "Default Device": "N6Kokkos6SerialE",
    "GPU architecture": "none",
    "KOKKOSKERNELS_ENABLE_TPL_ARMPL": "no",
    "KOKKOSKERNELS_ENABLE_TPL_BLAS": "no",
    "KOKKOSKERNELS_ENABLE_TPL_CBLAS": "no",
    "KOKKOSKERNELS_ENABLE_TPL_CHOLMOD": "no",
    "KOKKOSKERNELS_ENABLE_TPL_CUBLAS": "no",
    "KOKKOSKERNELS_ENABLE_TPL_CUSPARSE": "no",
    "KOKKOSKERNELS_ENABLE_TPL_LAPACK": "no",
    "KOKKOSKERNELS_ENABLE_TPL_LAPACKE": "no",
    "KOKKOSKERNELS_ENABLE_TPL_MAGMA": "no",
    "KOKKOSKERNELS_ENABLE_TPL_METIS": "no",
    "KOKKOSKERNELS_ENABLE_TPL_MKL": "no",
    "KOKKOSKERNELS_ENABLE_TPL_ROCBLAS": "no",
    "KOKKOSKERNELS_ENABLE_TPL_ROCSPARSE": "no",
    "KOKKOSKERNELS_ENABLE_TPL_SUPERLU": "no",
    "KOKKOS_COMPILER_GNU": "940",
    "KOKKOS_ENABLE_ASM": "yes",
    "KOKKOS_ENABLE_CXX17": "yes",
    "KOKKOS_ENABLE_CXX20": "no",
    "KOKKOS_ENABLE_CXX23": "no",
    "KOKKOS_ENABLE_DEBUG_BOUNDS_CHECK": "no",
    "KOKKOS_ENABLE_GNU_ATOMICS": "no",
    "KOKKOS_ENABLE_HBWSPACE": "no",
    "KOKKOS_ENABLE_HWLOC": "no",
    "KOKKOS_ENABLE_INTEL_ATOMICS": "no",
    "KOKKOS_ENABLE_INTEL_MM_ALLOC": "no",
    "KOKKOS_ENABLE_LIBDL": "yes",
    "KOKKOS_ENABLE_LIBRT": "no",
    "KOKKOS_ENABLE_PRAGMA_IVDEP": "no",
    "KOKKOS_ENABLE_PRAGMA_LOOPCOUNT": "no",
    "KOKKOS_ENABLE_PRAGMA_UNROLL": "no",
    "KOKKOS_ENABLE_PRAGMA_VECTOR": "no",
    "KOKKOS_ENABLE_SERIAL": "yes",
    "KOKKOS_ENABLE_SERIAL_ATOMICS": "no",
    "KOKKOS_ENABLE_WINDOWS_ATOMICS": "no",
    "Kokkos Version": "4.0.99",
    "KokkosKernels Version": "4.0.99"
  },
  "benchmarks": [
    (benchmark results...)
  ]
}
```

(`Kokkos::print_configuration`)
(`yes`/`no` already available in `KokkosKernels::print_configuration`)
Providing precise version information: https://github.com/kokkos/kokkos-kernels/pull/1693
GOOGLE_BENCHMARK_VERSION: 1.6.2
(potentially to be done in Kokkos Core) (from querying the environment variables, or querying the OpenMP API programmatically)
OMP_NUM_THREADS
OMP_DYNAMIC
OMP_PROC_BIND
OMP_PLACES
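
A minimal sketch of the "query the environment variables" option for the four variables above. The helper name `collect_openmp_env` and the `"unset"` placeholder are assumptions made for illustration; neither is part of Kokkos Core or Kokkos Kernels.

```cpp
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper (not an existing Kokkos / Kokkos Kernels function):
// reads the OpenMP environment variables listed in this issue and returns
// them as key/value pairs. Variables that are not set are reported as
// "unset" (an arbitrary placeholder chosen for this sketch).
std::map<std::string, std::string> collect_openmp_env() {
  const char* names[] = {"OMP_NUM_THREADS", "OMP_DYNAMIC", "OMP_PROC_BIND",
                         "OMP_PLACES"};
  std::map<std::string, std::string> context;
  for (const char* name : names) {
    const char* value = std::getenv(name);  // nullptr if the variable is not set
    context[name] = value ? value : "unset";
  }
  return context;
}
```

Each pair could then be added to the JSON context with Google Benchmark's `benchmark::AddCustomContext(key, value)` before the benchmarks run. Note that this only reports what the environment requested; the values the OpenMP runtime actually uses would have to come from the OpenMP API.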
1789
`Kokkos::print_configuration`: `[numa_count x core_per_numa x thread_per_core]`, see relevant code in Kokkos Core
(`nvidia-smi`)
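
For the `nvidia-smi` route, one possible approach is to run `nvidia-smi --query-gpu=name,driver_version --format=csv,noheader` and parse each output line. The sketch below covers only the parsing step; `GpuInfo` and `parse_gpu_csv_line` are hypothetical names introduced here, and the assumption is that each line has the form `<name>, <driver version>`.

```cpp
#include <string>

// Hypothetical record for the two fields discussed in this issue.
struct GpuInfo {
  std::string name;
  std::string driver_version;
};

// Hypothetical helper: parses one line of the assumed CSV output
// "<name>, <driver version>". Returns false if the line does not have
// two non-empty fields; `out` is only modified on success.
bool parse_gpu_csv_line(const std::string& line, GpuInfo& out) {
  // Split at the last comma: the driver version contains no commas.
  const std::string::size_type comma = line.rfind(',');
  if (comma == std::string::npos) return false;

  // Strip leading and trailing whitespace from a field.
  auto trim = [](const std::string& s) -> std::string {
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
  };

  const std::string name = trim(line.substr(0, comma));
  const std::string driver = trim(line.substr(comma + 1));
  if (name.empty() || driver.empty()) return false;

  out.name = name;
  out.driver_version = driver;
  return true;
}
```

Parsing a command's text output keeps the benchmark free of a CUDA dependency, but it relies on `nvidia-smi` being on the `PATH`; the CUDA runtime API query functions are the alternative mentioned in the discussion.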