icl-utk-edu / papi

Unable to build the papi cuda component with cuda 12.4 #192

Open wcohen opened 1 month ago

wcohen commented 1 month ago

On RHEL 9.4, the PAPI cuda component builds successfully with Cuda 12.3, but the build fails when Cuda 12.4 is used.

To reproduce, with NVIDIA Cuda 12.4 installed, run:

./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-static-lib=no --with-shared-lib=yes --with-shlib-tools --with-components="cuda"
PAPI_CUDA_ROOT=/usr/local/cuda-12.4 make 

Eventually, make runs the following command, which produces many errors (see the attached papi_cuda_make.log file):

gcc -fPIC -DPIC -shared -Wl,-soname -Wl,libpapi.so.7.1 -Xlinker "-rpath" -Xlinker "/usr/lib64" -DPAPI_NO_MEMORY_MANAGEMENT -DSTATIC_PAPI_EVENTS_TABLE  -DUSE_PERFEVENT_RDPMC=1 -DPEINCLUDE=\"libpfm4/include/perfmon/perf_event.h\" -D_REENTRANT -D_GNU_SOURCE -DUSE_COMPILER_TLS -Ilibpfm4/include -fvisibility=hidden -I. -DPAPI_NUM_COMP=4 -I/usr/local/cuda-12.4/include -I/usr/local/cuda-12.4/extras/CUPTI/include -g -DPAPI_CUDA_MAIN= -DPAPI_CUDA_RUNTIME= -DPAPI_CUDA_CUPTI= -DPAPI_CUDA_PERFWORKS= -I/usr/local/cuda-12.4/include -DOSLOCK=\"linux-lock.h\" -DOSCONTEXT=\"linux-context.h\" -O2 x86_cpuid_info.c papi_libpfm4_events.c papi.c papi_internal.c high-level/papi_hl.c extras.c sw_multiplex.c upper_PAPI_FWRAPPERS.c papi_fwrappers_.c papi_fwrappers__.c threads.c cpus.c linux-memory.c linux-timer.c linux-common.c  papi_preset.c papi_vector.c papi_memory.c components/perf_event/perf_event.c components/perf_event/pe_libpfm4_events.c components/perf_event_uncore/perf_event_uncore.c components/cuda/linux-cuda.c components/cuda/cupti_dispatch.c components/cuda/cupti_utils.c components/cuda/cupti_common.c components/cuda/cupti_profiler.c components/cuda/cupti_events.c  components/sysdetect/sysdetect.c components/sysdetect/nvidia_gpu.c components/sysdetect/amd_gpu.c components/sysdetect/cpu.c components/sysdetect/cpu_utils.c components/sysdetect/os_cpu_utils.c components/sysdetect/linux_cpu_utils.c components/sysdetect/x86_cpu_utils.c  -o libpapi.so.7.1.0.0 -Bdynamic -Llibpfm4/lib -lpfm -Wl,-rpath=/home/wcohen/papi-7.1.0/src/libpfm4/lib:/home/wcohen/papi-7.1.0/src,--enable-new-dtags 

Note: running the same build against Cuda 12.3 with the command below produces no errors for the cuda component:

PAPI_CUDA_ROOT=/usr/local/cuda-12.3 make

Treece-Burgess commented 1 month ago

I have been able to reproduce this issue on a machine running Rocky Linux 9.3 (Blue Onyx) with Cuda versions 12.4.1 and 12.5.

Command used when building PAPI with Cuda 12.4.1 and 12.5: ./configure --prefix=$PWD/test-install --with-components="cuda"

Like Will, I was unable to reproduce this issue with Cuda versions < 12.4, so it appears to be isolated to Cuda versions >= 12.4.

Treece-Burgess commented 3 weeks ago

I have been able to resolve the implicit declaration errors when building PAPI with Cuda 12.4.1. In a few cases, I explicitly added the header files that were needed:

In other cases, I simply re-ordered the includes so that they come before cupti_config.h, which resolved the implicit declarations. However, this could point to an underlying issue with cupti_config.h that may need to be looked into in the future.

Treece-Burgess commented 3 weeks ago

When building with Cuda 12.4.1, an error is reported from the file cupti_activity.h. The start of the error output is shown below:

/apps/spacks/temp-cuda-12/opt/spack/linux-rocky9-x86_64/gcc-11.4.1/cuda-12.4.1-2g7zgmnqhrvsx2gxniyzvrrvg34kzvwi/extras/CUPTI/include/cupti_activity.h:1829:23: error: expected ‘;’ before ‘typedef’
 1829 | START_PACKED_ALIGNMENT
      |                       ^
      |                       ;
......
 1837 | typedef struct PACKED_ALIGNMENT {
      | ~~~~~~~
/apps/spacks/temp-cuda-12/opt/spack/linux-rocky9-x86_64/gcc-11.4.1/cuda-12.4.1-2g7zgmnqhrvsx2gxniyzvrrvg34kzvwi/extras/CUPTI/include/cupti_activity.h:1867:16: error: redefinition of ‘struct PACKED_ALIGNMENT’
 1867 | typedef struct PACKED_ALIGNMENT {
      |                ^~~~~~~~~~~~~~~~

The file cupti_activity.h is also present in Cuda 12.1.1; however, in Cuda 12.4 the old CUpti_ActivityContext has been deprecated and replaced with CUpti_ActivityContext2. I am still looking into this error.

Treece-Burgess commented 1 week ago

Cuda 12.4 refactors the cupti_activity.h header file, moving the code block below out of that header and into the file cupti_common.h:

#define ACTIVITY_RECORD_ALIGNMENT 8
#if defined(_WIN32) // Windows 32- and 64-bit
#define START_PACKED_ALIGNMENT __pragma(pack(push,1)) // exact fit - no padding
#define PACKED_ALIGNMENT __declspec(align(ACTIVITY_RECORD_ALIGNMENT))
#define END_PACKED_ALIGNMENT __pragma(pack(pop))
#elif defined(__GNUC__) // GCC
#define START_PACKED_ALIGNMENT
#define PACKED_ALIGNMENT __attribute__ ((__packed__)) __attribute__ ((aligned (ACTIVITY_RECORD_ALIGNMENT)))
#define END_PACKED_ALIGNMENT
#else // all other compilers
#define START_PACKED_ALIGNMENT
#define PACKED_ALIGNMENT
#define END_PACKED_ALIGNMENT
#endif

This causes an issue in PAPI because the components/cuda/ directory also contains a file named cupti_common.h that uses the same header guards as NVIDIA's file. Renaming cupti_common.c/cupti_common.h to cupti_common_papi.c/cupti_common_papi.h and updating the header guards fixes the issue.

Tested with PAPI builds using Cuda 12.1.1 and Cuda 12.4.1.
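
To illustrate why matching header guards would produce the errors above, here is a minimal single-file sketch. It is not the PAPI or CUPTI source: the guard name DEMO_CUPTI_COMMON_H and both function names are hypothetical, and the two guarded regions only stand in for the two cupti_common.h files. Once the first region defines the guard, the preprocessor skips the second region entirely, so START_PACKED_ALIGNMENT and PACKED_ALIGNMENT are never defined and the compiler would see them as plain identifiers.

```c
/* Stand-in for the project's own cupti_common.h.
 * DEMO_CUPTI_COMMON_H is a hypothetical guard name used for illustration. */
#ifndef DEMO_CUPTI_COMMON_H
#define DEMO_CUPTI_COMMON_H
int project_header_seen(void) { return 1; }
#endif

/* Stand-in for the vendor header of the same name, using the SAME guard.
 * The guard is already defined, so everything in this region is skipped. */
#ifndef DEMO_CUPTI_COMMON_H
#define DEMO_CUPTI_COMMON_H
#define START_PACKED_ALIGNMENT
#define PACKED_ALIGNMENT __attribute__ ((__packed__))
#define END_PACKED_ALIGNMENT
#endif

/* Returns 1 if the vendor macros were defined, 0 if their region was skipped. */
int vendor_macros_defined(void)
{
#if defined(START_PACKED_ALIGNMENT) && defined(PACKED_ALIGNMENT)
    return 1;
#else
    return 0;
#endif
}
```

In this sketch vendor_macros_defined() returns 0. Giving the first region a different guard name (the equivalent of the rename plus guard update described above) lets the second region be processed, and the function then returns 1.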

@jagode @adanalis Do either of you have a suggestion on the new file name to use? I used cupti_common_papi during my testing, but do not know if that is preferred or not.

jagode commented 1 week ago

Perhaps we can go with papi_cuda_common.h or papi_cupti_common.h.