Issue with GPU build using latest NVHPC compilers

jchen6727 commented 1 year ago

Context

The cmake build with GPU support fails (Debian 11, cuda 11.8, nvhpc & mpi 23.7, cmake 3.18.4) due to an "incompatible redefinition of macro" error. This is a fresh build on a Google Cloud VM (Deep Learning VM with CUDA 11.8, M111, Debian 11, Python 3.10, with CUDA 11.8 preinstalled), with nvhpc and the python dependencies installed on top.

Overview of the issue

Debian 11 w/ cuda 11.8, nvhpc 23.7, cmake 3.18.4 fails.

cmake ..   -DNRN_ENABLE_CORENEURON=ON   -DCORENRN_ENABLE_GPU=ON   -DNRN_ENABLE_INTERVIEWS=OFF   -DNRN_ENABLE_RX3D=OFF   -DCMAKE_INSTALL_PREFIX=$HOME/install   -DCMAKE_C_COMPILER=nvc   -DCMAKE_CUDA_COMPILER=nvcc   -DCMAKE_CXX_COMPILER=nvc++

succeeds

however, the following build step fails with the errors shown after it:

cmake --build . --parallel

"/usr/include/stdint.h", line 127: error: incompatible redefinition of macro "UINT8_MAX" (declared at line 78 of "src/nrniv/modlunit_generated/lex.cpp")
  # define UINT8_MAX            (255)
           ^

"/usr/include/stdint.h", line 128: error: incompatible redefinition of macro "UINT16_MAX" (declared at line 81 of "src/nrniv/modlunit_generated/lex.cpp")
  # define UINT16_MAX           (65535)
           ^

"/usr/include/stdint.h", line 227: error: incompatible redefinition of macro "SIZE_MAX" (declared at line 88 of "src/nrniv/modlunit_generated/lex.cpp")
  #  define SIZE_MAX            (18446744073709551615UL)

Expected result/behavior

generation of binaries

NEURON setup

Version: 8.2.3
Installation method: cmake w/ GPU
OS + Version: Debian 11
Compiler + Version: nvcc 11.8, nvc 23.7, nvc++ 23.7

Minimal working example - MWE

On Debian 11 w/ nvcc 11.8, nvc 23.7, nvc++ 23.7

Configuring per the tutorial steps succeeds:

cmake ..   -DNRN_ENABLE_CORENEURON=ON   -DCORENRN_ENABLE_GPU=ON   -DNRN_ENABLE_INTERVIEWS=OFF   -DNRN_ENABLE_RX3D=OFF   -DCMAKE_INSTALL_PREFIX=$HOME/install   -DCMAKE_C_COMPILER=nvc   -DCMAKE_CUDA_COMPILER=nvcc   -DCMAKE_CXX_COMPILER=nvc++

The build then fails on:

cmake --build . --parallel

Logs

"/usr/include/stdint.h", line 127: error: incompatible redefinition of macro "UINT8_MAX" (declared at line 78 of "src/nrniv/nocmodl_generated/lex.cpp")
  # define UINT8_MAX            (255)
           ^

"/usr/include/stdint.h", line 128: error: incompatible redefinition of macro "UINT16_MAX" (declared at line 81 of "src/nrniv/nocmodl_generated/lex.cpp")
  # define UINT16_MAX           (65535)
           ^

"/usr/include/stdint.h", line 227: error: incompatible redefinition of macro "SIZE_MAX" (declared at line 88 of "src/nrniv/nocmodl_generated/lex.cpp")
  #  define SIZE_MAX            (18446744073709551615UL)
            ^

pramodk commented 1 year ago

@jchen6727: could you provide the output of the cmake .. <arguments> command? In particular, I want to know which versions of Flex & Bison your system has.

In my example, if I look at "src/nrniv/nocmodl_generated/lex.cpp" then I see:

 #ifndef UINT16_MAX
 #define UINT16_MAX             (65535U)
 #endif

i.e. the redefinitions are properly guarded. I wonder what your lex.cpp files look like. Could you attach them as well?

jchen6727 commented 1 year ago

Thanks,

cmake logs the generated lex

(base) jchen@nvidia-hpc-sdk:~$ bison --version
bison (GNU Bison) 3.7.5
Written by Robert Corbett and Richard Stallman.

Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(base) jchen@nvidia-hpc-sdk:~$ flex --version
flex 2.6.4

snippet from my lex

#ifndef UINT16_MAX
#define UINT16_MAX             (65535U)
#endif

Re: proper protection against redefinition, if I place the statement:

#include <stdint.h>

before the blocks that generate the errors, the build progresses past this step.
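
A minimal sketch of why the include order matters, assuming a glibc-style <stdint.h> like the one in the error messages. C and C++ only accept a macro redefinition when the replacement text is identical, and the flex-style fallback (65535U) differs from the header's (65535). The #ifndef guard only helps if the header has already been seen; if the fallback is defined first and <stdint.h> arrives later, a strict compiler can reject the header's definition. The names SKETCH_FALLBACK_USED, fallback_used and uint16_max_value are hypothetical and exist only for this illustration; they are not part of NEURON or flex.

```cpp
// Sketch: include the system header *before* the flex-style fallback macros,
// so that the #ifndef guards see the header's definitions and do nothing.
#include <stdint.h>

// Same shape as the guarded block in lex.cpp. SKETCH_FALLBACK_USED is a
// hypothetical marker recording whether the fallback branch was taken.
#ifndef UINT16_MAX
#define UINT16_MAX (65535U)
#define SKETCH_FALLBACK_USED 1
#else
#define SKETCH_FALLBACK_USED 0
#endif

// Compile-time helpers (hypothetical names) to inspect the outcome.
constexpr bool fallback_used() { return SKETCH_FALLBACK_USED != 0; }
constexpr unsigned long long uint16_max_value() { return UINT16_MAX; }

// With <stdint.h> included first, the fallback is skipped and only the
// header's definition of UINT16_MAX exists, so nothing is redefined.
static_assert(!fallback_used(), "fallback should be skipped after <stdint.h>");
static_assert(uint16_max_value() == 65535ULL, "UINT16_MAX has the expected value");
```

Because the checks are static_asserts, compiling this translation unit (for example with nvc++ -c or g++ -c) is enough to exercise it.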

jchen6727 commented 1 year ago

Hello @pramodk,

Sorry, I'm wondering whether there is any update, or whether my earlier message might have gotten lost.

I might get access to another environment with CoreNEURON + GPU later this week (to work backward from), but it would be nice to see what is preventing installation on this Google Cloud instance.

pramodk commented 1 year ago

@jchen6727 / @salvadord : any chance that we will be able to log in to such an instance and debug this / look into it ourselves? That would be so much easier than trying to set up & reproduce locally.

jchen6727 commented 1 year ago

> @jchen6727 / @salvadord : any chance that we will be able to log in to such an instance and debug this / look into it ourselves? That would be so much easier than trying to set up & reproduce locally.

@pramodk if there is a Gmail account, I think I can grant access to the VM. Is there an email address to send the invite to?

salvadord commented 1 year ago

Yes, in the past we have been able to give access to Pramod and others from the BBP team. James, let me know if you have any issues giving them access.

pramodk commented 1 year ago

I was able to reproduce the issue with Docker and Debian 11. It's not related to CoreNEURON or the GPU build, but to the combination of NVHPC and the CMake version (3.18) that ships with Debian.

@jchen6727: Could you do a git pull, check out the branch pramodk/nvhpc-build-fix and try building again?

For future reference (for us as developers), to reproduce this locally:

# start debian 11 container
 docker run -it debian:11  bash

# install packages and build NEURON with NVHPC (not necessary to have coreneuron or GPU build)

 apt-get update
 apt install curl gpg
 curl https://developer.download.nvidia.com/hpc-sdk/ubuntu/DEB-GPG-KEY-NVIDIA-HPC-SDK | gpg --dearmor -o /usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg
 echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' | tee /etc/apt/sources.list.d/nvhpc.list

 apt-get update -y
 apt-get install -y nvhpc-23-9
 apt-get install -y bison cmake flex git libncurses-dev libmpich-dev libssl-dev   libx11-dev libxcomposite-dev ninja-build mpich libreadline-dev sudo wget unzip
 apt install libpython3.9-dev python3.9-distutils python3-pip
 pip3 install jinja2 sympy pytest pyyaml

 apt install environment-modules

 git clone https://github.com/neuronsimulator/nrn.git --recursive
 mkdir nrn/build
 cd nrn/build

 source /etc/profile.d/modules.sh
 module use /opt/nvidia/hpc_sdk/modulefiles
 module load nvhpc/23.9

 cmake ..   -DNRN_ENABLE_CORENEURON=OFF   -DCORENRN_ENABLE_GPU=OFF   -DNRN_ENABLE_INTERVIEWS=OFF   -DNRN_ENABLE_RX3D=OFF   -DCMAKE_INSTALL_PREFIX=$HOME/install
 make -j

should fail with the errors:

root@890a90a9731b:~/nrn/build_cpu_nvhpc/src/nrniv# /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/bin/nvc++ -DCOMPILE_DEFINITIONS -DCVODE=1 -DHAVE_CONFIG_H -DMPICH_SKIP_MPICXX=1 -DMPI_NO_CPPBIND=1 -DNMODL=1 -DNRN_ENABLE_THREADS -DOMPI_SKIP_MPICXX=1 -DR123_USE_INTRIN_H=0 -DUSE_PYTHON -I/root/nrn/build_cpu_nvhpc/src/nrniv/nocmodl_generated -I/root/nrn/src/nmodl -I/root/nrn/external/CLI11/include -I/root/nrn/src/ivoc -I/root/nrn/src/nrncvode -I/root/nrn/src/nrniv -I/root/nrn/src/nrnoc -I/root/nrn/src/oc -I/root/nrn/build_cpu_nvhpc -I/root/nrn/build_cpu_nvhpc/src/nrncvode -I/root/nrn/build_cpu_nvhpc/src/nrnoc -I/root/nrn/build_cpu_nvhpc/src/nrnpython -I/root/nrn/build_cpu_nvhpc/src/oc -I/root/nrn/build_cpu_nvhpc/src/parallel -I/root/nrn/build_cpu_nvhpc/src/sundials -I/root/nrn/build_cpu_nvhpc/src/sundials/shared -I/root/nrn/external/Random123/include -I/root/nrn/src -I/root/nrn/src/gnu -I/root/nrn/src/mesch -I/root/nrn/src/nrnmpi -I/root/nrn/src/nrnpython -I/root/nrn/src/parallel -I/root/nrn/src/scopmath -I/root/nrn/src/sparse -I/root/nrn/src/sparse13 -I/root/nrn/src/sundials -I/root/nrn/src/sundials/cvodes -I/root/nrn/src/sundials/ida -I/root/nrn/src/sundials/shared -g  -O2  --diag_suppress=1,47,111,128,170,174,177,180,186,301,541,550,816,2465 -noswitcherror -O0 --c++17  -o CMakeFiles/nocmodl.dir/nocmodl_generated/lex.cpp.o -c /root/nrn/build_cpu_nvhpc/src/nrniv/nocmodl_generated/lex.cpp -A
nvc++-Info-Switch -Mvect forces -O2
"/usr/include/stdint.h", line 127: error: incompatible redefinition of macro "UINT8_MAX" (declared at line 78 of "src/nrniv/nocmodl_generated/lex.cpp")
  # define UINT8_MAX            (255)
           ^

"/usr/include/stdint.h", line 128: error: incompatible redefinition of macro "UINT16_MAX" (declared at line 81 of "src/nrniv/nocmodl_generated/lex.cpp")
  # define UINT16_MAX           (65535)
           ^

"/usr/include/stdint.h", line 227: error: incompatible redefinition of macro "SIZE_MAX" (declared at line 88 of "src/nrniv/nocmodl_generated/lex.cpp")
  #  define SIZE_MAX            (18446744073709551615UL)
            ^

jchen6727 commented 1 year ago

Hi @pramodk, checking out the branch resolves the incompatible redefinition issue. However, the build now fails at a (much) later point when running cmake --build . --parallel 8:

NVC++-S-0053-Illegal use of void type (/home/jchen/nrn/src/coreneuron/utils/randoms/nrnran123.cpp)
NVC++/x86-64 Linux 23.7-0: compilation completed with severe errors
gmake[2]: *** [src/coreneuron/CMakeFiles/coreneuron-core.dir/build.make:849: src/coreneuron/CMakeFiles/coreneuron-core.dir/utils/randoms/nrnran123.cpp.o] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:3506: src/coreneuron/CMakeFiles/coreneuron-core.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....

I do have the ability to grant access to the compute instance if given an email address.

Many thanks,

James

pramodk commented 1 year ago

@jchen6727 : Ok. I sent you a DM via the neuron-dev Slack.

pramodk commented 1 year ago

@jchen6727 : Just FYI, we looked into this on Friday. It's a bit depressing to see that NVIDIA's NVHPC compiler has bugs and is producing internal compiler errors:

With the latest 23.9

[ 16%] Building CXX object src/coreneuron/CMakeFiles/coreneuron-core.dir/utils/randoms/nrnran123.cpp.o
...
NVC++-F-0000-Internal compiler error. size of unknown type       0  (/root/nrn/src/coreneuron/utils/randoms/nrnran123.cpp)
NVC++/x86-64 Linux 23.9-0: compilation aborted

and the previous 23.7

NVC++-S-0053-Illegal use of void type (/home/jchen/nrn/src/coreneuron/utils/randoms/nrnran123.cpp)
NVC++/x86-64 Linux 23.7-0: compilation completed with severe errors

So this will need a bit more time to see how to work around this compiler bug (while some other work is ongoing).

@iomaganaris : do you have the script that you used to build NEURON+CoreNEURON on gcloud for the NEURON paper?

pramodk commented 12 months ago

I updated the title of this ticket. #2591 only fixed part of the issue that has been detailed in this ticket. So I will reopen it with the updated title.

iomaganaris commented 12 months ago

> @iomaganaris : do you have the script that you used to build NEURON+CoreNEURON on gcloud for the NEURON paper?

@pramodk : I have a script, but we used Spack to install all the modules for the NEURON paper on Google Cloud, and we used NVHPC 21.2 back then. Here is the script: https://github.com/neuronsimulator/neuron_frontiers_2022_artifacts/blob/main/install_modules.sh

pramodk commented 9 months ago

@jchen6727: it took more time to revisit this than I would have liked :(. You can see the single-line change in https://github.com/neuronsimulator/nrn/pull/2680/files. If you are using a specific version or branch, then you can make this change directly. Otherwise, you can check out the master branch.

With the above, I expect it to build without issues.
