Closed jchen6727 closed 9 months ago
@jchen6727: could you provide the output of the cmake .. <arguments>
command? I.e., I want to know which versions of Flex and Bison your system has.
In my example, if I look at "src/nrniv/nocmodl_generated/lex.cpp"
then I see:
#ifndef UINT16_MAX
#define UINT16_MAX (65535U)
#endif
i.e. the redefinitions are properly guarded. I wonder what your lex.cpp
file looks like. Could you attach it as well?
Thanks,
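For anyone else reporting a similar build failure, here is a minimal sketch for collecting the tool versions requested above in one go. The function name report_tool_versions is made up for this example; it only relies on each tool accepting --version.

```shell
#!/bin/sh
# Print the first line of `<tool> --version` for each tool given as an
# argument, or a "not found" note when the tool is not on PATH.
# (report_tool_versions is a hypothetical helper, not part of NEURON.)
report_tool_versions() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
        else
            printf '%s: not found\n' "$tool"
        fi
    done
}

report_tool_versions flex bison cmake
```

Pasting that output together with the cmake configure log should cover what is being asked for here.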
(base) jchen@nvidia-hpc-sdk:~$ bison --version
bison (GNU Bison) 3.7.5
Written by Robert Corbett and Richard Stallman.
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
(base) jchen@nvidia-hpc-sdk:~$ flex --version
flex 2.6.4
Snippet from my lex.cpp:
#ifndef UINT16_MAX
#define UINT16_MAX (65535U)
#endif
Re: proper protection of the redefinitions: if I place the statement
#include <stdint.h>
before the blocks that generate the errors, the build progresses past this step.
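To see where the generated file defines these limits relative to the stdint include, something like the sketch below may help. It is only an illustration: show_limit_macros is a made-up helper, and the list of macro names it searches for is an assumption based on the macros named in this thread.

```shell
#!/bin/sh
# Print "<line>:<text>" for every stdint-style limit macro that FILE defines
# itself, then the line number of the first <stdint.h>/<cstdint> include.
# (show_limit_macros is a hypothetical helper; the macro list is an assumption.)
show_limit_macros() {
    file=$1
    grep -nE '^[[:space:]]*#[[:space:]]*define[[:space:]]+(U?INT(8|16|32|64)_(MIN|MAX)|SIZE_MAX)' "$file" || true
    first_include=$(grep -nE '^[[:space:]]*#[[:space:]]*include[[:space:]]*<(stdint\.h|cstdint)>' "$file" \
        | head -n 1 | cut -d: -f1)
    echo "first stdint include: ${first_include:-none}"
}

# Run on a file passed as the first argument, if there is one, e.g.
#   sh show_limit_macros.sh src/nrniv/nocmodl_generated/lex.cpp
if [ -f "${1:-}" ]; then
    show_limit_macros "$1"
fi
```

If the reported include line is larger than the line numbers of the defines (or is "none"), the file defines the limits before any system header has had a chance to, which matches the workaround described above.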
Hello @pramodk,
Sorry, just wondering whether there is any update, or whether my earlier message might have gotten lost.
I might get access to another environment with CoreNEURON + GPU later this week (to work backward from), but it would be nice to see what is preventing installation on this Google Cloud instance.
@jchen6727 / @salvadord : any chance that we will be able to log in to such an instance and debug this / look into it ourselves? That would be so much easier than trying to set up & reproduce locally.
@pramodk if there is a Gmail account, I think I can grant access to the VM. Is there an email to send the invite to?
Yes, in the past we have been able to give access to Pramod and others from the BBP team. James, let me know if you have any issues giving them access.
I was able to reproduce the issue with Docker and Debian 11. It's not related to CoreNEURON or the GPU build, but to the combination of NVHPC and the CMake version (3.18) that ships with Debian.
@jchen6727: Could you do a git pull, check out the branch pramodk/nvhpc-build-fix, and try building again?
For future reference (for us as developers), to reproduce this locally:
# start debian 11 container
docker run -it debian:11 bash
# install packages and build NEURON with NVHPC (not necessary to have coreneuron or GPU build)
apt-get update
apt install curl gpg
curl https://developer.download.nvidia.com/hpc-sdk/ubuntu/DEB-GPG-KEY-NVIDIA-HPC-SDK | gpg --dearmor -o /usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg
echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' | tee /etc/apt/sources.list.d/nvhpc.list
apt-get update -y
apt-get install -y nvhpc-23-9
apt-get install -y bison cmake flex git libncurses-dev libmpich-dev libssl-dev libx11-dev libxcomposite-dev ninja-build mpich libreadline-dev sudo wget unzip
apt install libpython3.9-dev python3.9-distutils python3-pip
pip3 install jinja2 sympy pytest pyyaml
apt install environment-modules
git clone https://github.com/neuronsimulator/nrn.git --recursive
mkdir nrn/build
cd nrn/build
source /etc/profile.d/modules.sh
module use /opt/nvidia/hpc_sdk/modulefiles
module load nvhpc/23.9
cmake .. -DNRN_ENABLE_CORENEURON=OFF -DCORENRN_ENABLE_GPU=OFF -DNRN_ENABLE_INTERVIEWS=OFF -DNRN_ENABLE_RX3D=OFF -DCMAKE_INSTALL_PREFIX=$HOME/install
make -j
This should fail with the following errors:
root@890a90a9731b:~/nrn/build_cpu_nvhpc/src/nrniv# /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/bin/nvc++ -DCOMPILE_DEFINITIONS -DCVODE=1 -DHAVE_CONFIG_H -DMPICH_SKIP_MPICXX=1 -DMPI_NO_CPPBIND=1 -DNMODL=1 -DNRN_ENABLE_THREADS -DOMPI_SKIP_MPICXX=1 -DR123_USE_INTRIN_H=0 -DUSE_PYTHON -I/root/nrn/build_cpu_nvhpc/src/nrniv/nocmodl_generated -I/root/nrn/src/nmodl -I/root/nrn/external/CLI11/include -I/root/nrn/src/ivoc -I/root/nrn/src/nrncvode -I/root/nrn/src/nrniv -I/root/nrn/src/nrnoc -I/root/nrn/src/oc -I/root/nrn/build_cpu_nvhpc -I/root/nrn/build_cpu_nvhpc/src/nrncvode -I/root/nrn/build_cpu_nvhpc/src/nrnoc -I/root/nrn/build_cpu_nvhpc/src/nrnpython -I/root/nrn/build_cpu_nvhpc/src/oc -I/root/nrn/build_cpu_nvhpc/src/parallel -I/root/nrn/build_cpu_nvhpc/src/sundials -I/root/nrn/build_cpu_nvhpc/src/sundials/shared -I/root/nrn/external/Random123/include -I/root/nrn/src -I/root/nrn/src/gnu -I/root/nrn/src/mesch -I/root/nrn/src/nrnmpi -I/root/nrn/src/nrnpython -I/root/nrn/src/parallel -I/root/nrn/src/scopmath -I/root/nrn/src/sparse -I/root/nrn/src/sparse13 -I/root/nrn/src/sundials -I/root/nrn/src/sundials/cvodes -I/root/nrn/src/sundials/ida -I/root/nrn/src/sundials/shared -g -O2 --diag_suppress=1,47,111,128,170,174,177,180,186,301,541,550,816,2465 -noswitcherror -O0 --c++17 -o CMakeFiles/nocmodl.dir/nocmodl_generated/lex.cpp.o -c /root/nrn/build_cpu_nvhpc/src/nrniv/nocmodl_generated/lex.cpp -A
nvc++-Info-Switch -Mvect forces -O2
"/usr/include/stdint.h", line 127: error: incompatible redefinition of macro "UINT8_MAX" (declared at line 78 of "src/nrniv/nocmodl_generated/lex.cpp")
# define UINT8_MAX (255)
^
"/usr/include/stdint.h", line 128: error: incompatible redefinition of macro "UINT16_MAX" (declared at line 81 of "src/nrniv/nocmodl_generated/lex.cpp")
# define UINT16_MAX (65535)
^
"/usr/include/stdint.h", line 227: error: incompatible redefinition of macro "SIZE_MAX" (declared at line 88 of "src/nrniv/nocmodl_generated/lex.cpp")
# define SIZE_MAX (18446744073709551615UL)
^
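When the build log is long, it can help to pull out just the names of the macros the compiler complains about. The sketch below does that for messages in the format shown above; redefined_macros is a made-up helper name, and the two sample lines are copied from this log.

```shell
#!/bin/sh
# Read a build log on stdin and print the unique names of macros reported as
# "incompatible redefinition of macro", one per line, sorted.
# (redefined_macros is a hypothetical helper for illustration.)
redefined_macros() {
    sed -n 's/.*incompatible redefinition of macro "\([^"]*\)".*/\1/p' | LC_ALL=C sort -u
}

# Demo on two of the error lines quoted in this thread.
redefined_macros <<'EOF'
"/usr/include/stdint.h", line 127: error: incompatible redefinition of macro "UINT8_MAX" (declared at line 78 of "src/nrniv/nocmodl_generated/lex.cpp")
# define UINT8_MAX (255)
"/usr/include/stdint.h", line 227: error: incompatible redefinition of macro "SIZE_MAX" (declared at line 88 of "src/nrniv/nocmodl_generated/lex.cpp")
EOF
```

On a real build, the same function can be fed the saved output of make or cmake --build instead of the here-document.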
Hi @pramodk
Checking out the branch resolves the incompatible redefinition issue; however, the build now fails at a (much) later point:
cmake --build . --parallel 8
NVC++-S-0053-Illegal use of void type (/home/jchen/nrn/src/coreneuron/utils/randoms/nrnran123.cpp)
NVC++/x86-64 Linux 23.7-0: compilation completed with severe errors
gmake[2]: *** [src/coreneuron/CMakeFiles/coreneuron-core.dir/build.make:849: src/coreneuron/CMakeFiles/coreneuron-core.dir/utils/randoms/nrnran123.cpp.o] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:3506: src/coreneuron/CMakeFiles/coreneuron-core.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
I do have the ability to grant access to the compute instance if given an email address.
Many thanks,
James
@jchen6727 : OK. I sent you a DM via the neuron-dev Slack.
@jchen6727 : Just FYI, we looked into this on Friday. It's a bit depressing to see that the NVHPC (NVIDIA) compiler has bugs and is producing internal compiler errors:
[ 16%] Building CXX object src/coreneuron/CMakeFiles/coreneuron-core.dir/utils/randoms/nrnran123.cpp.o
...
NVC++-F-0000-Internal compiler error. size of unknown type 0 (/root/nrn/src/coreneuron/utils/randoms/nrnran123.cpp)
NVC++/x86-64 Linux 23.9-0: compilation aborted
NVC++-S-0053-Illegal use of void type (/home/jchen/nrn/src/coreneuron/utils/randoms/nrnran123.cpp)
NVC++/x86-64 Linux 23.7-0: compilation completed with severe errors
So we will need a bit more time to see how to work around this compiler bug (while some other work is ongoing).
@iomaganaris : do you have the script that you used to build NEURON+CoreNEURON on gcloud for NEURON paper?
I updated the title of this ticket. #2591 only fixed part of the issue that has been detailed in this ticket. So I will reopen it with the updated title.
@pramodk : I have a script, but we used spack to install all the modules for the NEURON paper on Google Cloud, and we used NVHPC 21.2 back then.
Here is the script: https://github.com/neuronsimulator/neuron_frontiers_2022_artifacts/blob/main/install_modules.sh
@jchen6727: it took more time to revisit this than I would have liked :(. You can see the single-line change in https://github.com/neuronsimulator/nrn/pull/2680/files. If you are using a specific version or branch, you can make this change directly. Otherwise, you can check out the master branch.
With the above, I expect it to build without issues.
Context
cmake build with GPU fails (Debian 11, cuda 11.8, nvhpc&mpi 23.7, cmake 3.18.4) due to incompatible redefinition. This is a fresh build starting from a Google VM (Google, Deep Learning VM with CUDA 11.8, M111, Debian 11, Python 3.10. With CUDA 11.8 preinstalled) + nvhpc + python dependencies.
Overview of the issue
Debian 11 w/ cuda 11.8, nvhpc 23.7, cmake 3.18.4 fails.
succeeds
however, when running
Expected result/behavior
generation of binaries
NEURON setup
Minimal working example - MWE
from Debian 11 w/ nvcc 11.8, nvc 23.7, nvcc 23.7
from the build per tutorial steps>
then fails on>
Logs