marian-nmt / marian

Fast Neural Machine Translation in C++
https://marian-nmt.github.io

Multi marian-decoders crash albeit enough graphical memory #319

Closed TingxunShi closed 4 years ago

TingxunShi commented 4 years ago

I started a marian-decoder on my GPU and found that plenty of resources were left (GPU usage 74%, 5 GB of GPU memory used, 27 GB free).

So I started another decoder on the same card, but it crashed with the error message "Curand error 203":

[2020-03-12 11:28:43] Error: Curand error 203 - marian/src/tensors/rand.cpp:75: 
curandCreateGenerator(&generator_, CURAND_RNG_PSEUDO_DEFAULT)
[2020-03-12 11:28:43] Error: Aborted from marian::CurandRandomGenerator::CurandRandomGenerator(size_t, marian::DeviceId) in marian/src/tensors/rand.cpp:75

I've checked the issue list and found some similar questions, with suggested solutions such as updating the driver. But the first task I started is still running without any problem, so I wonder what the cause is. Could you please help check and advise?
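For reference, the number in "Curand error 203" is a value of the curandStatus enum from NVIDIA's curand.h header, where 203 is CURAND_STATUS_INITIALIZATION_FAILED. The sketch below is not part of Marian: the helper names curand_status_name and status_from_log are hypothetical, and the regular expression assumes the log wording shown above. It only turns the numeric code from a log line into the enum name.

```python
import re
from typing import Optional

# Values of the curandStatus enum as declared in NVIDIA's curand.h.
CURAND_STATUS = {
    0: "CURAND_STATUS_SUCCESS",
    100: "CURAND_STATUS_VERSION_MISMATCH",
    101: "CURAND_STATUS_NOT_INITIALIZED",
    102: "CURAND_STATUS_ALLOCATION_FAILED",
    103: "CURAND_STATUS_TYPE_ERROR",
    104: "CURAND_STATUS_OUT_OF_RANGE",
    105: "CURAND_STATUS_LENGTH_NOT_MULTIPLE",
    106: "CURAND_STATUS_DOUBLE_PRECISION_REQUIRED",
    201: "CURAND_STATUS_LAUNCH_FAILURE",
    202: "CURAND_STATUS_PREEXISTING_FAILURE",
    203: "CURAND_STATUS_INITIALIZATION_FAILED",
    204: "CURAND_STATUS_ARCH_MISMATCH",
    999: "CURAND_STATUS_INTERNAL_ERROR",
}


def curand_status_name(code: int) -> str:
    """Hypothetical helper: map a numeric cuRAND status to its enum name."""
    return CURAND_STATUS.get(code, "unknown cuRAND status {}".format(code))


def status_from_log(line: str) -> Optional[str]:
    """Hypothetical helper: extract the status from a 'Curand error N' log line."""
    match = re.search(r"Curand error (\d+)", line)
    if match is None:
        return None
    return curand_status_name(int(match.group(1)))


if __name__ == "__main__":
    log = "[2020-03-12 11:28:43] Error: Curand error 203 - marian/src/tensors/rand.cpp:75: "
    print(status_from_log(log))
```

The name says only that generator initialization failed. It does not say why, which is consistent with the discussion below turning to drivers and toolkit versions.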

emjotde commented 4 years ago

Can you provide both commands you used to start the decoder instances? Is it repeatable?

TingxunShi commented 4 years ago

Thanks for the quick response. The command is

../build/marian-decoder -c model.npz.decoder.yml -i input  -d 1

Yes it is repeatable

TingxunShi commented 4 years ago

Driver and CUDA info: NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1

emjotde commented 4 years ago

OK, then let's try a new Marian feature :) Can you give me the output of

../build/marian-decoder --build-info all

emjotde commented 4 years ago

Hm. I am not able to reproduce this for now. I started multiple processes on the same GPU and they work just fine.

emjotde commented 4 years ago

What kind of GPU is that?

TingxunShi commented 4 years ago

Error: The following arguments were not expected: all --build-info

Marian version: v1.7.6 9fd5ba9 2019-11-27 19:28:16 -0800

GPU version: Tesla V100-PCIE32G

emjotde commented 4 years ago

Oh, you are using a pretty old version, can you update to the current code?

TingxunShi commented 4 years ago

OK let me try and update the issue, thanks for the advice

emjotde commented 4 years ago

And when you do, it might be a good idea to build from scratch, i.e. remove the build folder etc.

TingxunShi commented 4 years ago

I built v1.9 from scratch and tried again, but still got the same error...

TingxunShi commented 4 years ago

The full output of marian-decoder --build-info all is listed below FYI

AVX2_FOUND=true
AVX512_FOUND=true
AVX_FOUND=true
BOOST_INCLUDEDIR=/lib/boost_1_68_0/include
BOOST_LIBRARYDIR=/lib/boost_1_68_0/lib
BOOST_ROOT=/lib/boost_1_68_0
BUILD_ARCH=native
CMAKE_AR=/usr/bin/ar
CMAKE_BUILD_TYPE=Release
CMAKE_COLOR_MAKEFILE=ON
CMAKE_CXX_COMPILER=/lib/gcc-5.4.0/bin/g++
CMAKE_CXX_COMPILER_AR=/lib/gcc-5.4.0/bin/gcc-ar
CMAKE_CXX_COMPILER_RANLIB=/lib/gcc-5.4.0/bin/gcc-ranlib
CMAKE_CXX_FLAGS=-std=c++11 -pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -Wno-unknown-warning-option  -march=native  -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -DCUDNN -DCUDA_FOUND -DUSE_NCCL -DMKL_ILP64 -m64
CMAKE_CXX_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_CXX_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE=-Ofast -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_CXX_FLAGS_RELWITHDEBINFO=-Ofast -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_C_COMPILER=/lib/gcc-5.4.0/bin/gcc
CMAKE_C_COMPILER_AR=/lib/gcc-5.4.0/bin/gcc-ar
CMAKE_C_COMPILER_RANLIB=/lib/gcc-5.4.0/bin/gcc-ranlib
CMAKE_C_FLAGS=-pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -Wno-unknown-warning-option  -march=native  -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -DMKL_ILP64 -m64
CMAKE_C_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_C_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_C_FLAGS_RELEASE=-O3 -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_C_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_EXPORT_COMPILE_COMMANDS=OFF
CMAKE_INSTALL_PREFIX=/usr/local
CMAKE_LINKER=/usr/bin/ld
CMAKE_MAKE_PROGRAM=/usr/bin/gmake
CMAKE_NM=/usr/bin/nm
CMAKE_OBJCOPY=/usr/bin/objcopy
CMAKE_OBJDUMP=/usr/bin/objdump
CMAKE_RANLIB=/usr/bin/ranlib
CMAKE_SKIP_INSTALL_RPATH=NO
CMAKE_SKIP_RPATH=NO
CMAKE_STRIP=/usr/bin/strip
CMAKE_VERBOSE_MAKEFILE=FALSE
COMPILE_CPU=ON
COMPILE_CUDA=ON
COMPILE_CUDA_SM35=ON
COMPILE_CUDA_SM50=ON
COMPILE_CUDA_SM60=ON
COMPILE_CUDA_SM70=ON
COMPILE_EXAMPLES=OFF
COMPILE_SERVER=OFF
COMPILE_TESTS=ON
CUDA_64_BIT_DEVICE_CODE=ON
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE=ON
CUDA_BUILD_CUBIN=OFF
CUDA_BUILD_EMULATION=OFF
CUDA_CUDART_LIBRARY=/application/cuda_10.0/lib64/libcudart.so
CUDA_CUDA_LIBRARY=/usr/lib64/libcuda.so
CUDA_HOST_COMPILATION_CPP=ON
CUDA_HOST_COMPILER=/lib/gcc-5.4.0/bin/gcc
CUDA_NVCC_EXECUTABLE=/application/cuda_10.0/bin/nvcc
CUDA_NVCC_FLAGS=-DCUDNN;-DCUDA_FOUND;-DUSE_NCCL;--default-stream;per-thread;-O3;-g;--use_fast_math;-arch=sm_35;-gencode=arch=compute_35,code=sm_35;-gencode=arch=compute_50,code=sm_50;-gencode=arch=compute_52,code=sm_52;-gencode=arch=compute_60,code=sm_60;-gencode=arch=compute_61,code=sm_61;-gencode=arch=compute_70,code=sm_70;-gencode=arch=compute_70,code=compute_70;-ccbin;/lib/gcc-5.4.0/bin/gcc;-std=c++11;-Xcompiler -fPIC;-Xcompiler -Wno-unused-result;-Xcompiler -Wno-deprecated;-Xcompiler -Wno-pragmas;-Xcompiler -Wno-unused-value;-Xcompiler -Werror;-Xcompiler -msse2;-Xcompiler -msse3;-Xcompiler -msse4.1;-Xcompiler -msse4.2;-Xcompiler -mavx;-Xcompiler -mavx2;-Xcompiler -mavx512f
CUDA_PROPAGATE_HOST_FLAGS=OFF
CUDA_SDK_ROOT_DIR=CUDA_SDK_ROOT_DIR-NOTFOUND
CUDA_SEPARABLE_COMPILATION=OFF
CUDA_TOOLKIT_INCLUDE=/application/cuda_10.0/include
CUDA_TOOLKIT_ROOT_DIR=/application/cuda_10.0
CUDA_USE_STATIC_CUDA_RUNTIME=ON
CUDA_VERBOSE_BUILD=OFF
CUDA_VERSION=10.0
CUDA_cublas_LIBRARY=/application/cuda_10.0/lib64/libcublas.so
CUDA_cudadevrt_LIBRARY=/application/cuda_10.0/lib64/libcudadevrt.a
CUDA_cudart_static_LIBRARY=/application/cuda_10.0/lib64/libcudart_static.a
CUDA_cufft_LIBRARY=/application/cuda_10.0/lib64/libcufft.so
CUDA_cupti_LIBRARY=/application/cuda_10.0/extras/CUPTI/lib64/libcupti.so
CUDA_curand_LIBRARY=/application/cuda_10.0/lib64/libcurand.so
CUDA_cusolver_LIBRARY=/application/cuda_10.0/lib64/libcusolver.so
CUDA_cusparse_LIBRARY=/application/cuda_10.0/lib64/libcusparse.so
CUDA_nppc_LIBRARY=/application/cuda_10.0/lib64/libnppc.so
CUDA_nppial_LIBRARY=/application/cuda_10.0/lib64/libnppial.so
CUDA_nppicc_LIBRARY=/application/cuda_10.0/lib64/libnppicc.so
CUDA_nppicom_LIBRARY=/application/cuda_10.0/lib64/libnppicom.so
CUDA_nppidei_LIBRARY=/application/cuda_10.0/lib64/libnppidei.so
CUDA_nppif_LIBRARY=/application/cuda_10.0/lib64/libnppif.so
CUDA_nppig_LIBRARY=/application/cuda_10.0/lib64/libnppig.so
CUDA_nppim_LIBRARY=/application/cuda_10.0/lib64/libnppim.so
CUDA_nppist_LIBRARY=/application/cuda_10.0/lib64/libnppist.so
CUDA_nppisu_LIBRARY=/application/cuda_10.0/lib64/libnppisu.so
CUDA_nppitc_LIBRARY=/application/cuda_10.0/lib64/libnppitc.so
CUDA_npps_LIBRARY=/application/cuda_10.0/lib64/libnpps.so
CUDA_rt_LIBRARY=/usr/lib64/librt.so
CUDNN_INCLUDE_DIR=/application/cuda_10.0/include
CUDNN_LIBRARY=/application/cuda_10.0/lib64/libcudnn.so
GIT_EXECUTABLE=/application/git/bin/git
INTEL_ROOT=/opt/intel
MKL_CORE_LIBRARY=/marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_core.a
MKL_INCLUDE_DIR=/marian-v1.9.0/build/intel/mkl/include
MKL_INCLUDE_DIRS=/marian-v1.9.0/build/intel/mkl/include
MKL_INTERFACE_LIBRARY=/marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_intel_ilp64.a
MKL_LIBRARIES=-Wl,--start-group;/marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_intel_ilp64.a;/marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_sequential.a;/marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_core.a;-Wl,--end-group
MKL_ROOT=/marian-v1.9.0/build/intel/mkl
MKL_SEQUENTIAL_LAYER_LIBRARY=/marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_sequential.a
PKG_CONFIG_EXECUTABLE=/usr/bin/pkg-config
SSE2_FOUND=true
SSE3_FOUND=true
SSE4_1_FOUND=true
SSE4_2_FOUND=true
SSSE3_FOUND=true
TCMALLOC_LIB=/lib/gperftools-2.7/lib
Tcmalloc_INCLUDE_DIR=/lib/gperftools-2.7/include
Tcmalloc_LIBRARY=/lib/gperftools-2.7/lib
USE_CCACHE=OFF
USE_CUDNN=ON
USE_DOXYGEN=ON
USE_FBGEMM=OFF
USE_MKL=ON
USE_MPI=OFF
USE_NCCL=ON
USE_SENTENCEPIECE=OFF
USE_STATIC_LIBS=OFF

emjotde commented 4 years ago

Interesting. You said above that you are using CUDA 10.1, but you are clearly linking against CUDA 10.0. My hunch is that this might be the issue. Do you have multiple CUDA installations, or artifacts of old installations, on your machine? What's your CMake command?
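The comparison made here (the CUDA version reported earlier versus the toolkit recorded in the build info) can be automated over the KEY=VALUE dump. This is a rough sketch under stated assumptions: the function names parse_build_info, cuda_library_dirs and version_mismatch are hypothetical, and the only thing taken from the thread is the KEY=VALUE layout and key names such as CUDA_VERSION and CUDA_curand_LIBRARY.

```python
import posixpath
from typing import Dict, List


def parse_build_info(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; the value may itself contain '=' characters."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


def cuda_library_dirs(info: Dict[str, str]) -> List[str]:
    """Distinct directories of all CUDA_*_LIBRARY entries, sorted."""
    dirs = set()
    for key, value in info.items():
        if key.startswith("CUDA_") and key.endswith("_LIBRARY"):
            dirs.add(posixpath.dirname(value))
    return sorted(dirs)


def version_mismatch(info: Dict[str, str], reported: str) -> bool:
    """True if the build's CUDA_VERSION differs from an externally reported one."""
    return info.get("CUDA_VERSION") != reported


if __name__ == "__main__":
    import sys

    # Usage (assumed): marian-decoder --build-info all | python check_build.py 10.1
    build = parse_build_info(sys.stdin.read())
    print("CUDA_VERSION:", build.get("CUDA_VERSION"))
    print("library dirs:", cuda_library_dirs(build))
    if len(sys.argv) > 1:
        print("mismatch:", version_mismatch(build, sys.argv[1]))
```

A mismatch flagged this way is only a hint that several toolkits are present on the machine. It does not prove that the mismatch causes the crash.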

TingxunShi commented 4 years ago

Yes, there are some old CUDA artifacts. I recompiled all of Marian using only CUDA 10.1; however, the same error kept appearing. My CMake command is

cmake .. \
    -DBOOST_ROOT=$CUSTOM_LIB_HOME/boost_1_68_0 \
    -DBOOST_INCLUDEDIR=$CUSTOM_LIB_HOME/boost_1_68_0/include \
    -DBOOST_LIBRARYDIR=$CUSTOM_LIB_HOME/boost_1_68_0/lib \
    -DCMAKE_CXX_COMPILER=$CUSTOM_LIB_HOME/gcc-5.4.0/bin/g++ \
    -DCMAKE_C_COMPILER=$CUSTOM_LIB_HOME/gcc-5.4.0/bin/gcc \
    -DTcmalloc_INCLUDE_DIR=$CUSTOM_LIB_HOME/gperftools-2.7/include \
    -DTcmalloc_LIBRARY=$CUSTOM_LIB_HOME/gperftools-2.7/lib \
    -DTCMALLOC_LIB=$CUSTOM_LIB_HOME/gperftools-2.7/lib \
    -Werror=suggest-override \
    -DCMAKE_BUILD_TYPE=Release \
    -DUSE_SENTENCEPIECE=OFF \
    -DCOMPILE_CPU=ON \
    -DUSE_CUDNN=ON \
    -DCOMPILE_TESTS=ON

emjotde commented 4 years ago

Is marian-decoder --build-info all now reporting CUDA 10.1?

TingxunShi commented 4 years ago

Yes it is. The whole output is attached below FYI

AVX2_FOUND=true
AVX512_FOUND=true
AVX_FOUND=true
BOOST_INCLUDEDIR=lib/boost_1_68_0/include
BOOST_LIBRARYDIR=lib/boost_1_68_0/lib
BOOST_ROOT=lib/boost_1_68_0
BUILD_ARCH=native
CMAKE_AR=/usr/bin/ar
CMAKE_BUILD_TYPE=Release
CMAKE_COLOR_MAKEFILE=ON
CMAKE_CXX_COMPILER=lib/gcc-5.4.0/bin/g++
CMAKE_CXX_COMPILER_AR=lib/gcc-5.4.0/bin/gcc-ar
CMAKE_CXX_COMPILER_RANLIB=lib/gcc-5.4.0/bin/gcc-ranlib
CMAKE_CXX_FLAGS=-std=c++11 -pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -Wno-unknown-warning-option  -march=native  -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -DCUDNN -DCUDA_FOUND -DUSE_NCCL -DMKL_ILP64 -m64
CMAKE_CXX_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_CXX_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_CXX_FLAGS_RELEASE=-Ofast -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_CXX_FLAGS_RELWITHDEBINFO=-Ofast -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_C_COMPILER=lib/gcc-5.4.0/bin/gcc
CMAKE_C_COMPILER_AR=lib/gcc-5.4.0/bin/gcc-ar
CMAKE_C_COMPILER_RANLIB=lib/gcc-5.4.0/bin/gcc-ranlib
CMAKE_C_FLAGS=-pthread -Wl,--no-as-needed -fPIC -Wno-unused-result -Wno-unknown-warning-option  -march=native  -msse2 -msse3 -msse4.1 -msse4.2 -mavx -mavx2 -mavx512f -DMKL_ILP64 -m64
CMAKE_C_FLAGS_DEBUG=-O0 -g -rdynamic
CMAKE_C_FLAGS_MINSIZEREL=-Os -DNDEBUG
CMAKE_C_FLAGS_RELEASE=-O3 -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_C_FLAGS_RELWITHDEBINFO=-O3 -m64 -funroll-loops -ffinite-math-only -g -rdynamic
CMAKE_EXPORT_COMPILE_COMMANDS=OFF
CMAKE_INSTALL_PREFIX=/usr/local
CMAKE_LINKER=/usr/bin/ld
CMAKE_MAKE_PROGRAM=/usr/bin/gmake
CMAKE_NM=/usr/bin/nm
CMAKE_OBJCOPY=/usr/bin/objcopy
CMAKE_OBJDUMP=/usr/bin/objdump
CMAKE_RANLIB=/usr/bin/ranlib
CMAKE_SKIP_INSTALL_RPATH=NO
CMAKE_SKIP_RPATH=NO
CMAKE_STRIP=/usr/bin/strip
CMAKE_VERBOSE_MAKEFILE=FALSE
COMPILE_CPU=ON
COMPILE_CUDA=ON
COMPILE_CUDA_SM35=ON
COMPILE_CUDA_SM50=ON
COMPILE_CUDA_SM60=ON
COMPILE_CUDA_SM70=ON
COMPILE_EXAMPLES=OFF
COMPILE_SERVER=OFF
COMPILE_TESTS=ON
CUDA_64_BIT_DEVICE_CODE=ON
CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE=ON
CUDA_BUILD_CUBIN=OFF
CUDA_BUILD_EMULATION=OFF
CUDA_CUDART_LIBRARY=cuda-10.1/lib64/libcudart.so
CUDA_CUDA_LIBRARY=/usr/lib64/libcuda.so
CUDA_HOST_COMPILATION_CPP=ON
CUDA_HOST_COMPILER=lib/gcc-5.4.0/bin/gcc
CUDA_NVCC_EXECUTABLE=cuda-10.1/bin/nvcc
CUDA_NVCC_FLAGS=-DCUDNN;-DCUDA_FOUND;-DUSE_NCCL;--default-stream;per-thread;-O3;-g;--use_fast_math;-arch=sm_35;-gencode=arch=compute_35,code=sm_35;-gencode=arch=compute_50,code=sm_50;-gencode=arch=compute_52,code=sm_52;-gencode=arch=compute_60,code=sm_60;-gencode=arch=compute_61,code=sm_61;-gencode=arch=compute_70,code=sm_70;-gencode=arch=compute_70,code=compute_70;-ccbin;lib/gcc-5.4.0/bin/gcc;-std=c++11;-Xcompiler -fPIC;-Xcompiler -Wno-unused-result;-Xcompiler -Wno-deprecated;-Xcompiler -Wno-pragmas;-Xcompiler -Wno-unused-value;-Xcompiler -Werror;-Xcompiler -msse2;-Xcompiler -msse3;-Xcompiler -msse4.1;-Xcompiler -msse4.2;-Xcompiler -mavx;-Xcompiler -mavx2;-Xcompiler -mavx512f
CUDA_PROPAGATE_HOST_FLAGS=OFF
CUDA_SDK_ROOT_DIR=CUDA_SDK_ROOT_DIR-NOTFOUND
CUDA_SEPARABLE_COMPILATION=OFF
CUDA_TOOLKIT_INCLUDE=cuda-10.1/include
CUDA_TOOLKIT_ROOT_DIR=cuda-10.1
CUDA_USE_STATIC_CUDA_RUNTIME=ON
CUDA_VERBOSE_BUILD=OFF
CUDA_VERSION=10.1
CUDA_cublas_LIBRARY=cuda-10.1/lib64/libcublas.so
CUDA_cudadevrt_LIBRARY=cuda-10.1/lib64/libcudadevrt.a
CUDA_cudart_static_LIBRARY=cuda-10.1/lib64/libcudart_static.a
CUDA_cufft_LIBRARY=cuda-10.1/lib64/libcufft.so
CUDA_cupti_LIBRARY=cuda-10.1/extras/CUPTI/lib64/libcupti.so
CUDA_curand_LIBRARY=cuda-10.1/lib64/libcurand.so
CUDA_cusolver_LIBRARY=cuda-10.1/lib64/libcusolver.so
CUDA_cusparse_LIBRARY=cuda-10.1/lib64/libcusparse.so
CUDA_nppc_LIBRARY=cuda-10.1/lib64/libnppc.so
CUDA_nppial_LIBRARY=cuda-10.1/lib64/libnppial.so
CUDA_nppicc_LIBRARY=cuda-10.1/lib64/libnppicc.so
CUDA_nppicom_LIBRARY=cuda-10.1/lib64/libnppicom.so
CUDA_nppidei_LIBRARY=cuda-10.1/lib64/libnppidei.so
CUDA_nppif_LIBRARY=cuda-10.1/lib64/libnppif.so
CUDA_nppig_LIBRARY=cuda-10.1/lib64/libnppig.so
CUDA_nppim_LIBRARY=cuda-10.1/lib64/libnppim.so
CUDA_nppist_LIBRARY=cuda-10.1/lib64/libnppist.so
CUDA_nppisu_LIBRARY=cuda-10.1/lib64/libnppisu.so
CUDA_nppitc_LIBRARY=cuda-10.1/lib64/libnppitc.so
CUDA_npps_LIBRARY=cuda-10.1/lib64/libnpps.so
CUDA_rt_LIBRARY=/usr/lib64/librt.so
CUDNN_INCLUDE_DIR=cuda-10.1/include
CUDNN_LIBRARY=cuda-10.1/lib64/libcudnn.so
GIT_EXECUTABLE=application/git/bin/git
INTEL_ROOT=/opt/intel
MKL_CORE_LIBRARY=marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_core.a
MKL_INCLUDE_DIR=marian-v1.9.0/build/intel/mkl/include
MKL_INCLUDE_DIRS=marian-v1.9.0/build/intel/mkl/include
MKL_INTERFACE_LIBRARY=marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_intel_ilp64.a
MKL_LIBRARIES=-Wl,--start-group;marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_intel_ilp64.a;marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_sequential.a;marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_core.a;-Wl,--end-group
MKL_ROOT=marian-v1.9.0/build/intel/mkl
MKL_SEQUENTIAL_LAYER_LIBRARY=marian-v1.9.0/build/intel/mkl/lib/intel64/libmkl_sequential.a
PKG_CONFIG_EXECUTABLE=/usr/bin/pkg-config
SSE2_FOUND=true
SSE3_FOUND=true
SSE4_1_FOUND=true
SSE4_2_FOUND=true
SSSE3_FOUND=true
TCMALLOC_LIB=lib/gperftools-2.7/lib
Tcmalloc_INCLUDE_DIR=lib/gperftools-2.7/include
Tcmalloc_LIBRARY=lib/gperftools-2.7/lib
USE_CCACHE=OFF
USE_CUDNN=ON
USE_DOXYGEN=ON
USE_FBGEMM=OFF
USE_MKL=ON
USE_MPI=OFF
USE_NCCL=ON
USE_SENTENCEPIECE=OFF
USE_STATIC_LIBS=OFF

emjotde commented 4 years ago

I am running out of ideas. What's the output of ldd marian-decoder ?

TingxunShi commented 4 years ago

It is

        linux-vdso.so.1 =>  (0x00007fffe1be6000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f599ceaa000)
        libcurand.so.10 => cuda-10.1/lib64/libcurand.so.10 (0x00007f5998e48000)
        libcusparse.so.10 => cuda-10.1/lib64/libcusparse.so.10 (0x00007f59917d7000)
        libcublas.so.10 => cuda-10.1/lib64/libcublas.so.10 (0x00007f598da5f000)
        libcudnn.so.7 => cuda-10.1/lib64/libcudnn.so.7 (0x00007f59738b8000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f597369c000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f5973494000)
        libstdc++.so.6 => lib64/libstdc++.so.6 (0x00007f59730ba000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f5972db8000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f5972ba0000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f59727dc000)
        /lib64/ld-linux-x86-64.so.2 (0x000055a03ca2f000)
        libcublasLt.so.10 => cuda-10.1/lib64/libcublasLt.so.10 (0x00007f597068b000)

However, I should say that, for other reasons, some directory prefixes are masked in the output above. For example, libstdc++.so.6 and libcublas.so.10 come from different parent directories, and neither is located in a system default directory.

I'm considering using Docker to build Marian with a clean CUDA and gcc installation and with root privileges. Could you please keep this issue open? I'll keep updating it.
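To see at a glance which directories the CUDA libraries in an ldd listing resolve to, the output can be grouped by directory. The following is a small sketch, assuming the "name => path (0xaddress)" line layout shown above. The names parse_ldd and cuda_dirs are hypothetical, and matching CUDA libraries by the prefixes libcu, libnpp and libnccl is a heuristic, not an official list.

```python
import posixpath
import re
from typing import Dict, List

# Matches lines such as: libcurand.so.10 => cuda-10.1/lib64/libcurand.so.10 (0x00007f...)
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")

# Heuristic prefixes for CUDA-related shared libraries (an assumption).
CUDA_PREFIXES = ("libcu", "libnpp", "libnccl")


def parse_ldd(text: str) -> Dict[str, str]:
    """Return {library name: resolved path}; lines without a path are skipped."""
    resolved = {}
    for line in text.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def cuda_dirs(resolved: Dict[str, str]) -> List[str]:
    """Sorted distinct directories that CUDA-looking libraries resolve to."""
    return sorted(
        {
            posixpath.dirname(path)
            for name, path in resolved.items()
            if name.startswith(CUDA_PREFIXES)
        }
    )


if __name__ == "__main__":
    import sys

    # Usage (assumed): ldd marian-decoder | python check_ldd.py
    libs = parse_ldd(sys.stdin.read())
    for directory in cuda_dirs(libs):
        print(directory)
```

More than one directory in the result would suggest that libraries from different CUDA installations are being mixed at load time.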

emjotde commented 4 years ago

Sure. That was the next thing I was going to recommend. Mixing different CUDA versions is always a headache, especially when they come with their own specific driver versions. And the curand error is very symptomatic of that.

TingxunShi commented 4 years ago

Also, my friend tried it and no error was reported, so I'll look into it further. Thank you for your patience! DZIĘKUJĘ, and take care, especially in the current situation.

snukky commented 4 years ago

On our cluster, where we have multiple versions of CUDA installed on each machine (and often inconsistently across machines), the solution is usually to specify the CUDA root directory in the cmake command, for example: -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.1.

emjotde commented 4 years ago

Closing this as this is a configuration problem on the specific machine.