twitter-archive / torch-ipc

A set of primitives for parallel computation in Torch
Apache License 2.0

undefined symbol: cudaEventDestroy #6

Closed vzhong closed 8 years ago

vzhong commented 8 years ago

Hi,

I installed this package as a dependency for torch-dataset. After installing, I can't seem to load up the dataset module due to this problem in ipc:

th> require 'libipc'
/home/victor/torch/install/share/lua/5.1/trepl/init.lua:384: ...e/victor/torch/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libipc' from file '/home/victor/torch/install/lib/lua/5.1/libipc.so':
        /home/victor/torch/install/lib/lua/5.1/libipc.so: undefined symbol: cudaEventDestroy
stack traceback:
        [C]: in function 'error'
        /home/victor/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
        [string "_RESULT={require 'libipc'}"]:1: in main chunk
        [C]: in function 'xpcall'
        /home/victor/torch/install/share/lua/5.1/trepl/init.lua:651: in function 'repl'
        ...ctor/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
        [C]: at 0x00406690

This is a fresh Torch installation. I have CUDA 7.0, and nvidia-smi shows the driver:

 $ nvidia-smi
Tue Feb  9 13:48:05 2016
+------------------------------------------------------+
| NVIDIA-SMI 346.82     Driver Version: 346.82         |
|-------------------------------+----------------------+----------------------+

Any ideas what might be wrong? Am I missing a required library that defines cudaEventDestroy?
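
For what it's worth, one way to confirm the missing dependency is to check what the module actually links against and which symbols it leaves unresolved; a sketch, assuming standard binutils and the library path from the traceback above:

 $ ldd /home/victor/torch/install/lib/lua/5.1/libipc.so | grep -i cuda
 $ nm -D /home/victor/torch/install/lib/lua/5.1/libipc.so | grep cudaEventDestroy

If ldd lists no libcudart and nm marks the symbol with U (undefined), libipc.so was compiled with CUDA support but never linked against the CUDA runtime.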

zakattacktwitter commented 8 years ago

Hi,

Can you load cutorch? Just run th (the Torch REPL) and type:

require 'cutorch'
cutorch.getDeviceCount()

Does that work?

vzhong commented 8 years ago

Thanks for helping @zakattacktwitter :)

It does work:

th> require 'cutorch';
th> cutorch.getDeviceCount()
8

zakattacktwitter commented 8 years ago

Did you end up figuring this out?

vzhong commented 8 years ago

@zakattacktwitter No. I also reproduced the same problem on my home machine with CUDA.

zakattacktwitter commented 8 years ago

Is nvcc in your path? What does which nvcc report?

vzhong commented 8 years ago

yup

 $ which nvcc
/opt/cuda/bin/nvcc
 $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

zakattacktwitter commented 8 years ago

What does which nvcc report?

vzhong commented 8 years ago

 $ which nvcc
/opt/cuda/bin/nvcc

Moreover, I believe cudaEventDestroy is declared in include/cuda_runtime_api.h. Could it be that the CUDA runtime isn't being linked properly during the build process for ipc?

victor@archie: /opt/cuda
 $ grep -R cudaEventDestroy include
include/cuda_runtime_api.h: * with ::cudaEventDestroy will result in undefined behavior.
include/cuda_runtime_api.h: * ::cudaEventDestroy,
include/cuda_runtime_api.h: * This event must be freed with ::cudaEventDestroy.
include/cuda_runtime_api.h: * been freed with ::cudaEventDestroy will result in undefined behavior.
include/cuda_runtime_api.h: * ::cudaEventDestroy,
include/cuda_runtime_api.h: * any functions (including ::cudaEventRecord() and ::cudaEventDestroy()) may be
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime
include/cuda_runtime_api.h: * ::cudaEventQuery, ::cudaEventDestroy, ::cudaEventElapsedTime
include/cuda_runtime_api.h: * when ::cudaEventDestroy() is called, the function will return immediately and
include/cuda_runtime_api.h:extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaEventDestroy(cudaEvent_t event);
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventRecord
include/thrust/system/cuda/detail/bulk/future.hpp:        cudaError_t e = cudaEventDestroy(m_event);
include/thrust/system/cuda/detail/bulk/future.hpp:          printf("CUDA error after cudaEventDestroy in future dtor: %s", cudaGetErrorString(e));
include/thrust/system/cuda/detail/cub/util_allocator.cuh:                    if (CubDebug(error = cudaEventDestroy(search_key.ready_event))) break;
include/thrust/system/cuda/detail/cub/util_allocator.cuh:            if (CubDebug(error = cudaEventDestroy(begin->ready_event))) break;
include/cuda_device_runtime_api.h:extern __device__ __cudart_builtin__ cudaError_t CUDARTAPI cudaEventDestroy(cudaEvent_t event);
include/cuda_runtime.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,

noa commented 8 years ago

Following up on this, I've noticed similar issues with undefined symbols:

CUDA 7.0 (linux): torch/install/lib/lua/5.1/libipc.so: undefined symbol: cudaMemcpyAsync
CUDA 7.5 (osx): dlopen(/Users/nico/torch/install/lib/lua/5.1/libipc.so, 6): Symbol not found: _cudaDeviceSynchronize

CUDA / cutorch otherwise appear healthy. I'm using the latest torch / nn / cunn. I haven't yet had time to dig any deeper...

noa commented 8 years ago

Maybe the issue is the CMake version? The CMakeLists.txt uses CUDA_SDK_ROOT_DIR, which is undefined in my version of CMake (3.4). The following variable, however, is set correctly:

CUDA_TOOLKIT_ROOT_DIR

See: https://cmake.org/cmake/help/v3.0/module/FindCUDA.html
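
If FindCUDA can't locate the toolkit on its own, the toolkit root can also be passed to CMake explicitly when building by hand; a minimal sketch, assuming an /opt/cuda install like the one above and a standard ~/torch/install prefix (the rockspec normally drives CMake for you, so extra flags may still be needed):

 $ git clone https://github.com/twitter/torch-ipc.git && cd torch-ipc
 $ mkdir -p build && cd build
 $ cmake .. -DCUDA_TOOLKIT_ROOT_DIR=/opt/cuda -DCMAKE_PREFIX_PATH=$HOME/torch/install
 $ make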

noa commented 8 years ago

This seems to work for me:

INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${Torch_LUA_INCLUDE_DIR})
INCLUDE_DIRECTORIES(${CUDA_INCLUDE_DIRS})
CUDA_ADD_LIBRARY(ipc MODULE ${src})
TARGET_LINK_LIBRARIES(ipc luaT TH THC)

### Torch packages assume the library prefix is "lib"
SET_TARGET_PROPERTIES(ipc PROPERTIES
  PREFIX "lib"
  IMPORT_PREFIX "lib"
  INSTALL_NAME_DIR "@executable_path/${Torch_INSTALL_BIN2CPATH}")

IF(APPLE)
  SET_TARGET_PROPERTIES(ipc PROPERTIES
    LINK_FLAGS "-undefined dynamic_lookup")
ENDIF()

INSTALL(TARGETS ipc
  RUNTIME DESTINATION "${Torch_INSTALL_LUA_CPATH_SUBDIR}"
  LIBRARY DESTINATION "${Torch_INSTALL_LUA_CPATH_SUBDIR}")

INSTALL(FILES ${luasrc} DESTINATION "${Torch_INSTALL_LUA_PATH_SUBDIR}/ipc")

ChunyuanLI commented 8 years ago

Has anyone solved the problem? I have a correct CUDA setup, but I still have the same issue here...

fmguler commented 8 years ago

Same here (OS X). When I install with "CUDA=NO luarocks make ipc-scm-1.rockspec" it works, but then obviously CUDA support is disabled. When I install with "luarocks install ipc" I get the "Symbol not found: _cudaDeviceSynchronize" error.
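
For clarity, the two install commands side by side:

 $ CUDA=NO luarocks make ipc-scm-1.rockspec   # builds and loads, but without CUDA support
 $ luarocks install ipc                       # builds, then fails at load time with the undefined-symbol error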

daeyun commented 8 years ago

I had this problem too. I had to set CUDA_TOOLKIT_ROOT_DIR.

Then I changed TARGET_LINK_LIBRARIES(ipc luaT TH THC) to TARGET_LINK_LIBRARIES(ipc luaT TH THC ${CUDA_LIBRARIES}).
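
For reference, the relevant portion of CMakeLists.txt after that change would look roughly like this (a sketch pieced together from the snippets in this thread; the surrounding lines in the actual file may differ):

FIND_PACKAGE(CUDA)
INCLUDE_DIRECTORIES(${CUDA_INCLUDE_DIRS})
CUDA_ADD_LIBRARY(ipc MODULE ${src})
### Link the CUDA runtime explicitly so symbols like cudaEventDestroy resolve when libipc.so is loaded
TARGET_LINK_LIBRARIES(ipc luaT TH THC ${CUDA_LIBRARIES})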

deltheil commented 8 years ago

FWIW I encountered the same problem (Arch Linux, /opt/cuda/bin/nvcc, CUDA V7.5.17, no problem loading cutorch via th).

If I explicitly link against the CUDA libraries (see above) with TARGET_LINK_LIBRARIES(ipc luaT TH THC ${CUDA_LIBRARIES}), then /opt/cuda/lib64/libcudart_static.a is added at the link stage:

[ 50%] Linking C shared module libipc.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/ipc.dir/link.txt --verbose=1
/usr/bin/cc  -fPIC -DUSE_CUDA ... /opt/cuda/lib64/libcudart_static.a ...

So thanks to this static linking, the problem is gone:

$ nm ~/torch/install/lib/lua/5.1/libipc.so | grep -i cudaEventDestroy
000000000005ac70 t cudaEventDestroy
0000000000069430 r _ZZ16cudaEventDestroyE12__FUNCTION__

And th -e "print(require('libipc'))" works like a charm.

zakattacktwitter commented 8 years ago

Hi,

Glad everyone has found solutions. Can we post one as a pull request so others can take advantage? I don't experience this problem so I can't post a fix.

Thanks, Zak

deltheil commented 8 years ago

Done: see #21.

zakattacktwitter commented 8 years ago

Pretty sure this is fixed now. Thanks!