Closed vzhong closed 8 years ago
Hi,
Can you load cutorch? Just run th (the Torch REPL) and type:
require 'cutorch'
cutorch.getDeviceCount()
Does that work?
On Tue, Feb 9, 2016 at 1:49 PM, Victor Zhong notifications@github.com wrote:
Hi,
I installed this package as a dependency for torch-dataset. After installing, I can't seem to load up the dataset module due to this problem in ipc:
th> require 'libipc'
/home/victor/torch/install/share/lua/5.1/trepl/init.lua:384: ...e/victor/torch/install/share/lua/5.1/luarocks/loader.lua:117: error loading module 'libipc' from file '/home/victor/torch/install/lib/lua/5.1/libipc.so':
/home/victor/torch/install/lib/lua/5.1/libipc.so: undefined symbol: cudaEventDestroy
stack traceback:
  [C]: in function 'error'
  /home/victor/torch/install/share/lua/5.1/trepl/init.lua:384: in function 'require'
  [string "_RESULT={require 'libipc'}"]:1: in main chunk
  [C]: in function 'xpcall'
  /home/victor/torch/install/share/lua/5.1/trepl/init.lua:651: in function 'repl'
  ...ctor/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:199: in main chunk
  [C]: at 0x00406690
This is a fresh torch installation. I have cuda 7.0 and nvidia-smi shows the drivers:
$ nvidia-smi
Tue Feb 9 13:48:05 2016
+------------------------------------------------------+
| NVIDIA-SMI 346.82 Driver Version: 346.82 |
|-------------------------------+----------------------+----------------------+
Any ideas what might be wrong? Am I missing a required library that defines cudaEventDestroy?
— Reply to this email directly or view it on GitHub https://github.com/twitter/torch-ipc/issues/6.
Thanks for helping @zakattacktwitter :)
It does work:
th> require 'cutorch';
th> cutorch.getDeviceCount()
8
Did you end up figuring this out?
@zakattacktwitter No. In addition I reproduced this same problem on my home machine with CUDA as well.
Is nvcc in your path? What does "which nvcc" report?
yup
$ which nvcc
/opt/cuda/bin/nvcc
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
What does "which nvcc" report?
$ which nvcc
/opt/cuda/bin/nvcc
Moreover, I believe cudaEventDestroy is declared in include/cuda_runtime_api.h. Could it be that the CUDA runtime isn't linked properly during the build process for ipc?
victor@archie: /opt/cuda
$ grep -R cudaEventDestroy include
include/cuda_runtime_api.h: * with ::cudaEventDestroy will result in undefined behavior.
include/cuda_runtime_api.h: * ::cudaEventDestroy,
include/cuda_runtime_api.h: * This event must be freed with ::cudaEventDestroy.
include/cuda_runtime_api.h: * been freed with ::cudaEventDestroy will result in undefined behavior.
include/cuda_runtime_api.h: * ::cudaEventDestroy,
include/cuda_runtime_api.h: * any functions (including ::cudaEventRecord() and ::cudaEventDestroy()) may be
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime
include/cuda_runtime_api.h: * ::cudaEventQuery, ::cudaEventDestroy, ::cudaEventElapsedTime
include/cuda_runtime_api.h: * when ::cudaEventDestroy() is called, the function will return immediately and
include/cuda_runtime_api.h:extern __host__ __cudart_builtin__ cudaError_t CUDARTAPI cudaEventDestroy(cudaEvent_t event);
include/cuda_runtime_api.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventRecord
include/thrust/system/cuda/detail/bulk/future.hpp: cudaError_t e = cudaEventDestroy(m_event);
include/thrust/system/cuda/detail/bulk/future.hpp: printf("CUDA error after cudaEventDestroy in future dtor: %s", cudaGetErrorString(e));
include/thrust/system/cuda/detail/cub/util_allocator.cuh: if (CubDebug(error = cudaEventDestroy(search_key.ready_event))) break;
include/thrust/system/cuda/detail/cub/util_allocator.cuh: if (CubDebug(error = cudaEventDestroy(begin->ready_event))) break;
include/cuda_device_runtime_api.h:extern __device__ __cudart_builtin__ cudaError_t CUDARTAPI cudaEventDestroy(cudaEvent_t event);
include/cuda_runtime.h: * ::cudaEventSynchronize, ::cudaEventDestroy, ::cudaEventElapsedTime,
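The grep output above shows where cudaEventDestroy is declared. Headers only matter at compile time, while an "undefined symbol" error at load time means that no loaded library supplies the definition. A minimal sketch for checking this on Linux is given here; the helper name undefined_cuda_symbols is made up for illustration, and the library path in the usage comment is the one from the error message in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list CUDA runtime symbols that a shared library
# references but does not define itself.
# Reads `nm -D`-style output on stdin; a "U" in the type column marks an
# undefined symbol (the address column is empty for those lines).
undefined_cuda_symbols() {
    awk '$1 == "U" && $2 ~ /^_?cuda/ { print $2 }'
}

# Typical use (path taken from the error message in this thread):
#   nm -D /home/victor/torch/install/lib/lua/5.1/libipc.so | undefined_cuda_symbols
# To see whether the module records a dependency on the CUDA runtime at all:
#   ldd /home/victor/torch/install/lib/lua/5.1/libipc.so | grep -i cudart
```

If the first command lists cuda* symbols and the second prints nothing, the module was probably built without linking the CUDA runtime. On macOS the rough equivalents are "nm -u" and "otool -L".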
Following up on this, I've noticed similar issues with undefined symbols:
CUDA 7.0 (linux): torch/install/lib/lua/5.1/libipc.so: undefined symbol: cudaMemcpyAsync
CUDA 7.5 (osx): dlopen(/Users/nico/torch/install/lib/lua/5.1/libipc.so, 6): Symbol not found: _cudaDeviceSynchronize
CUDA / cutorch otherwise appear healthy. I'm using the latest torch / nn / cunn. I haven't yet had time to dig any deeper...
Maybe the issue is the CMake version? The CMakeLists.txt uses CUDA_SDK_ROOT_DIR, which is undefined in my version of CMake (3.4). This is set correctly though:
CUDA_TOOLKIT_ROOT_DIR
This seems to work for me:
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR})
INCLUDE_DIRECTORIES(${Torch_LUA_INCLUDE_DIR})
INCLUDE_DIRECTORIES(${CUDA_INCLUDE_DIRS})
CUDA_ADD_LIBRARY(ipc MODULE ${src})
TARGET_LINK_LIBRARIES(ipc luaT TH THC)
### Torch packages supposes libraries prefix is "lib"
SET_TARGET_PROPERTIES(ipc PROPERTIES
  PREFIX "lib"
  IMPORT_PREFIX "lib"
  INSTALL_NAME_DIR "@executable_path/${Torch_INSTALL_BIN2CPATH}")
IF(APPLE)
  SET_TARGET_PROPERTIES(ipc PROPERTIES
    LINK_FLAGS "-undefined dynamic_lookup")
ENDIF()
INSTALL(TARGETS ipc
  RUNTIME DESTINATION "${Torch_INSTALL_LUA_CPATH_SUBDIR}"
  LIBRARY DESTINATION "${Torch_INSTALL_LUA_CPATH_SUBDIR}")
INSTALL(FILES ${luasrc} DESTINATION "${Torch_INSTALL_LUA_PATH_SUBDIR}/ipc")
Did anyone solve this problem? I have a correct CUDA setup, but I still hit the same issue here...
Same here (osx). When I install with "CUDA=NO luarocks make ipc-scm-1.rockspec" it works but obviously it doesn't work with CUDA. When I install with "luarocks install ipc" I get "Symbol not found: _cudaDeviceSynchronize" error.
I had this problem too. I had to set CUDA_TOOLKIT_ROOT_DIR.
Then I changed
TARGET_LINK_LIBRARIES(ipc luaT TH THC)
to
TARGET_LINK_LIBRARIES(ipc luaT TH THC ${CUDA_LIBRARIES})
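For anyone who wants to script that one-line change, here is a minimal sketch using sed. The function name add_cuda_libraries is hypothetical, and the sketch assumes the link line appears in CMakeLists.txt exactly as quoted above; if the file differs, the substitution simply does nothing.

```shell
#!/bin/sh
# Hypothetical helper: append ${CUDA_LIBRARIES} to the ipc link line.
# $1 is the path to a CMakeLists.txt. The patched text goes to stdout and
# the input file is left untouched.
add_cuda_libraries() {
    # Single quotes keep the shell from expanding ${CUDA_LIBRARIES};
    # it must reach CMake literally.
    sed 's/TARGET_LINK_LIBRARIES(ipc luaT TH THC)/TARGET_LINK_LIBRARIES(ipc luaT TH THC ${CUDA_LIBRARIES})/' "$1"
}

# Typical use, from a checkout of the package:
#   add_cuda_libraries CMakeLists.txt > CMakeLists.txt.new
#   mv CMakeLists.txt.new CMakeLists.txt
#   luarocks make ipc-scm-1.rockspec
```

Writing to a new file first makes it easy to diff the result before overwriting the original.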
FWIW I encountered the same problem (Arch Linux, /opt/cuda/bin/nvcc, CUDA V7.5.17, no problem loading cutorch via th).
If I explicitly link against the CUDA libraries (see above) with TARGET_LINK_LIBRARIES(ipc luaT TH THC ${CUDA_LIBRARIES}), then /opt/cuda/lib64/libcudart_static.a is added at the link stage:
[ 50%] Linking C shared module libipc.so
/usr/bin/cmake -E cmake_link_script CMakeFiles/ipc.dir/link.txt --verbose=1
/usr/bin/cc -fPIC -DUSE_CUDA ... /opt/cuda/lib64/libcudart_static.a ...
So thanks to this static linking there is no more problem:
$ nm ~/torch/install/lib/lua/5.1/libipc.so | grep -i cudaEventDestroy
000000000005ac70 t cudaEventDestroy
0000000000069430 r _ZZ16cudaEventDestroyE12__FUNCTION__
And th -e "print(require('libipc'))" works like a charm.
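In the nm output above, the lowercase "t" marks a local symbol in the text section, meaning the definition now lives inside libipc.so itself. A small sketch of the same check as a reusable test follows; the helper name symbol_is_defined is made up for illustration, and it assumes the default nm layout of address, type and name for defined symbols.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `nm` output on stdin contains a definition
# of the symbol named in $1.
# Defined symbols have three columns (address, type, name); undefined ones
# have only two ("U", name), so they are not counted.
symbol_is_defined() {
    awk -v sym="$1" 'NF == 3 && $3 == sym { found = 1 } END { exit found ? 0 : 1 }'
}

# Typical use:
#   nm ~/torch/install/lib/lua/5.1/libipc.so | symbol_is_defined cudaEventDestroy \
#       && echo "defined in libipc.so"
```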
Hi,
Glad everyone has found solutions. Can we post one as a pull request so others can take advantage? I don't experience this problem so I can't post a fix.
Thanks, Zak
Done: see #21.
Pretty sure this is fixed now. Thanks!