warmspringwinds / pytorch-cpp

Pytorch C++ Library

ATen CUDA error #18

Closed pharrellyhy closed 6 years ago

pharrellyhy commented 6 years ago

Hi @warmspringwinds ,

After switching to CUDA 8.0, I still can't compile the ATen cloned from your repo, although I can compile the one from the original master repo.

After compiling it, I got two .so files, one for CUDA and one for CPU. Then I ran a very simple test:

cout << ones(CUDA(kFloat), {3,4}) << "\n";

and I got the error message below:

cannot initialize CUDA without ATen_cuda library (initCUDA at /home/pharrell/codebase/github/pytorch-cpp/ATen/aten/src/ATen/detail/CUDAHooksInterface.h:42)
frame #0: at::Context::lazyInitCUDA()::{lambda()#1}::operator()() const + 0x32 (0x409a56 in ./test_aten)
frame #1: void std::_Bind_simple<at::Context::lazyInitCUDA()::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) + 0x28 (0x40afb0 in ./test_aten)
frame #2: std::_Bind_simple<at::Context::lazyInitCUDA()::{lambda()#1} ()>::operator()() + 0x2c (0x40ace0 in ./test_aten)
frame #3: void std::__once_call_impl<std::_Bind_simple<at::Context::lazyInitCUDA()::{lambda()#1} ()> >() + 0x17 (0x40a7a4 in ./test_aten)
frame #4: + 0xea99 (0x7feec5bdda99 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #5: ./test_aten() [0x408ccd]
frame #6: void std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&) + 0x77 (0x40a0f7 in ./test_aten)
frame #7: at::Context::lazyInitCUDA() + 0x3d (0x409b37 in ./test_aten)
frame #8: at::Context::initCUDAIfNeeded(at::Backend) + 0x21 (0x409b81 in ./test_aten)
frame #9: at::Context::getTypeOpt(at::Backend, at::ScalarType) + 0x23 (0x40984b in ./test_aten)
frame #10: at::Context::getType(at::Backend, at::ScalarType) + 0x4a (0x40992a in ./test_aten)
frame #11: ./test_aten() [0x408da9]
frame #12: ./test_aten() [0x408de1]
frame #13: main + 0x65a (0x409467 in ./test_aten)
frame #14: __libc_start_main + 0xf0 (0x7feec3e2e830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #15: _start + 0x29 (0x408bb9 in ./test_aten)

Aborted (core dumped)

I've already added both .so files in CMakeLists.txt: TARGET_LINK_LIBRARIES(test_aten ${CUDA_LIBRARIES} ${ATen_BINARY_DIR}/src/ATen/libATen_cuda.so ${ATen_BINARY_DIR}/src/ATen/libATen_cpu.so).

The CPU part is working fine. Do you have any thoughts on this problem? Thanks!

warmspringwinds commented 6 years ago

@pharrellyhy -- cool now that you have cuda 8, hopefully we can resolve it.

I tested the library recently with cuda 8 and everything worked. I suggest that you try to compile my fork of ATen with cuda 8. To make sure that you didn't miss anything in the cmake configuration, try using ccmake (I remember that you don't have cmake-gui installed).

pharrellyhy commented 6 years ago

Hi @warmspringwinds ,

Thank you for your kind reply. Yeah, I finally compiled it... Cheers! I can also run your example so that is a good start.

I also noticed that for resnet18, one forward pass on CUDA takes about 1 ms when the input size is (1, 3, 112, 112), but about 30 ms when the input size is (20, 3, 112, 112), so it looks like the batch is not processed in parallel. Any idea which part could be the reason?

warmspringwinds commented 6 years ago

Great!

Closing this then :)

Regarding your question -- I think pytorch itself exhibits this kind of behavior (would be cool if you could check it).
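
One way to check this is to make sure both batch sizes are timed the same way. CUDA kernel launches are asynchronous with respect to the host, so a timer that stops without synchronizing may not cover all of the queued work, and the first calls often include one-time setup cost. Below is a minimal sketch of a timing helper, assuming the forward pass can be wrapped in a callable. The names `average_ms`, `step`, and `sync` are hypothetical and are not part of pytorch-cpp or ATen.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper: average wall-clock milliseconds per call of `step`.
// `sync` is called once before the clock starts and once before it stops,
// so that asynchronously queued work is included in the measurement.
inline double average_ms(const std::function<void()>& step,
                         const std::function<void()>& sync,
                         std::size_t warmup_iters,
                         std::size_t timed_iters) {
    if (timed_iters == 0) {
        return 0.0;  // nothing to measure
    }

    // Warm-up calls are not timed; they absorb one-time setup cost.
    for (std::size_t i = 0; i < warmup_iters; ++i) {
        step();
    }
    sync();  // drain the warm-up work before starting the clock

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < timed_iters; ++i) {
        step();
    }
    sync();  // wait for all timed work to finish before stopping the clock
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(timed_iters);
}
```

For a CUDA run, `step` would wrap the resnet18 forward call and `sync` would wrap `cudaDeviceSynchronize()` from the CUDA runtime. Running it once with the (1, 3, 112, 112) input and once with the (20, 3, 112, 112) input, and comparing against the same measurement in PyTorch, would show whether the gap comes from the library or from the measurement.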