pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

CUDA error: no kernel image is available for execution on the device Error from operator: output #9242

Closed CarlosYeverino closed 6 years ago

CarlosYeverino commented 6 years ago

If you have a question or would like help and support, please ask at our forums.

If you are submitting a feature request, please preface the title with [feature request]. If you are submitting a bug report, please fill in the following details.

Issue description

I compiled the whole PyTorch source with GPU support and the console output reported a successful build. As a result I obtained caffe2_pybind11_state.pyd and caffe2_pybind11_state_gpu.pyd.

When I run the following command without GPU support, it succeeds: python char_rnn.py --train_data shakespeare.txt

However, when I run it with GPU support, I get a CUDA error: python char_rnn.py --train_data shakespeare.txt --gpu

Output from console:

D:\yev\git_projects\pytorch\caffe2\python\examples>python char_rnn.py --train_data shakespeare.txt --gpu
[E D:\yev\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E D:\yev\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E D:\yev\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Input has 62 characters. Total input size: 99993
DEBUG:char_rnn:Start training
DEBUG:char_rnn:Training model
WARNING:caffe2.python.workspace:Original python traceback for operator 0 in network char_rnn_init in exception above (most recent call last):
WARNING:caffe2.python.workspace:  File "char_rnn.py", line 276, in
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\utils.py", line 329, in wrapper
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\utils.py", line 291, in run
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\utils.py", line 328, in func
WARNING:caffe2.python.workspace:  File "char_rnn.py", line 270, in main
WARNING:caffe2.python.workspace:  File "char_rnn.py", line 71, in CreateModel
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\rnn_cell.py", line 1571, in _LSTM
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\rnn_cell.py", line 93, in apply_over_sequence
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\rnn_cell.py", line 491, in prepare_input
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\brew.py", line 107, in scope_wrapper
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\helpers\fc.py", line 58, in fc
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\helpers\fc.py", line 37, in _FC_or_packed_FC
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\model_helper.py", line 214, in create_param
WARNING:caffe2.python.workspace:  File "D:\yev\git_projects\pytorch\build\caffe2\python\modeling\initializers.py", line 30, in create_param
Entering interactive debugger. Type "bt" to print the full stacktrace. Type "help" to see command listing.
[enforce fail at context_gpu.h:171] . Encountered CUDA error: no kernel image is available for execution on the device Error from operator:
output: "LSTM/i2h_w" name: "" type: "XavierFill" arg { name: "shape" ints: 400 ints: 62 } device_option { device_type: 1 cuda_gpu_id: 0 }

d:\yev\git_projects\pytorch\build\caffe2\python\workspace.py(178)CallWithExceptionIntercept()
-> return func(*args, **kwargs)
(Pdb)

Code example

python char_rnn.py --train_data shakespeare.txt --gpu

System Info

Please copy and paste the output from our environment collection script (or fill out the checklist below manually).

You can get the script and run it with:

wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
peterjc123 commented 6 years ago

What is PyTorch 3.5? Did you compile the packages yourself and install them on some other machines? If so, remember to set TORCH_NVCC_ARCH_LIST correctly; otherwise it will only be built for your own graphics card.
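
For illustration, a minimal sketch of what that could look like on Windows before invoking the build script. The architecture list here (5.0 6.0 6.1) is only an example and must match the compute capabilities of the cards you deploy to; the variable that CMake actually picks up later in this thread is TORCH_CUDA_ARCH_LIST:

:: Example only: build kernels for Maxwell (5.0) and Pascal (6.0/6.1) GPUs.
:: Adjust the list to the compute capabilities of your target machines.
set TORCH_CUDA_ARCH_LIST=5.0 6.0 6.1
build_windows.bat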

CarlosYeverino commented 6 years ago

Hi peterjc123, I just cloned PyTorch yesterday to be able to use Caffe2. I followed the instructions from https://caffe2.ai/docs/getting-started.html?platform=windows&configuration=compile, but installed CUDA 9.2 instead.

So I built it by running build_windows.bat on my own machine.

Do I still need to do what you suggested in this case?

peterjc123 commented 6 years ago

Cc. @pjh5 @orionr

ssnl commented 6 years ago

Can you check the cmake log and see which archs it built for?
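
One way to check, as a rough sketch (it assumes you capture the configure output of build_windows.bat into a file; the log file name is arbitrary):

:: Save the configure/build output, then search for the line where CMake
:: reports which architectures it added NVCC flags for.
build_windows.bat > build_log.txt 2>&1
findstr /C:"Added CUDA NVCC flags for" build_log.txt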

CarlosYeverino commented 6 years ago

As peterjc123 suggested, I set TORCH_CUDA_ARCH_LIST. I also switched to CUDA 8.0, cuDNN 7.0.5 and Visual Studio 2015, since I read online that my previous combination is not well supported.

However, I got a different error, "Caffe2 building failed". How should I set TORCH_CUDA_ARCH_LIST for a GeForce GTX 1050? I already tried set TORCH_CUDA_ARCH_LIST="6.1" with no success.

I ran the following commands:

D:\Yeverino\git_projects\pytorch\scripts>set CMAKE_GENERATOR="Visual Studio 14 2015 Win64"

D:\Yeverino\git_projects\pytorch\scripts>set USE_CUDA=ON

D:\Yeverino\git_projects\pytorch\scripts>set TORCH_CUDA_ARCH_LIST="6.1"

D:\Yeverino\git_projects\pytorch\scripts>build_windows.bat

Below the console output:

Requirement already satisfied: pyyaml in c:\python27\lib\site-packages (3.13)
CAFFE2_ROOT=D:\Yeverino\git_projects\pytorch\scripts..
CMAKE_GENERATOR="Visual Studio 14 2015 Win64"
CMAKE_BUILD_TYPE=Release
-- Selecting Windows SDK version 10.0.14393.0 to target Windows 10.0.17134.
-- The CXX compiler identification is MSVC 19.0.24215.1
-- The C compiler identification is MSVC 19.0.24215.1
-- Check for working CXX compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working CXX compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- Check for working C compiler: D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Not forcing any particular BLAS to be found
-- Performing Test CAFFE2_LONG_IS_INT32_OR_64
-- Performing Test CAFFE2_LONG_IS_INT32_OR_64 - Failed
-- Need to define long as a separate typeid.
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED - Success
-- std::exception_ptr is supported.
-- Performing Test CAFFE2_IS_NUMA_AVAILABLE
-- Performing Test CAFFE2_IS_NUMA_AVAILABLE - Failed
-- NUMA is not available
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS - Success
-- Current compiler supports avx2 extention. Will build perfkernels.
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Failed
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:D:/Yeverino/git_projects/pytorch/third_party/protobuf/src>$
-- Found Git: D:/Program Files/Git/Git/cmd/git.exe (found version "2.18.0.windows.1")
-- The BLAS backend of choice:Eigen
CMake Warning at cmake/Dependencies.cmake:257 (message):
  NUMA is currently only supported under Linux.
Call Stack (most recent call first):
  CMakeLists.txt:181 (include)

CMake Warning at cmake/Dependencies.cmake:330 (find_package):
  By not providing "FindEigen3.cmake" in CMAKE_MODULE_PATH this project has asked CMake to find a package configuration file provided by "Eigen3", but CMake did not find one.

  Could not find a package configuration file provided by "Eigen3" with any of the following names:

    Eigen3Config.cmake
    eigen3-config.cmake

  Add the installation prefix of "Eigen3" to CMAKE_PREFIX_PATH or set "Eigen3_DIR" to a directory containing one of the above files. If "Eigen3" provides a separate development package or SDK, be sure it has been installed.
Call Stack (most recent call first):
  CMakeLists.txt:181 (include)

-- Did not find system Eigen. Using third party subdirectory.
-- Found PythonInterp: C:/Python27/python.exe (found suitable version "2.7.14", minimum required is "2.7")
-- Found PythonLibs: C:/Python27/libs/python27.lib (found suitable version "2.7.14", minimum required is "2.7")
-- Found NumPy: C:/Python27/lib/site-packages/numpy/core/include (found version "1.14.5")
-- NumPy ver. 1.14.5 found (include: C:/Python27/lib/site-packages/numpy/core/include)
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
-- Could NOT find MPI (missing: MPI_C_FOUND MPI_CXX_FOUND)
CMake Warning at cmake/Dependencies.cmake:401 (message):
  Not compiling with MPI. Suppress this warning with -DUSE_MPI=OFF
Call Stack (most recent call first):
  CMakeLists.txt:181 (include)

-- Found CUDA: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0 (found suitable version "8.0", minimum required is "7.0")
-- Caffe2: CUDA detected: 8.0
-- Caffe2: CUDA nvcc is: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/bin/nvcc.exe
-- Caffe2: CUDA toolkit directory: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
-- Caffe2: Header version is: 8.0
-- Found CUDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include
-- Found cuDNN: v7.0.5 (include: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include, library: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudnn.lib)
CMake Warning at cmake/public/utils.cmake:148 (message):
  In the future we will require one to explicitly pass TORCH_CUDA_ARCH_LIST to cmake instead of implicitly setting it as an env variable. This will become a FATAL_ERROR in future version of pytorch.
Call Stack (most recent call first):
  cmake/public/cuda.cmake:332 (torch_cuda_get_nvcc_gencode_flag)
  cmake/Dependencies.cmake:433 (include)
  CMakeLists.txt:181 (include)

CMake Error at cmake/Modules_CUDA_fix/upstream/FindCUDA/select_compute_arch.cmake:168 (message):
  Unknown CUDA Architecture Name "6.1" in CUDA_SELECT_NVCC_ARCH_FLAGS
Call Stack (most recent call first):
  cmake/public/utils.cmake:164 (cuda_select_nvcc_arch_flags)
  cmake/public/cuda.cmake:332 (torch_cuda_get_nvcc_gencode_flag)
  cmake/Dependencies.cmake:433 (include)
  CMakeLists.txt:181 (include)

CMake Error at cmake/Modules_CUDA_fix/upstream/FindCUDA/select_compute_arch.cmake:172 (message):
  arch_bin wasn't set for some reason
Call Stack (most recent call first):
  cmake/public/utils.cmake:164 (cuda_select_nvcc_arch_flags)
  cmake/public/cuda.cmake:332 (torch_cuda_get_nvcc_gencode_flag)
  cmake/Dependencies.cmake:433 (include)
  CMakeLists.txt:181 (include)

-- Added CUDA NVCC flags for:
CMake Warning at cmake/Dependencies.cmake:543 (message):
  NCCL is currently only supported under Linux.
Call Stack (most recent call first):
  CMakeLists.txt:181 (include)

-- Could NOT find CUB (missing: CUB_INCLUDE_DIR)
CMake Warning at cmake/Dependencies.cmake:563 (message):
  Gloo can only be used on Linux.
Call Stack (most recent call first):
  CMakeLists.txt:181 (include)

CMake Warning at cmake/Dependencies.cmake:623 (message):
  mobile opengl is only used in android or ios builds.
Call Stack (most recent call first):
  CMakeLists.txt:181 (include)

CMake Warning at cmake/Dependencies.cmake:699 (message):
  Metal is only used in ios builds.
Call Stack (most recent call first):
  CMakeLists.txt:181 (include)

-- NCCL operators skipped due to no CUDA support
-- Excluding ideep operators as we are not using ideep
-- Excluding image processing operators due to no opencv
-- Excluding video processing operators due to no opencv
-- Excluding mkl operators as we are not using mkl
-- MPI operators skipped due to no MPI support
-- Include Observer library
-- Using Lib\site-packages as python relative installation path
-- Automatically generating missing __init__.py files.
CMake Warning at CMakeLists.txt:341 (message):
  Generated cmake files are only fully tested if one builds with system glog, gflags, and protobuf. Other settings may generate files that are not well tested.

CMake Warning at CMakeLists.txt:390 (message):
  Generated cmake files are only available when building shared libs.

--
-- **** Summary ****
-- General:
-- CMake version : 3.12.0-rc2
-- CMake command : C:/Program Files/CMake/bin/cmake.exe
-- Git version : v0.1.11-9211-gf87499a8f-dirty
-- System : Windows
-- C++ compiler : D:/Program Files/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
-- C++ compiler version : 19.0.24215.1
-- BLAS : Eigen
-- CXX flags : /DWIN32 /D_WINDOWS /W3 /GR /EHsc -DONNX_NAMESPACE=onnx_c2 /MP /bigobj
-- Build type : Release
-- Compile definitions :
-- CMAKE_PREFIX_PATH :
-- CMAKE_INSTALL_PREFIX : C:/Program Files/Caffe2

-- BUILD_CAFFE2 : ON
-- BUILD_ATEN : OFF
-- BUILD_BINARY : ON
-- BUILD_CUSTOM_PROTOBUF : ON
-- Protobuf compiler :
-- Protobuf includes :
-- Protobuf libraries :
-- BUILD_DOCS : OFF
-- BUILD_PYTHON : ON
-- Python version : 2.7.14
-- Python includes : C:/Python27/include
-- BUILD_SHARED_LIBS : OFF
-- BUILD_TEST : OFF
-- USE_ASAN : OFF
-- USE_ATEN : OFF
-- USE_CUDA : ON
-- CUDA static link : OFF
-- USE_CUDNN : ON
-- CUDA version : 8.0
-- cuDNN version : 7.0.5
-- CUDA root directory : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
-- CUDA library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cuda.lib
-- cudart library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudart_static.lib
-- cublas library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cublas.lib;C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cublas_device.lib
-- cufft library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cufft.lib
-- curand library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/curand.lib
-- cuDNN library : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/cudnn.lib
-- nvrtc : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64/nvrtc.lib
-- CUDA include path : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/include
-- NVCC executable : C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/bin/nvcc.exe
-- CUDA host compiler : $(VCInstallDir)bin
-- USE_TENSORRT : OFF
-- USE_ROCM : OFF
-- USE_EIGEN_FOR_BLAS : ON
-- USE_FFMPEG : OFF
-- USE_GFLAGS : OFF
-- USE_GLOG : OFF
-- USE_GLOO : OFF
-- USE_LEVELDB : OFF
-- USE_LITE_PROTO : OFF
-- USE_LMDB : OFF
-- USE_METAL : OFF
-- USE_MKL :
-- USE_MOBILE_OPENGL : OFF
-- USE_MPI : OFF
-- USE_NCCL : OFF
-- USE_NERVANA_GPU : OFF
-- USE_NNPACK : OFF
-- USE_OBSERVERS : ON
-- USE_OPENCL : OFF
-- USE_OPENCV : OFF
-- USE_OPENMP : OFF
-- USE_PROF : OFF
-- USE_REDIS : OFF
-- USE_ROCKSDB : OFF
-- USE_ZMQ : OFF
-- Public Dependencies : Threads::Threads
-- Private Dependencies : cpuinfo;onnxifi_loader
-- Configuring incomplete, errors occurred!
See also "D:/Yeverino/git_projects/pytorch/build/CMakeFiles/CMakeOutput.log".
See also "D:/Yeverino/git_projects/pytorch/build/CMakeFiles/CMakeError.log".
"Caffe2 building failed"

peterjc123 commented 6 years ago

@CarlosYeverino Remove the quotation marks and try again. That is:

set TORCH_CUDA_ARCH_LIST=6.1
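
For clarity, a sketch of the full corrected sequence from the scripts directory, based on the commands posted above; only the TORCH_CUDA_ARCH_LIST line changes:

set CMAKE_GENERATOR="Visual Studio 14 2015 Win64"
set USE_CUDA=ON
:: no quotes here: cmd keeps them as part of the variable's value,
:: which seems to be why CMake rejected the architecture name earlier
set TORCH_CUDA_ARCH_LIST=6.1
build_windows.bat
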
CarlosYeverino commented 6 years ago

@peterjc123 thanks, man. That solved my issue. Do you know how I can fix "It means you may not get the full speed of your CPU"?

[E D:\Yeverino\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E D:\Yeverino\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E D:\Yeverino\git_projects\pytorch\caffe2\core\init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.

peterjc123 commented 6 years ago

@CarlosYeverino Can you please upload your CMakeCache.txt so that we can check if the flag is correctly set or something else is wrong there?

CarlosYeverino commented 6 years ago

Here is the CMakeCache.

CMakeCache.txt

peterjc123 commented 6 years ago

@CarlosYeverino Here is the relevant cmake snippet. It shows that the AVX/AVX2 perfkernels are switched off under MSVC because of a link error. Maybe @orionr and @pjh5 know the details.

if (CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS)
  message(STATUS "Current compiler supports avx2 extention. Will build perfkernels.")
  # Currently MSVC seems to have a symbol not found error while linking (related
  # to source file order?). As a result we will currently disable the perfkernel
  # in msvc.
  # Also see CMakeLists.txt under caffe2/perfkernels.
  if (NOT MSVC)
    set(CAFFE2_PERF_WITH_AVX 1)
    set(CAFFE2_PERF_WITH_AVX2 1)
  endif()
endif()
orionr commented 6 years ago

The if (NOT MSVC) check above was added by @Yangqing, so cc'ing him as well. @CarlosYeverino, feel free to remove that test and see if it works for you.

In general, though, depending on what operators you use in your network, most will be implemented in CUDA and run on the GPU, so AVX/AVX2 won't matter from a performance standpoint. Happy to hear you got CUDA working!
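
If you do try removing that if (NOT MSVC) test, a minimal rebuild sketch (it assumes the default build directory that build_windows.bat uses, as seen in the logs above):

:: After editing the cmake file quoted above to drop the "if (NOT MSVC)" guard,
:: clear the old CMake cache so the perfkernel flags are re-evaluated.
rmdir /S /Q build
build_windows.bat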