microsoft / vcpkg

C++ Library Manager for Windows, Linux, and MacOS
MIT License

vcpkg pretends CUDA SDK is not installed, but it is. #3609

Closed akors closed 5 years ago

akors commented 6 years ago

Hi, I have tried to install the CUDA SDK via vcpkg, but the installation failed:

CMake Error at vcpkg/ports/cuda/portfile.cmake:26 (message):
  Could not find CUDA.  Before continuing, please download and install CUDA
  (V9.0.0 or higher) from:

      https://developer.nvidia.com/cuda-downloads

  Also ensure vcpkg has been rebuilt with the latest version (v0.0.104 or
  later)
Call Stack (most recent call first):
  vcpkg/scripts/ports.cmake:72 (include)

Error: Building package cuda:x86-windows failed with: BUILD_FAILED

The thing is, I have the CUDA SDK installed in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0. The toolkit is on the PATH (I can call nvcc just fine), and it is registered in the registry:

Computer\HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\GPU Computing Toolkit\CUDA\v9.0

I don't really know why vcpkg thinks it's not there.

What's notable is that I have two additional CUDA SDKs installed: v8.0 and v9.1. Is vcpkg maybe confused by that?

Unfortunately I couldn't find any verbose switches or log files that I could attach.

My vcpkg version is: Vcpkg package management program version 0.0.113-nohash

My Windows version is: Windows 10 Pro version 1709

MVoz commented 6 years ago

As an option, you can use symbolic links to replace version 9.

akors commented 6 years ago

@Voskrese

As an option, you can use symbolic links to replace version 9.

Symbolic link from where to where?

carmangary commented 6 years ago

I had this problem once. I reinstalled vcpkg in a different directory and it went away. I never figured out what was wrong but I haven't had the problem since.

akors commented 6 years ago

So I figured out the problem, and it seems to be a change in vcpkg behaviour since the CUDA port file was written.

Here is what the port file tries to do:

find_program(NVCC
    NAMES nvcc nvcc.exe
    PATHS
      ENV CUDA_PATH
      ENV CUDA_BIN_PATH
    PATH_SUFFIXES bin bin64
    DOC "Toolkit location."
    NO_DEFAULT_PATH
    )

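It looks only in the directories named by the environment variables CUDA_PATH and CUDA_BIN_PATH (and their bin and bin64 subdirectories). Because NO_DEFAULT_PATH is set, CMake's default locations, such as the PATH environment variable, are not searched.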

This is nice and all, but apparently vcpkg goes out of its way to delete all environment variables and reset them to some minimal values, presumably for a "clean build environment". So vcpkg cleans the environment so well that the port can't ever work, because no environment variables that could contain the CUDA path ever reach the portfile.
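To check which variables actually reach the portfile, a few lines like the following could be added temporarily at the top of ports\cuda\portfile.cmake. This is only a diagnostic sketch; the list of variable names is an assumption based on the ones mentioned in this thread.

```cmake
# Diagnostic sketch: print the CUDA-related environment variables
# as the portfile sees them. Empty output after '=' means the
# variable is unset or did not survive vcpkg's environment cleaning.
foreach(var CUDA_PATH CUDA_BIN_PATH PATH)
    message(STATUS "${var}=$ENV{${var}}")
endforeach()
```

Running the install again should then show these values in the build output before the "Could not find CUDA" error.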

I don't know what the solution is, but I'm pinging @ras0219-msft and @jasjuang because they seem to be the ones responsible for the CUDA port.

In the meantime, I suggest the following workaround for my fellow users: just hardcode the CUDA path by adding this line to the top of the file ports\cuda\portfile.cmake:

set(ENV{CUDA_BIN_PATH} "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0/bin")
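A slightly more defensive form of the same workaround, as a sketch: it sets CUDA_BIN_PATH only when the variable is empty and the directory really exists. The variable name _cuda_bin_guess is made up for illustration, and the path is just the v9.0 install location from this report, so it must be adjusted to the local installation.

```cmake
# Hypothetical helper variable holding the local CUDA bin directory.
set(_cuda_bin_guess "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0/bin")

# Only override when nothing was passed in and the directory is present.
if("$ENV{CUDA_BIN_PATH}" STREQUAL "" AND EXISTS "${_cuda_bin_guess}")
    set(ENV{CUDA_BIN_PATH} "${_cuda_bin_guess}")
endif()
```

Like the one-line version, this is a local edit to the portfile and would be overwritten by a vcpkg update.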

jasjuang commented 6 years ago

CUDA_PATH is actually whitelisted as shown in https://github.com/Microsoft/vcpkg/blob/3fc54807cb9b0bf242f8a2e7d1035bdbbd591aac/toolsrc/src/vcpkg/base/system.cpp . I suppose you need CUDA_BIN_PATH to be whitelisted too in order for it to work.

However, when I installed CUDA 9.2, CUDA_PATH was created and I don't actually have a CUDA_BIN_PATH. How did you get CUDA_BIN_PATH in the first place?

akors commented 6 years ago

However, when I installed CUDA 9.2, CUDA_PATH was created and I don't actually have a CUDA_BIN_PATH. How did you get CUDA_BIN_PATH in the first place?

I didn't. I set it manually in the command prompt after I saw that the portfile queries it.

In fact, I installed CUDA 9.0 and 9.1, and I have neither CUDA_BIN_PATH nor CUDA_PATH set on my system. I only have CUDA_PATH_V9_0 and CUDA_PATH_V9_1 set by default.

So it seems to me the wrong envvars are being queried.
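As a sketch of what querying the versioned variables could look like, the find_program call from the portfile can be extended with the two names that exist on my machine. This assumes CUDA_PATH_V9_1 and CUDA_PATH_V9_0 survive vcpkg's environment cleaning, which nothing in this thread confirms, and it is not what the port currently does.

```cmake
# Sketch: same search as the portfile, plus the per-version variables
# that the CUDA 9.0 and 9.1 installers set by default.
find_program(NVCC
    NAMES nvcc nvcc.exe
    PATHS
      ENV CUDA_PATH
      ENV CUDA_BIN_PATH
      ENV CUDA_PATH_V9_1
      ENV CUDA_PATH_V9_0
    PATH_SUFFIXES bin bin64
    DOC "Toolkit location."
    NO_DEFAULT_PATH
    )
```

Hardcoding version numbers like this does not scale to future releases, so it is a stopgap at best.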

Besides, why do you need to write this whole CUDA checking code anyway? Since version 3.8, CMake has native support for the CUDA language, and all this "finding the CUDA SDK" logic is already built in, so there is no need to rewrite it. I would suggest using CMake's CheckLanguage module to find the CUDA SDK.

jasjuang commented 6 years ago

@akors That's actually a great idea. Since I am not too familiar with the CheckLanguage module, would you mind submitting a PR for this?

akors commented 6 years ago

@jasjuang It should be as simple as this:

include(CheckLanguage)
check_language(CUDA)

if(CMAKE_CUDA_COMPILER)
    message(STATUS "Found CUDA compiler: ${CMAKE_CUDA_COMPILER}")
else()
    message(FATAL_ERROR "CUDA compiler not found")
endif()

Unfortunately, that doesn't work because CMake can't find anything by default, even when nvcc.exe is on the path. I created a CMake issue for that: https://gitlab.kitware.com/cmake/cmake/issues/18059

Also, as much as a "reproducible build environment" is a noble goal, if vcpkg links against externally installed tools, the installation of these tools needs a way to communicate its location to the vcpkg port.

This is a bit hard, since all environment variables are nuked and even the PATH is completely reset. CUDA_PATH seems to be a nonstandard variable, or at least it didn't get set on my machine when I installed the toolkits.

So I don't really know where to go from here, sorry.

leilaShen commented 6 years ago

In the meantime, I suggest the following workaround for my fellow users: just hardcode the CUDA path by adding this line to the top of the file ports\cuda\portfile.cmake:

set(ENV{CUDA_BIN_PATH} "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.0/bin")

This works for me. Thanks a lot!

ZbigniewRA commented 5 years ago

I have the same problem on Ubuntu 18.04 with CUDA 10.1. May I suggest temporarily removing "Build-Depends: cuda" from the CONTROL file as a solution?

"cuda" is a "tag" port that is used to inform users that they need to set up CUDA themselves. But it now has issues (discussed above) that prevent users from using libraries that would otherwise install and work fine. Until those issues are figured out, maybe it's better to remove the dependency on it? Or, in fact, temporarily make the "cuda" port always install successfully and instead produce a warning if it can't find CUDA?
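The second option could look roughly like this. It is a hypothetical sketch built on the find_program call quoted earlier in this thread, not the actual portfile, and the message wording is made up.

```cmake
# Hypothetical variant of the check in ports/cuda/portfile.cmake:
# warn instead of failing when nvcc cannot be located.
find_program(NVCC
    NAMES nvcc nvcc.exe
    PATHS
      ENV CUDA_PATH
      ENV CUDA_BIN_PATH
    PATH_SUFFIXES bin bin64
    DOC "Toolkit location."
    NO_DEFAULT_PATH
    )

if(NOT NVCC)
    # NVCC is set to NVCC-NOTFOUND here, which evaluates to false.
    message(WARNING "Could not find CUDA. Ports that depend on it may fail to build.")
else()
    message(STATUS "Found nvcc: ${NVCC}")
endif()
```

The trade-off is that a missing toolkit would then surface later, as a build failure in whichever library actually needs CUDA.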

JackBoosY commented 5 years ago

Hi everyone, Thanks for reporting this issue! I can't reproduce this issue on the latest version, maybe someone has fixed it. Please install cuda_10.1.168_425.25_win10.exe, update vcpkg and rebuild cuda.

Thanks.

JackBoosY commented 5 years ago

Duplicate of #2814.

ZbigniewRA commented 5 years ago

This bug is not fixed. Please note that I tested it on Ubuntu 18.04, not on Windows. It is very likely that it works on Windows only.

JackBoosY commented 5 years ago

@ZbigniewRA This issue only reports build errors on Windows. Please open a new issue to report build errors on Ubuntu.

Thanks.

carmangary commented 5 years ago

@ZbigniewRA This issue only reports build errors on Windows. Please open a new issue to report build errors on Ubuntu.

Thanks.

I recommend performing testing and regression testing on all supported OSes before closing defects. It will increase the quality of vcpkg and avoid this kind of situation.

JackBoosY commented 5 years ago

@carmangary Thanks for your advice.

flamxi commented 6 months ago

to this date, this bug still exists :( and also this is still the fix https://github.com/microsoft/vcpkg/issues/3609#issuecomment-393691746

krjakbrjak commented 6 months ago

to this date, this bug still exists :( and also this is still the fix #3609 (comment)

Instead of changing the portfile, the CUDA_PATH or CUDA_TOOLKIT_ROOT_DIR environment variables can be set since they are whitelisted by vcpkg. See system.process.cpp for more details.