xmake-io / xmake-repo

📦 An official xmake package repository
https://xrepo.xmake.io
Apache License 2.0

libtorch cannot use the GPU & mkl checksum mismatch #5595

Open jalaxy33 opened 1 month ago

jalaxy33 commented 1 month ago

Xmake Version

v2.9.5+HEAD.d30de52e9

Operating System Version and Architecture

Windows 11 23H2

Describe Bug

While trying to build libtorch from xmake-repo, I ran into the following problems.

Problem 1: libtorch cannot use the GPU

Building the CUDA-enabled libtorch:

add_requires("libtorch", {configs = {shared=true, cuda=true}})

The build succeeds, but torch::cuda::is_available() returns false, i.e. the GPU cannot be used.

Following suggestions found online, I tried adding the following linker options, with no effect:

add_ldflags("/INCLUDE:?warp_size@cuda@at@@YAHXZ", {force = true})
add_ldflags("/INCLUDE:?ignore_this_library_placeholder@@YAHXZ", {force = true})

At the moment, the prebuilt libtorch downloaded from the official site can use the GPU, but I would like to use the libtorch from xmake-repo directly.

Problem 2: mkl checksum mismatch

Setting the option blas='mkl':

add_requires("libtorch", {configs = {shared=true, cuda=true, blas = 'mkl'}})

This option requires the mkl package. The build fails with a checksum mismatch for mkl:

error: unmatched checksum, current hash(48c71fa0) != original hash(e760103a)
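
For context (my understanding of how xmake-repo works, not something shown in the log): each package version is pinned to a sha256 in the package's script, and xmake rejects a download whose hash no longer matches the recorded one, e.g. after upstream replaces the artifact. A minimal sketch of such a script, with a placeholder instead of the real hash:

```lua
-- Hypothetical sketch of an xmake-repo package script; the hash is a placeholder.
package("mkl")
    add_versions("2024.2.0+661", "<full sha256 of the official tarball>")
```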

Expected Behavior

  1. The GPU build of libtorch works, i.e. torch::cuda::is_available() returns true.
  2. The mkl checksum mismatch is fixed so that mkl can be downloaded and built.

Project Configuration

xmake.lua

add_rules("mode.debug", "mode.release", "mode.releasedbg")
add_rules("plugin.compile_commands.autoupdate", {outputdir = ".vscode"})

set_runtimes("MD")
set_defaultmode("releasedbg")

-- include libtorch
local use_local_libtorch = false  
local local_libtorch_path = "D:/3rdLibs/packages/libtorch/libtorch-win-shared-with-deps-2.4.1+cu124/libtorch"
local local_libtorch_debug_path = "D:/3rdLibs/packages/libtorch/libtorch-win-shared-with-deps-debug-2.4.1+cu124/libtorch"

if use_local_libtorch then 
    local libtorch_build_type = "Release"
    if is_mode("debug") then
        local_libtorch_path = local_libtorch_debug_path
        libtorch_build_type = "Debug"
    end
    add_requires("cmake::Torch", {alias = "libtorch", system = true, 
                    configs = {envs = {CMAKE_PREFIX_PATH = local_libtorch_path}, 
                            presets = {CMAKE_BUILD_TYPE = libtorch_build_type}}})
else
    add_requires("libtorch", {configs = {shared = true, cuda = true, blas = "mkl"}}) -- TODO: builds, but the GPU cannot be used on Windows
end

add_requires("cuda")

set_languages("cxxlatest")

target("main")
    set_kind("binary")
    add_files("src/main.cpp")
    add_packages("libtorch")
    if is_plat("windows") then
        add_ldflags("/INCLUDE:?warp_size@cuda@at@@YAHXZ", {force = true})
        add_ldflags("/INCLUDE:?ignore_this_library_placeholder@@YAHXZ", {force = true})
    end
    set_default(true)

Test code main.cpp

#include <iostream>
#include <torch/script.h>
#include <torch/torch.h>

int main(int argc, char **argv) {
    std::cout << "hello world!" << std::endl;

    if (torch::cuda::is_available()) {
        std::cout << "Using cuda" << std::endl;
    } else {
        std::cout << "Using cpu" << std::endl;
    }

    return 0;
}

Expected output:

hello world!
Using cuda

Additional Information and Error Logs

> xmake -v

checking for cl.exe ... D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\HostX64\x64\cl.exe
checking for Microsoft Visual Studio (x64) version ... 2022
checking for Microsoft C/C++ Compiler (x64) version ... 19.41.34120
checking for zig ... no
checking for zig ... no
checking for nim ... no
checking for nim ... no
checking for git ... ok
checking for gzip ... no
checking for 7z ... D:\Scoop\ScoopApps\apps\xmake\current\winenv\bin\7z
git rev-parse HEAD
checking for cl.exe ... D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\HostX64\x64\cl.exe
checking for the c compiler (cc) ... cl.exe
checking for cl.exe ... D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\HostX64\x64\cl.exe
checking for the c++ compiler (cxx) ... cl.exe
checking for Cuda SDK directory ... D:\Program Files\NVIDIA\CUDA\v12.6
checking for nvfortran ... no
checking for the fortran compiler (fc: nvfortran) ... no
checking for gfortran ... no
checking for g95 ... no
checking for the fortran compiler (fc: gfortran) ... no
checking for cmake ... no
checking for cmake ... no
checking for cmake ... no
checking for cmake ... ok
checking for python ... no
checking for python3 ... no
checking for python ... no
checking for python2 ... no
checking for python ... no
checking for ninja ... no
checking for ninja ... no
checking for ninja ... no
checking for ninja ... no
checking for ninja ... D:\Program Files\Microsoft Visual Studio\2022\Community\Common7\IDE\CommonExtensions\Microsoft\CMake\Ninja\ninja
checking for link.exe ... D:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.41.34120\bin\HostX64\x64\link.exe
checking for the linker (ld) ... link.exe
checking for nvfortran ... no
checking for the fortran linker (fcld: nvfortran) ... no
checking for gfortran ... no
checking for g95 ... no
checking for the fortran linker (fcld: gfortran) ... no
checking for xmake-repo::openmp ... openmp
checking for xmake::cuda ... no
checking for xmake-repo::cuda ... cuda
checking for xmake::nvtx ... no
checking for xmake-repo::nvtx ... nvtx
checking for xmake::tbb ... tbb 2021.12.0
checking for xmake::mkl ... no
> checking for c links(mkl)
> checking for c snippet(find_package/mkl)
checking for mkl ... no
checking for xmake::libtorch ... no
> checking for c links(libtorch)
> checking for c snippet(find_package/libtorch)
checking for libtorch ... no
checking for xmake::cuda ... no
checking for xmake-repo::cuda ... cuda
note: install or modify (m) these packages (pass -y to skip confirm)?
  -> mkl 2024.2.0+661 [runtimes:"MD", from:libtorch]
  -> libtorch v2.4.1 [runtimes:"MD", shared:y, blas:"mkl", cuda:y]
please input: y (y/n/m)

checking for ping ... ok
pinging the host(github.com) ... 1 ms
pinging the host(anaconda.org) ... 65535 ms
checking for curl ... D:\Scoop\ScoopApps\apps\xmake\current\winenv\bin\curl
D:\Scoop\ScoopApps\apps\xmake\current\winenv\bin\curl -SL -A "Xmake/2.9.5+HEAD.d30de52e9 (Windows;) curl/8.2.1" -k https://anaconda.org/intel/mkl-static/2024.2.0/download/win-64/mkl-static-2024.2.0-intel_661.tar.bz2 -o mkl-static-2024.2.0-intel_661.tar.bz2
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   167  100   167    0     0     91      0  0:00:01  0:00:01 --:--:--    91
100  7365  100  7365    0     0   3006      0  0:00:02  0:00:02 --:--:--     0
error: unmatched checksum, current hash(48c71fa0) != original hash(e760103a)
  => download https://anaconda.org/intel/mkl-static/2024.2.0/download/win-64/mkl-static-2024.2.0-intel_661.tar.bz2 .. failed

we can also download these packages manually:
  - https://anaconda.org/intel/mkl-static/2024.2.0/download/win-64/mkl-static-2024.2.0-intel_661.tar.bz2
to the local search directories: D:/3rdLibs/xmake/downloadeds
  - mkl-static-2024.2.0-intel_661.tar.bz2, mkl-2024.2.0+661.tar.bz2
and we can run `xmake g --pkg_searchdirs=/xxx` to set the search directories.
error:

waruqi commented 1 month ago

For the sha problem, you can just send a PR to fix it.


jalaxy33 commented 1 month ago

For the sha problem, you can just send a PR to fix it.

I'm just starting to learn xmake. Where do I get a package's sha? @waruqi


waruqi commented 1 month ago

For the sha problem, you can just send a PR to fix it.

I'm just starting to learn xmake. Where do I get a package's sha? @waruqi

After downloading the package's source tarball, compute it with shasum -a 256 filepath, or with xmake l hash.sha256 filepath.
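
To make that concrete (illustrative only: demo.tar.bz2 is a made-up stand-in file, not the real mkl tarball):

```shell
# Create a throwaway file and compute its SHA-256, the same way you would
# for the real downloaded tarball.
printf 'hello' > demo.tar.bz2
sha256sum demo.tar.bz2              # coreutils; shasum -a 256 demo.tar.bz2 also works
# xmake l hash.sha256 demo.tar.bz2  # same value via xmake
rm demo.tar.bz2
```

The resulting hex string is what the package script's version entry needs to match.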


xq114 commented 1 month ago

https://github.com/xmake-io/xmake-repo/pull/5602