microsoft / vcpkg

C++ Library Manager for Windows, Linux, and MacOS
MIT License

[libtorch] Build error on x64-windows #41907

Open sjpritchard opened 3 weeks ago

sjpritchard commented 3 weeks ago

Package: libtorch:x64-windows@2.1.2#7

Host Environment

To Reproduce

vcpkg install

Failure logs

-- Using cached libtorch-cudnn-9-fix-e14026bc2a6cd80bedffead77a5d7b75a37f8e67.patch.
-- Using cached libtorch-cuda-thrust-missing-header-2a440348958b3f0a2b09458bd76fe5959b371c0c.patch.
-- Using cached pytorch-pytorch-v2.1.2.tar.gz.
-- Cleaning sources at C:/vcpkg/buildtrees/libtorch/src/v2.1.2-89de820eea.clean. Use --editable to skip cleaning for the packages you specify.
-- Extracting source C:/vcpkg/downloads/pytorch-pytorch-v2.1.2.tar.gz
-- Applying patch C:/vcpkg/downloads/libtorch-cudnn-9-fix-e14026bc2a6cd80bedffead77a5d7b75a37f8e67.patch
-- Applying patch C:/vcpkg/downloads/libtorch-cuda-thrust-missing-header-2a440348958b3f0a2b09458bd76fe5959b371c0c.patch
-- Applying patch cmake-fixes.patch
-- Applying patch more-fixes.patch
-- Applying patch fix-build.patch
-- Applying patch clang-cl.patch
-- Applying patch cuda-adjustments.patch
-- Applying patch fix-api-export.patch
-- Applying patch fxdiv.patch
-- Applying patch protoc.patch
-- Applying patch fix-sleef.patch
-- Applying patch fix-glog.patch
-- Applying patch fix-msvc-ICE.patch
-- Applying patch fix-calculate-minloglevel.patch
-- Applying patch force-cuda-include.patch
-- Applying patch fix-aten-cutlass.patch
-- Applying patch fix-build-error-with-fmt11.patch
-- Using source at C:/vcpkg/buildtrees/libtorch/src/v2.1.2-89de820eea.clean
-- Using cached pytorch-kineto-49e854d805d916b2031e337763928d2f8d2e1fbf.tar.gz.
-- Cleaning sources at C:/vcpkg/buildtrees/libtorch/src/2f8d2e1fbf-de7a6d4b27.clean. Use --editable to skip cleaning for the packages you specify.
-- Extracting source C:/vcpkg/downloads/pytorch-kineto-49e854d805d916b2031e337763928d2f8d2e1fbf.tar.gz
-- Applying patch kineto.patch
-- Using source at C:/vcpkg/buildtrees/libtorch/src/2f8d2e1fbf-de7a6d4b27.clean
-- Using cached NVIDIA-cudnn-frontend-12f35fa2be5994c1106367cac2fba21457b064f4.tar.gz.
-- Cleaning sources at C:/vcpkg/buildtrees/libtorch/src/1457b064f4-11f2c20faf.clean. Use --editable to skip cleaning for the packages you specify.
-- Extracting source C:/vcpkg/downloads/NVIDIA-cudnn-frontend-12f35fa2be5994c1106367cac2fba21457b064f4.tar.gz
-- Using source at C:/vcpkg/buildtrees/libtorch/src/1457b064f4-11f2c20faf.clean
-- Using flatc: C:/Users/steve/simcortex/ai_sim_assistant/vcpkg_installed/x64-windows/tools/flatbuffers/flatc.exe
-- Using protoc: C:/Users/steve/simcortex/ai_sim_assistant/vcpkg_installed/x64-windows/tools/protobuf/protoc.exe
-- Using cached pypa-get-pip-24.2.tar.gz.
-- Cleaning sources at C:/vcpkg/buildtrees/libtorch/src/24.2-36e81c02c7.clean. Use --editable to skip cleaning for the packages you specify.
-- Extracting source C:/vcpkg/downloads/pypa-get-pip-24.2.tar.gz
-- Using source at C:/vcpkg/buildtrees/libtorch/src/24.2-36e81c02c7.clean
CMake Error at scripts/cmake/vcpkg_execute_required_process.cmake:127 (message):
    Command failed: C:/vcpkg/downloads/tools/python/python-3.12.7-x64/python.exe C:/vcpkg/buildtrees/libtorch/src/24.2-36e81c02c7.clean/public/get-pip.py --no-warn-script-location
    Working Directory: C:/vcpkg/buildtrees/libtorch
    Error code: Access violation
    See logs for more information:

Call Stack (most recent call first):
  C:/Users/steve/simcortex/ai_sim_assistant/vcpkg_installed/x64-windows/share/vcpkg-get-python-packages/x_vcpkg_get_python_packages.cmake:44 (vcpkg_execute_required_process)
  buildtrees/versioning_/versions/libtorch/e738e2da09a82affc4bf4090d8c672579364231b/portfile.cmake:86 (x_vcpkg_get_python_packages)
  scripts/ports.cmake:192 (include)

Additional context

vcpkg.json

```
{
  "name": "ai-sim-assistant",
  "builtin-baseline": "4f746bc66438fce2b900c3ba6094a483b871b045",
  "dependencies": [
    { "name": "flashlight-text", "version>=": "0.0.4", "features": [ "kenlm" ] },
    { "name": "glog", "version>=": "0.7.1" },
    { "name": "gtest", "version>=": "1.15.2" },
    { "name": "jwt-cpp", "version>=": "0.7.0#1" },
    { "name": "kenlm", "version>=": "20230531#1" },
    { "name": "libtorch", "default-features": false, "version>=": "2.1.2#7" },
    { "name": "openssl", "version>=": "3.4.0" },
    { "name": "rapidfuzz", "version>=": "3.0.5" }
  ]
}
```
MonicaLiu0311 commented 3 weeks ago
    Error code: Access violation

Please check your access rights and antivirus settings, then clean the cache and rerun:

./vcpkg remove libtorch
./vcpkg install libtorch
petersteneteg commented 2 weeks ago

I'm getting exactly the same error code, but when building glad, which also uses Python 3.12.7.

petersteneteg commented 2 weeks ago

If I run the same Python script with my local Python 3.12.3 installation, there are no issues.

petersteneteg commented 2 weeks ago

My issue goes away if I go back to Python 3.11.8 by reverting this change: https://github.com/microsoft/vcpkg/commit/7dcb4ca95bfde3cfca3a9f0a7e933f1d17cc4836#diff-26499ad5d840cada3059f7aaa8363695140b4a6811ed08697b388ab19e16ae97L3-R35

in PR https://github.com/microsoft/vcpkg/pull/41232

This could still be related to antivirus/AppLocker/etc. issues, but debugging the access violation is hard. For me it seems to happen somewhere in Python's importlib._bootstrap. Maybe the old Python had been excluded from Windows Defender, but the new one triggers some issue? Very hard to say. I'm not allowed to bypass my organization's antivirus settings, so I can't try that.
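On the point that the access violation is hard to debug: CPython's built-in fault handler (`-X faulthandler`, or the `PYTHONFAULTHANDLER=1` environment variable) prints the Python-level traceback of every thread when the process dies from a fatal error, including an access violation on Windows, which should at least name the import that is running inside importlib._bootstrap. The following is a minimal sketch under that assumption; `run_with_faulthandler` is a hypothetical helper, not part of vcpkg.

```python
import subprocess
import sys


def run_with_faulthandler(interpreter, args, timeout=600):
    """Run `interpreter` with CPython's fault handler enabled.

    -X faulthandler makes the child dump the Python traceback of every
    thread to stderr when it dies from a fatal error (a segfault on POSIX,
    an access violation on Windows), so the crashing call shows up by name.
    """
    cmd = [interpreter, "-X", "faulthandler", *args]
    return subprocess.run(
        cmd, capture_output=True, text=True, errors="replace", timeout=timeout
    )


if __name__ == "__main__":
    # Smoke check with the current interpreter; see the usage note for the
    # command that actually fails in the log above.
    print(run_with_faulthandler(sys.executable, ["-c", "print('ok')"]).stdout)
```

To rerun the failing command from the log, pass `C:/vcpkg/downloads/tools/python/python-3.12.7-x64/python.exe` as the interpreter and `["C:/vcpkg/buildtrees/libtorch/src/24.2-36e81c02c7.clean/public/get-pip.py", "--no-warn-script-location"]` as the arguments, then inspect `result.stderr`.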

petersteneteg commented 2 weeks ago

Possibly the same issue as #41707.

petersteneteg commented 2 weeks ago

I figured out the issue.

vcpkg downloads Python 3.12.7 into vcpkg/downloads/tools/python/python-3.12.7-x64 and uses it from there.

But I also have a local Python 3.12.3 installation in C:\Program Files\Python312, which is on the global PATH.

It turns out that the vcpkg Python loads _ctypes.pyd from C:\Program Files\Python312\DLLs, not the one in vcpkg/downloads/tools/python/python-3.12.7-x64, and the two are apparently not binary compatible. The old _ctypes DLL tries to use PyObject_LookupAttr, which crashes with an access violation.

Updating my local Python installation to the same version "solves" the issue.
But the vcpkg Python really should run isolated from any other installation.
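A quick way to check for the shadowing described above is to ask the vcpkg interpreter where it would load `_ctypes` from and which `sys.path` entries lie outside its own installation. This is a minimal sketch; `module_origin` and `foreign_path_entries` are hypothetical helper names, and treating everything outside `sys.prefix`/`sys.base_prefix` as foreign is an assumption that also flags legitimate extra paths.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the file `name` would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def foreign_path_entries(path=None, prefixes=None):
    """Return sys.path entries outside this interpreter's own prefixes.

    Assumption: anything not under sys.prefix / sys.base_prefix (or the exec
    prefixes) belongs to some other installation, e.g. another Python's DLLs
    folder, whose extension modules could shadow this interpreter's copies.
    """
    path = sys.path if path is None else path
    if prefixes is None:
        prefixes = {sys.prefix, sys.base_prefix, sys.exec_prefix, sys.base_exec_prefix}
    roots = [os.path.normcase(os.path.abspath(p)) for p in prefixes]
    foreign = []
    for entry in path:
        if not entry:  # "" stands for the script/current directory
            continue
        norm = os.path.normcase(os.path.abspath(entry))
        if not any(norm == root or norm.startswith(root + os.sep) for root in roots):
            foreign.append(entry)
    return foreign


if __name__ == "__main__":
    print("_ctypes loaded from:", module_origin("_ctypes"))
    print("foreign sys.path entries:", foreign_path_entries())
```

Run as a script with the vcpkg-downloaded python.exe; in the situation reported here, the first line would be expected to point into C:\Program Files\Python312\DLLs and that folder would appear in the second list.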

petersteneteg commented 2 weeks ago

Maybe scripts/cmake/vcpkg_find_acquire_program(PYTHON3).cmake should put the downloaded Python first on the local PATH.
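One reading of this suggestion, sketched in Python rather than in the CMake script itself: when launching the downloaded interpreter, put its own directory first on PATH and drop PYTHONPATH/PYTHONHOME from the child environment. `isolated_tool_env` is a hypothetical helper. Since the shadowing reported above comes in through `sys.path` (the other installation's DLLs folder), PATH alone may not be enough, and this does nothing about the Windows registry lookup discussed further down the thread.

```python
import os


def isolated_tool_env(interpreter, base_env=None):
    """Environment for running a vcpkg-provided Python (hypothetical helper).

    Puts the interpreter's own directory first on PATH so that child
    processes invoking a bare `python` find this interpreter first, and
    removes PYTHONPATH/PYTHONHOME so they cannot redirect the module search
    path. The Windows registry is not touched.
    """
    env = dict(os.environ if base_env is None else base_env)
    for var in ("PYTHONPATH", "PYTHONHOME"):
        env.pop(var, None)
    tool_dir = os.path.dirname(os.path.abspath(interpreter))
    old_path = env.get("PATH", "")
    env["PATH"] = tool_dir + (os.pathsep + old_path if old_path else "")
    return env
```

Usage would look like `subprocess.run([py, "get-pip.py"], env=isolated_tool_env(py), check=True)`, where `py` is the path to the downloaded python.exe.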

MonicaLiu0311 commented 2 weeks ago

@Neumann-A Could you help look into this issue?

dg0yt commented 2 weeks ago

IMO this is the same problem we see with the meson ports (other issues): another Python installation interferes with the version run by vcpkg.
There is a PR for meson in particular. One thing I don't like about that PR is that I think vcpkg_find_acquire_program(PYTHON3) itself must be fixed to do the right thing.

vcpkg_find_acquire_program(PYTHON3) disables "isolated mode" for the downloaded Python. This has some desired effects, but it also makes Python use the Windows registry and the environment variables PYTHONPATH and PYTHONHOME when setting up the module search path. I guess this is what causes the interference, but I'm not a Python expert.
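The environment-variable part of this is easy to demonstrate: a child interpreter started with PYTHONPATH set picks that directory up in `sys.path`, while `-E` (ignore PYTHON* variables) or `-I` (isolated mode, which implies `-E` among other things) suppresses it. This sketch covers PYTHONPATH only; it does not exercise the Windows registry, and `child_sys_path` is a hypothetical helper.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(interpreter, extra_env, flags=()):
    """Return sys.path as seen by a fresh child interpreter.

    extra_env is merged over the current environment; flags are passed to
    the interpreter before -c (e.g. "-E" or "-I").
    """
    env = dict(os.environ)
    env.update(extra_env)
    code = "import json, sys; print(json.dumps(sys.path))"
    out = subprocess.run(
        [interpreter, *flags, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as probe:
        probe = os.path.realpath(probe)  # a directory no interpreter uses by default
        for flags in ((), ("-E",), ("-I",)):
            entries = child_sys_path(sys.executable, {"PYTHONPATH": probe}, flags)
            honored = probe in [os.path.realpath(p) for p in entries]
            print(" ".join(flags) or "(no flags)", "-> PYTHONPATH honored:", honored)
```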

@petersteneteg Are these environment variables set in your system? If yes, is the problem solved by unsetting them?

Neumann-A commented 2 weeks ago

The registry lookup is the problem. I think the only way to 'solve' it is by pretending to be a venv python if vcpkg provides the tool.

dg0yt commented 2 weeks ago

> The registry lookup is the problem. I think the only way to 'solve' it is by pretending to be a venv python if vcpkg provides the tool.

Agreed. We basically have what is needed in vcpkg-get-python-packages, except that it requires packages to be defined. (pip should be a no-op?)
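For reference, a real virtual environment is one concrete form of "pretending to be a venv": the stdlib `venv` module writes a `pyvenv.cfg` at the root of the environment, and an interpreter started from that environment reports `sys.prefix != sys.base_prefix`. This is a minimal sketch; `make_tool_venv` is a hypothetical helper, it assumes the base interpreter ships the `venv` module, and whether a venv fully stops the registry-based search-path lookup on Windows is not established in this thread.

```python
import os
import subprocess
import venv


def make_tool_venv(env_dir, base_python=None):
    """Create a pip-less venv and return the path of its interpreter.

    With base_python=None the venv is built from the running interpreter via
    the stdlib venv module; otherwise `base_python -m venv` is invoked, e.g.
    with the vcpkg-downloaded python.exe.
    """
    if base_python is None:
        builder = venv.EnvBuilder(with_pip=False, clear=True, symlinks=(os.name != "nt"))
        builder.create(env_dir)
    else:
        subprocess.run(
            [base_python, "-m", "venv", "--without-pip", "--clear", env_dir],
            check=True,
        )
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)
```

Creating the environment without pip matches the "pip should be a no-op" remark: only the venv layout is needed, not any packages.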