huggingface / candle

Minimalist ML framework for Rust
Apache License 2.0
15.83k stars 957 forks

Cannot run examples with --features cuda option #353

Open dbrowne opened 1 year ago

dbrowne commented 1 year ago

CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true
warning: some crates are on edition 2021 which defaults to resolver = "2", but virtual workspaces default to resolver = "1"
note: to keep the current resolver, specify workspace.resolver = "1" in the workspace root's manifest
note: to use the edition 2021 resolver, specify workspace.resolver = "2" in the workspace root's manifest
Compiling libc v0.2.147
Compiling autocfg v1.1.0
Compiling crossbeam-utils v0.8.16
Compiling proc-macro2 v1.0.66
Compiling unicode-ident v1.0.11
Compiling rayon-core v1.11.0
Compiling memchr v2.5.0
Compiling libm v0.2.7
Compiling cfg-if v1.0.0
Compiling pkg-config v0.3.27
Compiling paste v1.0.14
Compiling serde v1.0.183
Compiling serde_derive v1.0.183
Compiling scopeguard v1.2.0
Compiling syn v1.0.109
Compiling serde_json v1.0.104
Compiling seq-macro v0.3.5
Compiling vcpkg v0.2.15
Compiling crc32fast v1.3.2
Compiling ident_case v1.0.1
Compiling strsim v0.10.0
Compiling fnv v1.0.7
Compiling thiserror v1.0.44
Compiling either v1.9.0
Compiling glob v0.3.1
Compiling openssl v0.10.56
Compiling rustls v0.21.6
Compiling anyhow v1.0.72
Compiling cudarc v0.9.13
Compiling portable-atomic v1.4.2
Compiling native-tls v0.2.11
Compiling esaxx-rs v0.1.8
Compiling adler v1.0.2
Compiling rustix v0.38.7
Compiling gimli v0.27.3
Compiling macro_rules_attribute-proc_macro v0.1.3
Compiling rustc-demangle v0.1.23
Compiling miniz_oxide v0.7.1
Compiling heck v0.4.1
Compiling flate2 v1.0.26
Compiling memoffset v0.9.0
Compiling crossbeam-epoch v0.9.15
Compiling num-traits v0.2.16
Compiling zip v0.6.6
Compiling crossbeam-channel v0.5.8
Compiling aho-corasick v1.0.2
Compiling object v0.31.1
Compiling nom v7.1.3
Compiling aho-corasick v0.7.20
Compiling quote v1.0.32
Compiling macro_rules_attribute v0.1.3
Compiling syn v2.0.28
Compiling crossbeam-deque v0.8.3
Compiling num_cpus v1.16.0
Compiling getrandom v0.2.10
Compiling dirs-sys v0.4.1
Compiling console v0.15.7
Compiling memmap2 v0.7.1
Compiling regex-automata v0.3.6
Compiling cc v1.0.82
Compiling dirs v5.0.1
Compiling rand_core v0.6.4
Compiling num-complex v0.4.3
Compiling rand_chacha v0.3.1
Compiling indicatif v0.17.6
Compiling rand v0.8.5
Compiling addr2line v0.20.0
Compiling rayon v1.7.0
Compiling is-terminal v0.4.9
Compiling ring v0.16.20
Compiling openssl-sys v0.9.91
Compiling rand_distr v0.4.3
Compiling backtrace v0.3.68
Compiling onig_sys v69.8.1
Compiling anstream v0.3.2
Compiling clap_builder v4.3.21
Compiling half v2.3.1
Compiling spm_precompiled v0.1.4
Compiling regex v1.9.3
Compiling darling_core v0.14.4
Compiling fancy-regex v0.10.0
Compiling candle-kernels v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-kernels)
Compiling candle-gemm-common v0.15.5
Compiling rayon-cond v0.1.0
Compiling candle-gemm-f32 v0.15.5
Compiling candle-gemm-f64 v0.15.5
Compiling candle-gemm-c64 v0.15.5
Compiling candle-gemm-c32 v0.15.5
Compiling safetensors v0.3.2
Compiling candle-examples v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-examples)
Compiling tracing-chrome v0.7.1
Compiling candle-gemm-f16 v0.15.5
error: failed to run custom build command for candle-kernels v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-kernels)

Caused by:
process didn't exit successfully: /mnt/source1/djbGR/ruststuffs/candle/target/release/build/candle-kernels-e21ab5b8e8daaf0a/build-script-build (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rustc-env=CUDA_INCLUDE_DIR=/usr/local/cuda/include
cargo:rerun-if-changed=src/
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=sm_61

--- stderr
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/indexing.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/affine.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/cast.cu".
2 errors detected in the compilation of "src/reduce.cu".
2 errors detected in the compilation of "src/conv.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmax_nan(__half a, __half b) {
^

src/compatibility.cuh(22): error: function "__hmin_nan(__half, __half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) __half __hmin_nan(__half a, __half b) {
^

2 errors detected in the compilation of "src/ternary.cu".
2 errors detected in the compilation of "src/unary.cu".
2 errors detected in the compilation of "src/binary.cu".
thread 'main' panicked at 'nvcc error while compiling "src/affine.cu":

stdout

stderr

', candle-kernels/build.rs:207:13
stack backtrace:
0: 0x557f8498d0b1 - std::backtrace_rs::backtrace::libunwind::trace::hb01a67340c9cfb71
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x557f8498d0b1 - std::backtrace_rs::backtrace::trace_unsynchronized::h896aca561948c930
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x557f8498d0b1 - std::sys_common::backtrace::_print_fmt::h8627be5b68fbde29
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:65:5
3: 0x557f8498d0b1 - ::fmt::h1b7758da45f4cd22
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:44:22
4: 0x557f849b282c - core::fmt::rt::Argument::fmt::h0eb38586043a01ca
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/fmt/rt.rs:138:9
5: 0x557f849b282c - core::fmt::write::h68b52f8aa598961e
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/fmt/mod.rs:1094:21
6: 0x557f8498949e - std::io::Write::write_fmt::hc5568929b662da92
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/io/mod.rs:1714:15
7: 0x557f8498cec5 - std::sys_common::backtrace::_print::h65aecbff12ca83c8
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:47:5
8: 0x557f8498cec5 - std::sys_common::backtrace::print::hf75ac9d60598d247
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:34:9
9: 0x557f8498e483 - std::panicking::default_hook::{{closure}}::hc2cb8da3be7476b0
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:269:22
10: 0x557f8498e19d - std::panicking::default_hook::hefa49c86da66275b
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:288:9
11: 0x557f8498ea09 - std::panicking::rust_panic_with_hook::hd4c3b0056ba96951
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:705:13
12: 0x557f8498e907 - std::panicking::begin_panic_handler::{{closure}}::he487675683e9a525
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:597:13
13: 0x557f8498d516 - std::sys_common::backtrace::rust_end_short_backtrace::hcff58b9b81620321
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:151:18
14: 0x557f8498e652 - rust_begin_unwind
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:593:5
15: 0x557f848b9333 - core::panicking::panic_fmt::h1b81548733a03bd5
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/panicking.rs:67:14
16: 0x557f848c3323 - build_script_build::cuda::build_ptx::ha488acce3cd701b3
    at /mnt/source1/djbGR/ruststuffs/candle/candle-kernels/build.rs:207:13
17: 0x557f848c0878 - build_script_build::main::h2523e6c20b65fa04
    at /mnt/source1/djbGR/ruststuffs/candle/candle-kernels/build.rs:6:33
18: 0x557f848d40cb - core::ops::function::FnOnce::call_once::h385ddf31127d3e12
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/ops/function.rs:250:5
19: 0x557f848ccbae - std::sys_common::backtrace::rust_begin_short_backtrace::h1cfd550c72c3e194
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:135:18
20: 0x557f848e0130 - std::rt::lang_start::{{closure}}::h70dc5fa7783a03f7
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:166:18
21: 0x557f8498541b - core::ops::function::impls::<impl core::ops::function::FnOnce for &F>::call_once::h9eccf02cf11756f6
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/ops/function.rs:284:13
22: 0x557f8498541b - std::panicking::try::do_call::hc95b838862bbb45a
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:500:40
23: 0x557f8498541b - std::panicking::try::h82935254d12a76fc
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:464:19
24: 0x557f8498541b - std::panic::catch_unwind::h7fd9d11cd70fc350
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panic.rs:142:14
25: 0x557f8498541b - std::rt::lang_start_internal::{{closure}}::h0ddb191e68b650a4
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:148:48
26: 0x557f8498541b - std::panicking::try::do_call::h17d4693c7a6e120c
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:500:40
27: 0x557f8498541b - std::panicking::try::h684fc020e1305912
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:464:19
28: 0x557f8498541b - std::panic::catch_unwind::h757da538db515116
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panic.rs:142:14
29: 0x557f8498541b - std::rt::lang_start_internal::ha6b1625a1e9a4f5b
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:148:20
30: 0x557f848e010a - std::rt::lang_start::h0d1360f20fc735dd
    at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:165:17
31: 0x557f848c43fe - main
32: 0x7fd8be429d90 - __libc_start_call_main
    at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
33: 0x7fd8be429e40 - __libc_start_main_impl
    at ./csu/../csu/libc-start.c:392:3
34: 0x557f848b9a15 - _start
35: 0x0 -
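
For readers following the build-script stdout in the log above: the `cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP` and `cargo:rustc-env=CUDA_COMPUTE_CAP=sm_61` lines suggest that the target architecture can be influenced through the `CUDA_COMPUTE_CAP` environment variable. Below is a minimal Rust sketch of that general build-script pattern. It is not candle's actual build.rs; `cargo_directives` and its `detected` fallback parameter are hypothetical names used only for illustration, and the exact value format the real script accepts is an assumption.

```rust
use std::env;

/// Builds the directives a build script could print to tell Cargo which
/// compute capability to compile for.
/// `env_override` is the value of CUDA_COMPUTE_CAP if it is set;
/// `detected` is a fallback value (hypothetical, e.g. queried from the GPU).
fn cargo_directives(env_override: Option<&str>, detected: &str) -> Vec<String> {
    // Prefer a non-empty override from the environment, else use the fallback.
    let arch = env_override
        .map(str::trim)
        .filter(|value| !value.is_empty())
        .unwrap_or(detected);
    vec![
        // Ask Cargo to re-run the build script when the variable changes.
        "cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP".to_string(),
        // Expose the chosen value to the crate being compiled.
        format!("cargo:rustc-env=CUDA_COMPUTE_CAP={arch}"),
    ]
}

fn main() {
    let from_env = env::var("CUDA_COMPUTE_CAP").ok();
    // "sm_61" mirrors the value in the log above; it is only a placeholder here.
    for line in cargo_directives(from_env.as_deref(), "sm_61") {
        println!("{line}");
    }
}
```

Under this sketch, running the build with `CUDA_COMPUTE_CAP` set would replace the fallback value in the second directive, which is one way to test whether a different target architecture changes the nvcc errors.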

golddranks commented 3 months ago

I'm encountering a CUDA compilation error similar to the one described in this thread, and as requested in https://github.com/huggingface/candle/issues/353#issuecomment-1680374705, I'm reporting it. (Seeking help / an explanation.)

I'm trying to run Meta-Llama-3.1-8B-Instruct on CUDA. While compiling candle-kernels, the build panics with a CUDA compilation error. The failing compilation step seems to be this:

> nvcc "--gpu-architecture=sm_75" "--ptx" "--default-stream" "per-thread" "--output-directory" "C:\\Users\\kon\\repos\\llama_test\\target\\release\\build\\candle-kernels-cce780b137744591\\out" "-Isrc" "-IC:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.6\\include" "src\\affine.cu"
affine.cu
C:\Users\kon\repos\candle\candle-kernels\src\compatibility.cuh(11): error: identifier "__hmax" is undefined

C:\Users\kon\repos\candle\candle-kernels\src\compatibility.cuh(14): error: identifier "__hmin" is undefined

2 errors detected in the compilation of "src//affine.cu".

Diagnostics:

> nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce GTX 1650, 7.5, 560.76

> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:36:24_Pacific_Standard_Time_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

With "--gpu-architecture=sm_80" it compiles without problems, but my GPU doesn't support compute capability 8.0. Why doesn't it work for 7.5, and is there anything I can do (besides getting a newer GPU)?
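
The link between the compute capability reported by `nvidia-smi` (7.5) and the `--gpu-architecture=sm_75` flag in the nvcc command above can be sketched in Rust as follows. This is only an illustration of the mapping, not code from candle or cudarc; `GpuInfo`, `parse_gpu_csv`, and `nvcc_arch_flag` are hypothetical names, and the parser assumes the three-column CSV layout shown in the diagnostics.

```rust
/// One row of `nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv`.
#[derive(Debug, PartialEq)]
struct GpuInfo {
    name: String,
    compute_cap: String,
    driver_version: String,
}

/// Parses the CSV text printed by nvidia-smi, skipping the header row.
/// Rows that do not have exactly three fields are ignored.
fn parse_gpu_csv(csv: &str) -> Vec<GpuInfo> {
    csv.lines()
        .skip(1) // header: "name, compute_cap, driver_version"
        .filter_map(|line| {
            let fields: Vec<&str> = line.split(',').map(str::trim).collect();
            match fields.as_slice() {
                [name, cap, driver] => Some(GpuInfo {
                    name: name.to_string(),
                    compute_cap: cap.to_string(),
                    driver_version: driver.to_string(),
                }),
                _ => None,
            }
        })
        .collect()
}

/// Turns a compute capability such as "7.5" into the nvcc flag
/// "--gpu-architecture=sm_75". Returns None for malformed input.
fn nvcc_arch_flag(compute_cap: &str) -> Option<String> {
    let (major, minor) = compute_cap.split_once('.')?;
    let major: u32 = major.trim().parse().ok()?;
    let minor: u32 = minor.trim().parse().ok()?;
    Some(format!("--gpu-architecture=sm_{major}{minor}"))
}

fn main() {
    let output = "name, compute_cap, driver_version\nNVIDIA GeForce GTX 1650, 7.5, 560.76\n";
    for gpu in parse_gpu_csv(output) {
        println!("{} -> {:?}", gpu.name, nvcc_arch_flag(&gpu.compute_cap));
    }
}
```

Forcing `sm_80` on this card would therefore target an architecture the GPU itself does not report, which matches the concern raised above.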

By the way, the reason I'm running CUDA 11.6 is that candle depends on cudarc, pinned at version 0.11.6, and it refused to run with the latest CUDA version. Looking through the release log, I saw that they released support for CUDA 11.6 in v0.11.5, so I went with that. Scrolling back, I now realise that they have been adding supported versions in a very strange order, and apparently some CUDA 12.x versions ARE supported! Disorienting! My next step is to try those.

golddranks commented 3 months ago

ARGH, it works on CUDA 12.4! So, to anyone who got an error message about CUDA 12.6 not being supported: you don't need to downgrade as far as I did.

LaurentMazare commented 3 months ago

Right, we use cudarc to interface with CUDA, and at the moment it doesn't support CUDA 12.6; see this issue: https://github.com/coreylowman/cudarc/issues/280.
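
A quick way to check an installed toolkit against the versions discussed here is to read the release number out of `nvcc --version`. The Rust sketch below shows one way to do that. The `MENTIONED_AS_ACCEPTED` list is an assumption built only from what this thread reports (11.6 and 12.4 accepted, 12.6 rejected at the time), not from cudarc's source, and both function names are hypothetical.

```rust
/// Extracts (major, minor) from the "release X.Y" part of `nvcc --version` output.
fn parse_nvcc_release(version_output: &str) -> Option<(u32, u32)> {
    // Example line: "Cuda compilation tools, release 12.4, V12.4.131"
    let (_, rest) = version_output.split_once("release ")?;
    let release = rest.split(',').next()?.trim();
    let (major, minor) = release.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Illustrative only: toolkit releases this thread mentions as accepted by the
/// bindings at the time. The real list lives in cudarc and changes per release.
const MENTIONED_AS_ACCEPTED: &[(u32, u32)] = &[(11, 6), (12, 4)];

/// Returns true if the release appears in the illustrative list above.
fn mentioned_as_accepted(release: (u32, u32)) -> bool {
    MENTIONED_AS_ACCEPTED.contains(&release)
}

fn main() {
    let output = "nvcc: NVIDIA (R) Cuda compiler driver\n\
                  Cuda compilation tools, release 12.4, V12.4.131\n";
    match parse_nvcc_release(output) {
        Some(release) => println!(
            "CUDA {}.{} in illustrative list: {}",
            release.0,
            release.1,
            mentioned_as_accepted(release)
        ),
        None => println!("could not find a release number"),
    }
}
```

The authoritative list of supported toolkit versions is whatever the pinned cudarc release declares, so the constant here should be treated as a placeholder.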

MolotovCherry commented 2 months ago

I'm running into this issue on my computer. I've tried toolkit versions 12.4 through 12.6, all with the same result.

~
❯ nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce RTX 3080 Ti Laptop GPU, 8.6, 560.81

~
❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0