dbrowne opened 1 year ago
What OS and CUDA version are you using? I seem to be having a similar issue (with loads of C/CU/C++ errors) with both the crates.io and the github versions on Windows.
This is my Cargo.toml:
[package]
name = "candle_test"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
# candle-core = {git = "https://github.com/huggingface/candle.git", branch = "main", features = ["cuda"]}
candle-core = {features = ["cuda"], version = "0.1.0"}
# candle-nn = {git = "https://github.com/huggingface/candle.git", branch = "main"}
Rust version is 1.71.0. I'm running CUDA 11.7, which could be the problem, but I can't see anything about what version I should use.
'Linux version 6.2.0-26-generic (buildd@bos03-amd64-042) (x86_64-linux-gnu-gcc-11 (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2
rustc 1.73.0-nightly (39f42ad9e 2023-07-19)
Cuda compilation tools, release 12.2, V12.2.128
Build cuda_12.2.r12.2/compiler.33053471_0'
Also seeing thread 'main' panicked at 'nvcc error while compiling "src/affine.cu"' on Arch.
$ rustc --version
rustc 1.71.0 (8ede3aae2 2023-07-12)
$ pacman -Qi cuda
Name : cuda
Version : 12.2.0-1
Description : NVIDIA's GPU programming toolkit
Architecture : x86_64
URL : https://developer.nvidia.com/cuda-zone
Licenses : custom:NVIDIA
Groups : None
Provides : cuda-toolkit cuda-sdk libcudart.so=12-64 libcublas.so=12-64 libcublas.so=12-64
libcusolver.so=11-64 libcusolver.so=11-64 libcusparse.so=12-64 libcusparse.so=12-64
Depends On : opencl-nvidia nvidia-utils python gcc12
Optional Deps : gdb: for cuda-gdb [installed]
glu: required for some profiling tools in CUPTI [installed]
Required By : cudnn magma-cuda python-pycuda python-pytorch-cuda python-tensorflow-opt-cuda
tensorflow-opt-cuda
Optional For : meshroom-bin openmpi
Conflicts With : None
Replaces : cuda-toolkit cuda-sdk cuda-static
Installed Size : 4.36 GiB
Packager : Sven-Hendrik Haase <svenstaro@archlinux.org>
Build Date : Sun 02 Jul 2023 01:59:36 PM MDT
Install Date : Sun 16 Jul 2023 07:35:25 AM MDT
Install Reason : Installed as a dependency for another package
Install Script : Yes
Validated By : Signature
What cards are you guys using?
We need compute_cap > 7.0 for it to work. I know compute_cap 5.2 does trigger similar failures. The core kernels we have use f16 and bf16, and those old cards cannot compile them properly. I added some flags for some options, but I didn't check all potential caps yet.
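To make that concrete (a rough sketch only, not candle's actual kernel source, and scale_f16 is a made-up name): the native half-precision arithmetic intrinsics only exist from compute capability 5.3 upwards, so f16 device code is typically fenced behind an architecture check, and older targets need either a float-based emulation or no f16 kernel at all.
#include <cuda_fp16.h>

// Hypothetical f16 kernel: only emitted where half arithmetic exists (sm_53+).
// Without this guard, nvcc targeting sm_52 rejects intrinsics such as __hmul.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
extern "C" __global__ void scale_f16(const __half *x, __half *y, const size_t n) {
    const size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = __hmul(x[i], __float2half(2.0f));
    }
}
#endif
The compatibility.cuh header discussed further down plays a similar role for intrinsics that are missing on some toolkit/architecture combinations.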
I could get my own project to compile by specifying the compute capability on the command line. To see which architectures your nvcc can target, use nvcc --list-gpu-code:
CUDA_COMPUTE_CAP=90 cargo build --release --features cuda
However, at runtime I get DriverError(CUDA_ERROR_NOT_SUPPORTED, "operation not supported"), so GPU acceleration does not work for me. It looks like that's something else though, as this also doesn't work for me: https://askubuntu.com/a/1215237. That's not related to candle at all, so no need to fix it in this thread.
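One note on checking capabilities: nvcc --list-gpu-code shows the architectures your toolkit can generate code for, not what the installed GPU actually supports. The card's own compute capability is what matters at runtime; nvidia-smi --query-gpu=compute_cap --format=csv reports it, or a small standalone probe like this sketch (assumes only the CUDA runtime; compile with nvcc) prints it per device:
#include <cstdio>
#include <cuda_runtime.h>

// Print the compute capability of every visible CUDA device.
int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, compute_cap %d.%d\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
Forcing CUDA_COMPUTE_CAP higher than the card's real capability lets the build succeed, but the generated code still cannot run on that device.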
OK, did some more investigation. It turns out that my device only supports up to 50, so I can get pure CUDA code to run if I compile with nvcc -arch=sm_50. I can also get my application that uses candle to compile with that, but it still gives me the driver error, so I guess it's not supported.
My GPU (Quadro M620) runs torch fine, so it would be great if candle could add support for it! Mainly because it's a nice laptop to develop on.
@krolinventions Perfectly understandable. My own GTX 970 is too old to run candle at the moment.
However, in order to deliver quickly we had to cut corners in that department. For now I would like to focus on giving a good error message before actually writing kernels that work on old hardware. I may do it in my spare time to be able to use my old GPU, but I know how much time it takes.
If you want to take a stab at it, you're more than welcome!
@Narsil I think just using the CPU for development is actually fine. It's great not to have to deal with installing all that extra stuff, like with torch, or the CUDA libraries.
On looking at the kernels: I have never used CUDA before, but from the few examples I've seen it looks rather nice. I may need a little more experience with it before tackling these, but maybe!
Do try, it's not as daunting as it looks (it's only daunting when you want the best possible performance). Feel free to join the HF Discord, in the candle channel, to pursue the discussion.
I'll keep the issue open to track giving a better error message.
We need compute_cap > 7.0 for it to work.
I'm using a 1080TI
$ nvidia-smi --query-gpu=compute_cap --format=csv,noheader
6.1
😢
What cards are you guys using?
Just tried compiling with an RTX 3060 (compute cap 8.6) and CUDA 12.2 on Windows, and I'm still getting a massive error log of C errors, e.g.:
error: asm operand type size(8) does not match type/size implied by constraint 'r'
static __declspec(__device__) __inline longlong2 __ldg(const longlong2 *ptr) { longlong2 ret; asm volatile ("ld.global.nc.v2.s64 {%0,%1}, [%2];" : "=l"(ret.x), "=l"(ret.y) : "r" (ptr)); return ret; }
along with other errors that are repeated many times.
Yes, Windows seems to be having issues. I've been told on Discord that WSL is OK.
What cards are you guys using?
We need compute_cap > 7.0 for it to work. I know compute_cap 5.2 does trigger similar failures. The core kernels we have use f16 and bf16, and those old cards cannot compile them properly. I added some flags for some options, but I didn't check all potential caps yet.
My compute cap is 6.1. It is a stretch, but I'm willing to work on this if you point me in the right direction.
@dbrowne Go to candle/candle-kernels/src/ and try to make the .cu files compile:
nvcc --ptx --gpu-architecture=sm_61 affine.cu -I.
Most of the logic should be in compatibility.cuh. 61 should be easier than 52 and earlier.
It seems I have a fix for 61.
Can you take my PR out for a spin?
https://github.com/huggingface/candle/pull/386
It fixes compilation, but it still doesn't work on my 52 because the ops are still not there. However, once you have the PTX you can test it outside of candle and debug by more classical means to try to understand why it compiles but fails to run.
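One way to do that testing outside of candle (a sketch; affine_f32 is the kernel symbol that shows up in error reports further down this thread, so adjust the name to whatever your PTX actually contains): feed the PTX produced by the nvcc command above straight to the CUDA driver API and see which step fails.
// Build with: nvcc load_ptx.cu -lcuda -o load_ptx && ./load_ptx affine.ptx
#include <cstdio>
#include <cuda.h>

// Print the driver's error string for a failing call and bail out.
#define CHECK(call)                                                      \
    do {                                                                 \
        CUresult res = (call);                                           \
        if (res != CUDA_SUCCESS) {                                       \
            const char *msg = nullptr;                                   \
            cuGetErrorString(res, &msg);                                 \
            printf("%s failed: %s\n", #call, msg ? msg : "unknown");     \
            return 1;                                                    \
        }                                                                \
    } while (0)

int main(int argc, char **argv) {
    const char *ptx_path = argc > 1 ? argv[1] : "affine.ptx";
    CUdevice dev;
    CUcontext ctx;
    CUmodule mod;
    CUfunction fn;
    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));
    // A JIT problem (e.g. CUDA_ERROR_INVALID_PTX) surfaces when the module is loaded.
    CHECK(cuModuleLoad(&mod, ptx_path));
    // A kernel missing from the PTX surfaces here as CUDA_ERROR_NOT_FOUND ("named symbol not found").
    CHECK(cuModuleGetFunction(&fn, mod, "affine_f32"));
    printf("module and affine_f32 loaded fine\n");
    return 0;
}
If this little program reproduces the failure, the problem is in the generated PTX or the driver setup rather than in candle's Rust side.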
Looks like that PR has been merged!
On current master, my 1080TI now works like a charm, thank you!
On my Threadripper it takes 13-15s for the example:
$ time cargo run --example whisper --release
Finished release [optimized] target(s) in 0.28s
Running `target/release/examples/whisper`
Running on CPU, to run on GPU, build this example with `--features cuda`
No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
pcm data loaded 176000
loaded mel: [1, 80, 3000]
audio features: [1, 1500, 384]
3000: Segment { start: 0.0, duration: 30.0, dr: DecodingResult { tokens: [50257, 50363, 843, 523, 616, 5891, 3399, 1265, 407, 644, 534, 1499, 460, 466, 329, 345, 1265, 644, 345, 460, 466, 329, 534, 1499, 13, 50903, 50256], text: " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.", avg_logprob: -0.3303277552190798, no_speech_prob: 0.017772182822227478, temperature: 0.0, compression_ratio: NaN } }, in 15.577960389s
real 0m17.062s
user 0m12.536s
sys 0m3.221s
Enabling the cuda feature takes the inference itself well below a second (~2 seconds total runtime). Wow!
$ time cargo run --example whisper --release --features cuda
Finished release [optimized] target(s) in 0.31s
Running `target/release/examples/whisper`
No audio file submitted: Downloading https://huggingface.co/datasets/Narsil/candle_demo/blob/main/samples_jfk.wav
loaded wav data: Header { audio_format: 1, channel_count: 1, sampling_rate: 16000, bytes_per_second: 32000, bytes_per_sample: 2, bits_per_sample: 16 }
pcm data loaded 176000
loaded mel: [1, 80, 3000]
audio features: [1, 1500, 384]
3000: Segment { start: 0.0, duration: 30.0, dr: DecodingResult { tokens: [50257, 50363, 843, 523, 616, 5891, 3399, 1265, 407, 644, 534, 1499, 460, 466, 329, 345, 1265, 644, 345, 460, 466, 329, 534, 1499, 13, 50903, 50256], text: " And so my fellow Americans ask not what your country can do for you ask what you can do for your country.", avg_logprob: -0.3305633301574319, no_speech_prob: 0.017772099003195763, temperature: 0.0, compression_ratio: NaN } }, in 300.960168ms
real 0m2.166s
user 0m1.582s
sys 0m0.433s
CUDA-backed NNs in Rust!? This is really exciting :)
@n8henrie This is far from optimized yet ;).
We ran a few passes, but there's still a lot more that can be done
@dbrowne Go to candle/candle-kernels/src/ and try to make the .cu files compile:
nvcc --ptx --gpu-architecture=sm_61 affine.cu -I.
Most of the logic should be in compatibility.cuh. 61 should be easier than 52 and earlier.
It seems I have a fix for 61.
I'm awaiting delivery of an RTX A4500. If it does not work in my workstation, I will begin in earnest to pursue this.
Does it work now on main? I made fixes for older cards (still far from universal support, but it should be much better).
Does it work now on main? I made fixes for older cards (still far from universal support, but it should be much better).
Yes
@Narsil Also works for me (Quadro M620). Both the examples and my own code. Thanks!
If people in the future have similar CUDA compilation errors (functions already being defined, etc.), it's because the compatibility header compatibility.cuh is not perfect. Not all graphics cards and all driver versions have the same functions, which is why compatibility.cuh attempts to emulate them. Unfortunately, which GPUs and which drivers introduce each function is not well documented by NVIDIA, so several guesses were made regarding when to emulate things.
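For context, the general shape of what compatibility.cuh does is roughly the following (an illustrative sketch, not the actual header, and the version cutoff here is made up): define the missing intrinsic yourself when the toolkit/architecture combination doesn't provide it, and otherwise leave the native one alone. Guessing that cutoff wrong in one direction gives "identifier ... is undefined", and in the other direction "function ... has already been defined", which are exactly the two failure modes quoted in this thread.
#include <cuda_fp16.h>

// Illustrative only: provide a half-precision max where the native __hmax
// intrinsic is unavailable, emulating it through f32. The real header has to
// guess cutoffs like this for several intrinsics, based on both the CUDA
// toolkit version and the target architecture.
#if CUDART_VERSION < 11080  // illustrative cutoff, not the real condition
__device__ __forceinline__ __half __hmax(__half a, __half b) {
    return __float2half(fmaxf(__half2float(a), __half2float(b)));
}
#endif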
If you get such an error, please report it here (or as an issue on candle or dfdx). You can use these commands to give helpful context:
> nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA T500, 7.5, 536.25
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:42:34_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
For more information:
Getting a similar error using a 2080 Ti with CUDA version 12.2, driver 535.86.05, on Pop!_OS 22.04.
compatibility.cuh(11): error: identifier "__hmax" is undefined
compatibility.cuh(11): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(11): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(14): error: identifier "__hmin" is undefined
compatibility.cuh(14): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(14): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
Might be a result of the cuda toolkit being an older version. Pop OS does not have a newer driver compatible with the latest cuda toolkit yet. Hopefully driver differences won't matter much in the future.
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Can't compile with cuda feature. Does the following info help (anything a clueless cuda newbie could do)?
$ nvcc --ptx --gpu-architecture=sm_61 affine.cu -I.
compatibility.cuh(11): error: identifier "__hmax" is undefined
compatibility.cuh(11): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(11): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(14): error: identifier "__hmin" is undefined
compatibility.cuh(14): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(14): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
6 errors detected in the compilation of "affine.cu".
$ nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce GTX 1080, 6.1, 535.86.05
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
$ uname -ar
Linux visi2 5.19.0-41-generic #42~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Apr 18 17:40:00 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
@bayedieng @theHausdorffMetric
compatibility.cuh(11): error: identifier "__hmax" is undefined
Yes, this means CUDA 11.5 doesn't have this function, therefore the compat layer doesn't work.
Upgrading CUDA should help, at least to 11.8.
OK, not sure if this is an edge case. I was trying out candle on an NVIDIA Jetson Nano. candle failed to detect CUDA because it uses nvidia-smi, while Jetson uses tegrastats.
Still doesn't work on main (4abc1ea34dbc834e561f442737faf2c735f0a6ce); here are yet more error messages:
$ nvcc --ptx --gpu-architecture=sm_61 affine.cu -I.
compatibility.cuh(11): error: identifier "__hmax" is undefined
compatibility.cuh(11): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(11): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(14): error: identifier "__hmin" is undefined
compatibility.cuh(14): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
compatibility.cuh(14): error: ambiguous "?" operation: second operand of type "__half" can be converted to third operand type "<error-type>", and vice versa
6 errors detected in the compilation of "affine.cu".
$ nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce RTX 2060, 7.5, 535.104.05
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
$ uname -a
Linux chorusfruit 6.2.6-76060206-generic #202303130630~1689015125~22.04~ab2190e SMP PREEMPT_DYNAMIC Mon J x86_64 x86_64 x86_64 GNU/Linux
Try with CUDA >= 12.
I also have an issue getting candle to utilize the GPU.
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
$ nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce RTX 2070, 7.5, 535.104.12
$ uname -a
Linux xxx 5.4.0-164-generic #181-Ubuntu SMP Fri Sep 1 13:41:22 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
I get this error when I run the Mistral example:
$ cargo run --example mistral --features cuda --release -- --prompt "Here is a sample quick sort implementation in rust " --quantized -n 400
avx: true, neon: false, simd128: false, f16c: true
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 128.695µs
loaded the model in 2.726856172s
Here is a sample quick sort implementation in rust Illegal instruction (core dumped)
When I edit ~/candle/.cargo/config.toml to build with the flags rustflags = ["-C", "target-cpu=native", "-C", "target-feature=-avx,-avx2"] (basically I copy and insert the build flags for [target.x86_64-apple-darwin], inspired by your suggestion in Issue #622), it runs fine on the CPU.
I doubt that it's CUDA related, as the quantized models are supposed to always be on the CPU at the moment. Could you try to run with the exact same setup as when it crashes, but remove --features cuda? And then keep the features flag and add --cpu.
Finally, if you could launch this in gdb and send back the backtrace, that would be very useful (and in that case it's better to compile with debug symbols, i.e. replace --release with --profile=release-with-debug).
Thanks
Aaah okay it's not supposed to run on the GPU anyway!
I'm having some issues with my machine and just gonna take the rest of the day off, but I will test your suggestions in the next few days and report back. I have no experience with debuggers, but I'm sure it's straightforward :)
Here are the backtraces as chewed out by rust-gdb:
When built with the "vanilla" command from the example:
cargo run --example mistral --features cuda --profile=release-with-debug -- --prompt "Here is a sample quick sort implementation in rust " --quantized -n 400
When built with the same command but without the --features cuda
flag:
cargo run --example mistral --profile=release-with-debug -- --prompt "Here is a sample quick sort implementation in rust " --quantized -n 400
When built with the same ("vanilla") command but adding the --cpu
flag:
cargo run --example mistral --features cuda --profile=release-with-debug -- --prompt "Here is a sample quick sort implementation in rust " --quantized -n 400 --cpu
If I forgot / messed up anything or you'd like to see another one, do let me know and I'll fire it up again!
Back to the CUDA-related issue: when I run the "vanilla" command without the --quantized flag, I get the following error:
$ RUST_BACKTRACE=1 cargo run --example mistral --features cuda --profile=release-with-debug -- --prompt "Here is a sample quick sort implementation in rust " -n 400
Finished release-with-debug [optimized + debuginfo] target(s) in 0.20s
Running `target/release-with-debug/examples/mistral --prompt 'Here is a sample quick sort implementation in rust ' -n 400`
avx: true, neon: false, simd128: false, f16c: true
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 145.215µs
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16
Stack backtrace:
0: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1962:27
mistral::main
at ./candle-examples/examples/mistral/main.rs:253:21
1: core::ops::function::FnOnce::call_once
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ops/function.rs:250:5
std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:154:18
2: std::rt::lang_start::{{closure}}
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/rt.rs:166:18
3: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ops/function.rs:284:13
std::panicking::try::do_call
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40
std::panicking::try
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19
std::panic::catch_unwind
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14
std::rt::lang_start_internal::{{closure}}
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/rt.rs:148:48
std::panicking::try::do_call
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40
std::panicking::try
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19
std::panic::catch_unwind
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14
std::rt::lang_start_internal
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/rt.rs:148:20
4: main
5: __libc_start_main
at /build/glibc-BHL3KM/glibc-2.31/csu/../csu/libc-start.c:308:16
6: _start
For some reason rust-gdb doesn't show me much in this case, so I used the RUST_BACKTRACE variable instead.
Is the problem with my CUDA installation?
This last issue boils down to the following part of your error message:
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16
This is likely because your GPU is not recent enough to support bf16, and mistral is a bf16 model.
I tried instead running the Falcon model with the --use-f32 flag, but got the same error:
$ cargo run --example falcon --features cuda --release -- --prompt "Here is a sample quick sort implementation in rust " --use-f32
Finished release [optimized] target(s) in 0.18s
Running `/home/travis/candle/target/release/examples/falcon --prompt 'Here is a sample quick sort implementation in rust ' --use-f32`
retrieved the files in 142.123µs
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_bf16_f32
The T5 model with --features cuda does indeed run on my GPU, however, so that's great!
As far as I'm concerned, the sanity-test passed.
error: asm operand type size(8) does not match type/size implied by constraint 'r'
Same here with an RTX 4090. Just thousands of errors on Windows 11.
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16, on Windows 11 / WSL2 Ubuntu 22.04
sope@DESKTOP-HNB502N:/mnt/c/Users/Administrator$ nvidia-smi --query-gpu=compute_cap --format=csv,noheader
7.5
sope@DESKTOP-HNB502N:/mnt/c/Users/Administrator$ nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce RTX 2080 SUPER, 7.5, 546.12
sope@DESKTOP-HNB502N:/mnt/c/Users/Administrator$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0
sope@DESKTOP-HNB502N:/mnt/c/Users/Administrator$ uname -a
Linux DESKTOP-HNB502N 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 5 21:02:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
sope@DESKTOP-HNB502N:/mnt/c/Users/Administrator/workspaces/rust/candle$ cargo r -p candle-examples --example yi --features cuda -- --prompt 怎么学习rust语言
Finished dev [unoptimized + debuginfo] target(s) in 4.30s
Running `target/debug/examples/yi --prompt '怎么学习rust语言'`
avx: true, neon: false, simd128: false, f16c: true
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 59.952703ms
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16, on Windows 11 / WSL2 Ubuntu 22.04
We require a compute_cap of at least 8.0 to enable bf16 support, but the RTX 2080 only supports compute cap 7.5, so you will need a more recent GPU to run the bf16-based models.
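A sketch of how that fencing typically looks (not candle's actual cast kernel): the bf16 kernels are only generated for compute capability 8.0 and up, so on a 7.5 card the guarded block is skipped, the cast_f32_bf16 symbol never makes it into the compiled module, and looking it up fails, which matches the "named symbol not found" errors above.
#include <cuda_bf16.h>

// f32 -> bf16 cast kernel, only emitted for compute capability 8.0+.
// On sm_75 this block compiles to nothing, so the module contains no
// cast_f32_bf16 symbol and loading it reports CUDA_ERROR_NOT_FOUND.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
extern "C" __global__ void cast_f32_bf16(const float *in, __nv_bfloat16 *out, const size_t n) {
    const size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __float2bfloat16(in[i]);
    }
}
#endif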
None of the examples I tried work for me:
> nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce RTX 2080 with Max-Q Design, 7.5, 546.12
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:51:05_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0
I have tried to run it on Windows and WSL2, and this is the error I get:
Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading cast_f16_f32
And some other examples throw this error:
Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading is_u32_f32
> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Feb__8_05:53:42_Coordinated_Universal_Time_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
> nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce RTX 2080, 7.5, 551.23
Microsoft Windows [Version 10.0.19045.4046]
cargo:rustc-env=CUDA_COMPUTE_CAP=75
cargo:info=Builder { cuda_root: Some("C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"), kernel_paths: ["src\affine.cu", "src\binary.cu", "src\cast.cu", "src\conv.cu", "src\fill.cu", "src\indexing.cu", "src\reduce.cu", "src\ternary.cu", "src\unary.cu"], watch: [], include_paths: ["src\binary_op_macros.cuh", "src\compatibility.cuh", "src\cuda_utils.cuh"], compute_cap: Some(75), out_dir: "D:\Desktop\generative\src-tauri\target\debug\build\candle-kernels-2aac372284d54b6a\out", extra_args: [] }
cargo:rustc-env=CUDA_INCLUDE_DIR=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include
thread 'main' panicked at C:\Users\fax1t\.cargo\registry\src\index.crates.io-6f17d22bba15001f\bindgen_cuda-0.1.4\src\lib.rs:389:13: nvcc error while compiling "src\affine.cu":
Stuck with the same issue trying to compile for the NVIDIA Jetson Nano. While it has 4GB of RAM and is quite capable of running 7B quantized models, this device is barely supported by anyone else, due to CUDA issues and the old versions of nvcc shipped with JetPack. NVIDIA has horrible support for its own tech.
It fails to build with:
error: failed to run custom build command for `candle-kernels v0.4.0 (/home/oiwn/code/candle/candle-kernels)`
Caused by:
process didn't exit successfully: `/home/oiwn/code/candle/target/release/build/candle-kernels-b78e0c7d3d2aa31c/build-script-build` (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:info=["/usr", "/usr/local/cuda", "/opt/cuda", "/usr/lib/cuda", "C:/Program Files/NVIDIA GPU Computing Toolkit", "C:/CUDA"]
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=53
--- stderr
thread 'main' panicked at /home/oiwn/.cargo/registry/src/index.crates.io-6f17d22bba15001f/bindgen_cuda-0.1.4/src/lib.rs:519:43:
no gpu codes parsed from nvcc
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Which is strange: the candle-kernels dependency is bindgen_cuda = "0.1.1", which fails on this line: https://github.com/Narsil/bindgen_cuda/blob/main/src/lib.rs#L519
I'm hitting this issue with an L4...
tyler@srv:~/candle$ nvidia-smi --query-gpu=compute_cap --format=csv,noheader
8.9
tyler@srv:~/candle$ CUDA_COMPUTE_CAP=89 cargo run --example mistral --features cudnn --release -- --prompt "Here is a sample quick sort implementation in rust " -n 400
Finished release [optimized] target(s) in 0.22s
Running `target/release/examples/mistral --prompt 'Here is a sample quick sort implementation in rust ' -n 400`
avx: true, neon: false, simd128: false, f16c: true
temp: 0.00 repeat-penalty: 1.10 repeat-last-n: 64
retrieved the files in 20.977633ms
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16
thread 'main' panicked at C:\Users\fax1t\.cargo\registry\src\index.crates.io-6f17d22bba15001f\bindgen_cuda-0.1.4\src\lib.rs:389:13: nvcc error while compiling "src\affine.cu":
I was getting these errors as well when using the 'Developer Command Prompt for VS 2022'.
Switching to the 'x64 Native Tools Command Prompt' solved the problem. The examples and my projects now compile with the cuda feature and work properly.
I have tried to run it on Windows and WSL2, and this is the error I get:
Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading cast_f16_f32
And some other examples throw this error:
Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading is_u32_f32
Experiencing the same issue. Building with CUDA_COMPUTE_CAP=86, I tried to avoid type casting but still ended up with:
Error: DriverError(CUDA_ERROR_INVALID_PTX, "a PTX JIT compilation failed") when loading affine_f32
My env: Ubuntu 22.04 server
$ uname -a
Linux nl 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
$ nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
Quadro RTX 4000, 7.5, 545.29.06
I'm trying to compile the ViT model. I've tried both --release and normal debug compile mode.
Getting the same error on Windows 11:
PS C:\> nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
NVIDIA GeForce RTX 2080 Ti, 7.5, 551.86
PS C:\> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:30:10_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
I have the same problem:
root@C.10515727:~/ai-server$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.17s
Running `target/debug/ai-server`
retrieved the files in 16.361172ms
Error: DriverError(CUDA_ERROR_NOT_FOUND, "named symbol not found") when loading cast_f32_bf16
root@C.10515727:~/ai-server$ nvidia-smi --query-gpu=name,compute_cap,driver_version --format=csv
name, compute_cap, driver_version
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
NVIDIA GeForce RTX 2080 Ti, 7.5, 535.161.07
root@C.10515727:~/ai-server$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
root@C.10515727:~/ai-server$ uname -a
Linux fb0f7633e4cb 5.4.0-172-generic #190-Ubuntu SMP Fri Feb 2 23:24:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Yes, Windows seems to be having issues. I've been told on Discord that WSL is OK.
In my case (WSL2), although CUDA is correctly installed (as verified by other crates through env vars and all), candle works well WITHOUT CUDA but not WITH CUDA. My error says:
thread 'main' panicked at /home/mau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/lib.rs:98:5: : Unable to dynamically load the "cuda" shared library - searched for library names: ["cuda", "nvcuda"]
but the env vars are correctly set up...
In my case (WSL2), although CUDA is correctly installed (as verified by other crates through env vars and all), candle works well WITHOUT CUDA but not WITH CUDA. My error says:
thread 'main' panicked at /home/mau/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/lib.rs:98:5: : Unable to dynamically load the "cuda" shared library - searched for library names: ["cuda", "nvcuda"]
but the env vars are correctly set up...
Was having this issue, but was able to resolve it with a couple of things:
If that doesn't help, reach out and I can share my install script, which seems to be repeatable for me at least. I tear down my WSL distro every couple of months and it has become pretty repeatable.
CARGO_PROFILE_RELEASE_BUILD_OVERRIDE_DEBUG=true
warning: some crates are on edition 2021 which defaults to resolver = "2", but virtual workspaces default to resolver = "1"
note: to keep the current resolver, specify workspace.resolver = "1" in the workspace root's manifest
note: to use the edition 2021 resolver, specify workspace.resolver = "2" in the workspace root's manifest
Compiling libc v0.2.147 Compiling autocfg v1.1.0 Compiling crossbeam-utils v0.8.16 Compiling proc-macro2 v1.0.66 Compiling unicode-ident v1.0.11 Compiling rayon-core v1.11.0 Compiling memchr v2.5.0 Compiling libm v0.2.7 Compiling cfg-if v1.0.0 Compiling pkg-config v0.3.27 Compiling paste v1.0.14 Compiling serde v1.0.183 Compiling serde_derive v1.0.183 Compiling scopeguard v1.2.0 Compiling syn v1.0.109 Compiling serde_json v1.0.104 Compiling seq-macro v0.3.5 Compiling vcpkg v0.2.15 Compiling crc32fast v1.3.2 Compiling ident_case v1.0.1 Compiling strsim v0.10.0 Compiling fnv v1.0.7 Compiling thiserror v1.0.44 Compiling either v1.9.0 Compiling glob v0.3.1 Compiling openssl v0.10.56 Compiling rustls v0.21.6 Compiling anyhow v1.0.72 Compiling cudarc v0.9.13 Compiling portable-atomic v1.4.2 Compiling native-tls v0.2.11 Compiling esaxx-rs v0.1.8 Compiling adler v1.0.2 Compiling rustix v0.38.7 Compiling gimli v0.27.3 Compiling macro_rules_attribute-proc_macro v0.1.3 Compiling rustc-demangle v0.1.23 Compiling miniz_oxide v0.7.1 Compiling heck v0.4.1 Compiling flate2 v1.0.26 Compiling memoffset v0.9.0 Compiling crossbeam-epoch v0.9.15 Compiling num-traits v0.2.16 Compiling zip v0.6.6 Compiling crossbeam-channel v0.5.8 Compiling aho-corasick v1.0.2 Compiling object v0.31.1 Compiling nom v7.1.3 Compiling aho-corasick v0.7.20 Compiling quote v1.0.32 Compiling macro_rules_attribute v0.1.3 Compiling syn v2.0.28 Compiling crossbeam-deque v0.8.3 Compiling num_cpus v1.16.0 Compiling getrandom v0.2.10 Compiling dirs-sys v0.4.1 Compiling console v0.15.7 Compiling memmap2 v0.7.1 Compiling regex-automata v0.3.6 Compiling cc v1.0.82 Compiling dirs v5.0.1 Compiling rand_core v0.6.4 Compiling num-complex v0.4.3 Compiling rand_chacha v0.3.1 Compiling indicatif v0.17.6 Compiling rand v0.8.5 Compiling addr2line v0.20.0 Compiling rayon v1.7.0 Compiling is-terminal v0.4.9 Compiling ring v0.16.20 Compiling openssl-sys v0.9.91 Compiling rand_distr v0.4.3 Compiling backtrace v0.3.68 Compiling onig_sys v69.8.1 Compiling anstream v0.3.2 Compiling clap_builder v4.3.21 Compiling half v2.3.1 Compiling spm_precompiled v0.1.4 Compiling regex v1.9.3 Compiling darling_core v0.14.4 Compiling fancy-regex v0.10.0 Compiling candle-kernels v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-kernels) Compiling candle-gemm-common v0.15.5 Compiling rayon-cond v0.1.0 Compiling candle-gemm-f32 v0.15.5 Compiling candle-gemm-f64 v0.15.5 Compiling candle-gemm-c64 v0.15.5 Compiling candle-gemm-c32 v0.15.5 Compiling safetensors v0.3.2 Compiling candle-examples v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-examples) Compiling tracing-chrome v0.7.1 Compiling candle-gemm-f16 v0.15.5
error: failed to run custom build command for candle-kernels v0.1.0 (/mnt/source1/djbGR/ruststuffs/candle/candle-kernels)
Caused by:
process didn't exit successfully: /mnt/source1/djbGR/ruststuffs/candle/target/release/build/candle-kernels-e21ab5b8e8daaf0a/build-script-build (exit status: 101)
--- stdout
cargo:rerun-if-changed=build.rs
cargo:rustc-env=CUDA_INCLUDE_DIR=/usr/local/cuda/include
cargo:rerun-if-changed=src/
cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP
cargo:rustc-env=CUDA_COMPUTE_CAP=sm_61
--- stderr
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
2 errors detected in the compilation of "src/indexing.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
2 errors detected in the compilation of "src/affine.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
2 errors detected in the compilation of "src/cast.cu".
2 errors detected in the compilation of "src/reduce.cu".
2 errors detected in the compilation of "src/conv.cu".
src/compatibility.cuh(19): error: function "__hmax_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmax_nan(half a, half b) { ^
src/compatibility.cuh(22): error: function "__hmin_nan(half, half)" has already been defined
__attribute__((device)) inline __attribute__((always_inline)) half __hmin_nan(half a, half b) { ^
2 errors detected in the compilation of "src/ternary.cu". 2 errors detected in the compilation of "src/unary.cu". 2 errors detected in the compilation of "src/binary.cu". thread 'main' panicked at 'nvcc error while compiling "src/affine.cu":
stdout
stderr
', candle-kernels/build.rs:207:13
stack backtrace:
0: 0x557f8498d0b1 - std::backtrace_rs::backtrace::libunwind::trace::hb01a67340c9cfb71
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x557f8498d0b1 - std::backtrace_rs::backtrace::trace_unsynchronized::h896aca561948c930
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x557f8498d0b1 - std::sys_common::backtrace::_print_fmt::h8627be5b68fbde29
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:65:5
3: 0x557f8498d0b1 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h1b7758da45f4cd22
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:44:22
4: 0x557f849b282c - core::fmt::rt::Argument::fmt::h0eb38586043a01ca
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/fmt/rt.rs:138:9
5: 0x557f849b282c - core::fmt::write::h68b52f8aa598961e
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/fmt/mod.rs:1094:21
6: 0x557f8498949e - std::io::Write::write_fmt::hc5568929b662da92
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/io/mod.rs:1714:15
7: 0x557f8498cec5 - std::sys_common::backtrace::_print::h65aecbff12ca83c8
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:47:5
8: 0x557f8498cec5 - std::sys_common::backtrace::print::hf75ac9d60598d247
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:34:9
9: 0x557f8498e483 - std::panicking::default_hook::{{closure}}::hc2cb8da3be7476b0
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:269:22
10: 0x557f8498e19d - std::panicking::default_hook::hefa49c86da66275b
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:288:9
11: 0x557f8498ea09 - std::panicking::rust_panic_with_hook::hd4c3b0056ba96951
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:705:13
12: 0x557f8498e907 - std::panicking::begin_panic_handler::{{closure}}::he487675683e9a525
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:597:13
13: 0x557f8498d516 - std::sys_common::backtrace::rust_end_short_backtrace::hcff58b9b81620321
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:151:18
14: 0x557f8498e652 - rust_begin_unwind
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:593:5
15: 0x557f848b9333 - core::panicking::panic_fmt::h1b81548733a03bd5
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/panicking.rs:67:14
16: 0x557f848c3323 - build_script_build::cuda::build_ptx::ha488acce3cd701b3
at /mnt/source1/djbGR/ruststuffs/candle/candle-kernels/build.rs:207:13
17: 0x557f848c0878 - build_script_build::main::h2523e6c20b65fa04
at /mnt/source1/djbGR/ruststuffs/candle/candle-kernels/build.rs:6:33
18: 0x557f848d40cb - core::ops::function::FnOnce::call_once::h385ddf31127d3e12
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/ops/function.rs:250:5
19: 0x557f848ccbae - std::sys_common::backtrace::rust_begin_short_backtrace::h1cfd550c72c3e194
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/sys_common/backtrace.rs:135:18
20: 0x557f848e0130 - std::rt::lang_start::{{closure}}::h70dc5fa7783a03f7
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:166:18
21: 0x557f8498541b - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h9eccf02cf11756f6
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/core/src/ops/function.rs:284:13
22: 0x557f8498541b - std::panicking::try::do_call::hc95b838862bbb45a
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:500:40
23: 0x557f8498541b - std::panicking::try::h82935254d12a76fc
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:464:19
24: 0x557f8498541b - std::panic::catch_unwind::h7fd9d11cd70fc350
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panic.rs:142:14
25: 0x557f8498541b - std::rt::lang_start_internal::{{closure}}::h0ddb191e68b650a4
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:148:48
26: 0x557f8498541b - std::panicking::try::do_call::h17d4693c7a6e120c
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:500:40
27: 0x557f8498541b - std::panicking::try::h684fc020e1305912
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panicking.rs:464:19
28: 0x557f8498541b - std::panic::catch_unwind::h757da538db515116
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/panic.rs:142:14
29: 0x557f8498541b - std::rt::lang_start_internal::ha6b1625a1e9a4f5b
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:148:20
30: 0x557f848e010a - std::rt::lang_start::h0d1360f20fc735dd
at /rustc/39f42ad9e8430a8abb06c262346e89593278c515/library/std/src/rt.rs:165:17
31: 0x557f848c43fe - main
32: 0x7fd8be429d90 - __libc_start_call_main
at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
33: 0x7fd8be429e40 - __libc_start_main_impl
at ./csu/../csu/libc-start.c:392:3
34: 0x557f848b9a15 - _start
35: 0x0 -