mdrokz / rust-llama.cpp

LLama.cpp rust bindings
https://crates.io/crates/llama_cpp_rs/
MIT License
290 stars · 42 forks

Error when enabling CUDA on Windows #24

Closed · Kuinox closed this 2 months ago

Kuinox commented 7 months ago

When enabling the cuda feature, I get the following error on Windows:

[...]
  running: "nvcc" "-O0" "-ffunction-sections" "-fdata-sections" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "./llama.cpp/ggml-cuda.h" "-Wall" "-Wextra" "--forward-unknown-to-host-compiler" "-arch=native" "/W4" "/Wall" "/wd4820" "/wd4710" "/wd4711" "/wd4820" "/wd4514" "-DGGML_USE_CUBLAS" "-DGGML_CUDA_DMMV_X=32" "-DGGML_CUDA_DMMV_Y=1" "-DK_QUANTS_PER_ITERATION=2" "-Wno-pedantic" "-o" "C:\\dev\\ai_kuinox\\target\\debug\\build\\llama_cpp_rs-dbbb5a5dac5f7f5e\\out\\./llama.cpp/ggml-cuda.o" "-c" "./llama.cpp/ggml-cuda.cu"
  nvcc fatal   : A single input file is required for a non-link phase when an outputfile is specified
  exit code: 1
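A likely cause, though it is not confirmed anywhere in this thread: the MSVC-style switches on that command line (/W4, /Wall, /wd4820, and so on) do not start with a dash, so nvcc appears to parse them as extra input files rather than options, which is exactly what the "single input file" message complains about. A minimal sketch of a helper that would keep such switches off the nvcc command line (the function name is made up for illustration; the surviving flags could then be fed through cc::Build::flag as in the compile_cuda code posted later in this thread):

```rust
// Hypothetical helper, not part of the crate: drop MSVC-style "/..." switches
// from a flag string before it is handed to nvcc, since nvcc seems to treat
// them as additional input files.
fn nvcc_safe_flags(cxx_flags: &str) -> Vec<&str> {
    cxx_flags
        .split_whitespace()
        .filter(|flag| !flag.starts_with('/'))
        .collect()
}
```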
Kuinox commented 7 months ago

There is also a different(?) issue when trying to run with OpenCL.
./llama.cpp/ggml-opencl.cpp(10): fatal error C1083: Cannot open include file: 'clblast.h': No such file or directory
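The OpenCL failure is a separate problem: ggml-opencl.cpp includes clblast.h, so the build needs a CLBlast installation on the include and library search paths. A rough sketch of what that wiring could look like in a build script; CLBLAST_PATH is an assumed environment variable here, not something the crate currently reads:

```rust
// Hypothetical sketch, not the crate's current behavior: make the OpenCL build
// aware of a local CLBlast installation via an assumed CLBLAST_PATH variable.
fn add_clblast_paths(build: &mut cc::Build) {
    if let Ok(clblast_path) = std::env::var("CLBLAST_PATH") {
        // clblast.h typically lives under <CLBLAST_PATH>/include.
        build.include(format!("{}/include", clblast_path));
        println!("cargo:rustc-link-search=native={}/lib", clblast_path);
        println!("cargo:rustc-link-lib=clblast");
    }
}
```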

mdrokz commented 7 months ago

> When enabling the cuda feature, I get the following error on Windows:
>
> [...]
>   running: "nvcc" "-O0" "-ffunction-sections" "-fdata-sections" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "./llama.cpp/ggml-cuda.h" "-Wall" "-Wextra" "--forward-unknown-to-host-compiler" "-arch=native" "/W4" "/Wall" "/wd4820" "/wd4710" "/wd4711" "/wd4820" "/wd4514" "-DGGML_USE_CUBLAS" "-DGGML_CUDA_DMMV_X=32" "-DGGML_CUDA_DMMV_Y=1" "-DK_QUANTS_PER_ITERATION=2" "-Wno-pedantic" "-o" "C:\\dev\\ai_kuinox\\target\\debug\\build\\llama_cpp_rs-dbbb5a5dac5f7f5e\\out\\./llama.cpp/ggml-cuda.o" "-c" "./llama.cpp/ggml-cuda.cu"
>   nvcc fatal   : A single input file is required for a non-link phase when an outputfile is specified
>   exit code: 1

Are you running this in Docker? If so, are you using the provided Dockerfile?

Kuinox commented 7 months ago

No, I'm running this directly on Windows.

mdrokz commented 7 months ago

> No, I'm running this directly on Windows.

Can you try running with the Docker image? I have never tested running directly, so I have no idea whether it works or not.

Vali-98 commented 7 months ago

I've been trying to get this working with Tauri, and it seems to work with CPU only, but fails on CUDA for Windows. I attempted to compile it manually, and that seemed more successful. I think the issue is with the compiler flags used for nvcc, but I can't really confirm whether this is the case.

mdrokz commented 6 months ago

> I've been trying to get this working with Tauri, and it seems to work with CPU only, but fails on CUDA for Windows. I attempted to compile it manually, and that seemed more successful. I think the issue is with the compiler flags used for nvcc, but I can't really confirm whether this is the case.

I will try to look into it when I get some time and see how to compile directly on Windows. Can you share your steps for how you compiled CUDA directly?

Vali-98 commented 6 months ago

> I will try to look into it when I get some time and see how to compile directly on Windows. Can you share your steps for how you compiled CUDA directly?

I did some more testing, and adding/removing flags actually helped with compiling on Windows. I modified compile_cuda as follows:

fn compile_cuda(cxx_flags: &str) {

    if cfg!(target_os = "linux") || cfg!(target_os = "macos") {
        println!("cargo:rustc-link-search=native=/usr/local/cuda/lib64");
        println!("cargo:rustc-link-search=native=/opt/cuda/lib64");

        if let Ok(cuda_path) = std::env::var("CUDA_PATH") {
            println!(
                "cargo:rustc-link-search=native={}/targets/x86_64-linux/lib",
                cuda_path
            );
        }
    } else if cfg!(target_os = "windows") {
        if let Ok(cuda_path) = std::env::var("CUDA_PATH") {
            println!(
                "cargo:rustc-link-search=native={}/lib/x64",
                cuda_path
            );
        }
    }

    let mut libs = String::from("cublas cudart cublasLt");
    if cfg!(target_os = "linux") || cfg!(target_os = "macos"){
        libs.push_str(" culibos pthread dl rt")
    }

    for lib in libs.split_whitespace() {
        println!("cargo:rustc-link-lib={}", lib);
    }

    let mut nvcc = cc::Build::new();

    let env_flags = vec![
        ("LLAMA_CUDA_DMMV_X=32", "-DGGML_CUDA_DMMV_X"),
        ("LLAMA_CUDA_DMMV_Y=1", "-DGGML_CUDA_DMMV_Y"),
        ("LLAMA_CUDA_KQUANTS_ITER=2", "-DK_QUANTS_PER_ITERATION"),
    ];

    let nvcc_flags = "--forward-unknown-to-host-compiler -arch=native";

    for nvcc_flag in nvcc_flags.split_whitespace() {
        nvcc.flag(nvcc_flag);
    }

    for cxx_flag in cxx_flags.split_whitespace() {
        // Only flags beginning with a literal backslash reach nvcc; everything
        // else (including the MSVC "/..." switches) is shunted to the archiver,
        // which effectively keeps it off the nvcc command line.
        if cxx_flag.starts_with('\\') {
            nvcc.flag(cxx_flag);
        } else {
            nvcc.ar_flag(cxx_flag);
        }
    }

    // Use the value from the LLAMA_CUDA_* environment variable when it is set;
    // otherwise fall back to the default after the '=' in env_flags above.
    for env_flag in env_flags {
        let mut flag_split = env_flag.0.split("=");
        if let Ok(val) = std::env::var(flag_split.next().unwrap()) {
            nvcc.flag(&format!("{}={}", env_flag.1, val));
        } else {
            nvcc.flag(&format!("{}={}", env_flag.1, flag_split.next().unwrap()));
        }
    }

    nvcc.compiler("nvcc")
        .file("./llama.cpp/ggml-cuda.cu")
        .no_default_flags(true)
        .extra_warnings(false)
        //.flag("-Wno-pedantic")
        .ar_flag("/NODEFAULTLIB:libcmt.lib")
        .include("./llama.cpp/ggml-cuda.h")
        .compile("ggml-cuda");

}

Plus, these flags are added as ar_flag instead of flag, though I am not sure if this actually achieves anything:

       cx_flags.push_str(" /W4 /Wall /wd4820 /wd4710 /wd4711 /wd4820 /wd4514 /MT");
       cxx_flags.push_str(" /W4 /Wall /wd4820 /wd4710 /wd4711 /wd4820 /wd4514 /MT");
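For what it's worth, in the cc crate .flag() adds an argument to the compiler invocation, while .ar_flag() adds it to the archiver invocation, so routing the /W... switches through ar_flag mostly keeps them off the nvcc command line rather than applying them as warning settings. A rough illustration of the distinction (not the crate's actual code):

```rust
// Illustration only: where cc::Build sends the two kinds of flags.
fn flag_routing_example() {
    let mut nvcc = cc::Build::new();
    nvcc.flag("-DGGML_USE_CUBLAS");           // goes on the compiler command line
    nvcc.ar_flag("/NODEFAULTLIB:libcmt.lib"); // goes on the archiver (lib.exe) command line
}
```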

Disclaimer: I have close to zero experience in Rust and building C/C++; I was essentially just knocking rocks together and seeing what worked.

Also, as an added bonus, OpenBLAS is completely broken for Windows! I gave up trying to get that to work. Good luck!
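For context, here is a rough sketch of how a build.rs could wire the modified compile_cuda above behind the cuda feature; the flag strings and overall shape are assumptions for illustration, not the crate's actual build script (Cargo exposes enabled features to build scripts via CARGO_FEATURE_* environment variables):

```rust
// Hypothetical build.rs skeleton; flag strings and structure are assumptions,
// not the crate's actual build script.
fn main() {
    let mut cxx_flags = String::new();
    if cfg!(target_os = "windows") {
        // MSVC-style switches; compile_cuda above routes these to ar_flag.
        cxx_flags.push_str("/W4 /Wall /wd4820 /wd4710 /wd4711 /wd4514 /MT");
    } else {
        cxx_flags.push_str("-O3 -std=c++11 -fPIC");
    }

    // Cargo sets CARGO_FEATURE_CUDA when the package's `cuda` feature is enabled.
    if std::env::var("CARGO_FEATURE_CUDA").is_ok() {
        compile_cuda(&cxx_flags);
    }
}
```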

FlooferLand commented 6 months ago

Any update on this? I'm getting the same error. I tried writing my own Dockerfile using the same base image as the CUDA example, and it works fine since the Docker image uses Ubuntu; Windows support in general appears to be missing. I haven't tested it, but providing a quick patch like the one the person above me suggested would be helpful, since having to compile and run my Cargo project from a Docker container is a bit painful in my current configuration.

mdrokz commented 6 months ago

> Any update on this? I'm getting the same error. I tried writing my own Dockerfile using the same base image as the CUDA example, and it works fine since the Docker image uses Ubuntu; Windows support in general appears to be missing. I haven't tested it, but providing a quick patch like the one the person above me suggested would be helpful, since having to compile and run my Cargo project from a Docker container is a bit painful in my current configuration.

Hey, I wasn't getting any time due to my job; I will try to look into this ASAP.

tbogdala commented 6 months ago

> Windows support in general appears to be missing.

My fork has an updated build.rs that enables CUDA on Windows and Metal on Mac. You might be able to figure out what you need for Windows from my commit, @FlooferLand. It's not the cleanest thing I've ever written, but it seems to work.

https://github.com/mdrokz/rust-llama.cpp/commit/ae49e02ff667b329271f1975d78298571f7f8b76

FlooferLand commented 6 months ago

@tbogdala Tried out your patch! Left me very confused xD I'll just leave the log here: build.txt

lbux commented 5 months ago

> @tbogdala Tried out your patch! Left me very confused xD I'll just leave the log here: build.txt

I ran into a somewhat related issue, and it seems the commit tries to run a .sh file on Windows, which of course does not work as expected. I was able to bypass the error by removing the references to the .sh script and writing a function that does what the script tries to do. However, this just ended up causing issues when using llama_cpp_rs as a dependency in a different project, so it is not a good solution. I'll have to take a deeper look into it.
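One possible shape for that kind of workaround, purely as a sketch (the script name and the Windows-side body are placeholders; the thread does not say which .sh file is involved or what it does):

```rust
use std::process::Command;

// Hypothetical sketch: dispatch to a native Rust reimplementation on Windows
// instead of shelling out to a POSIX script.
fn prepare_sources() {
    if cfg!(target_os = "windows") {
        prepare_sources_windows();
    } else {
        let status = Command::new("sh")
            .arg("./prepare.sh") // placeholder script name
            .status()
            .expect("failed to spawn sh");
        assert!(status.success(), "prepare.sh exited with an error");
    }
}

fn prepare_sources_windows() {
    // Placeholder: reimplement whatever the shell script does (copying or
    // patching source files, etc.) with std::fs so no POSIX shell is needed.
}
```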