dewrama closed this issue 1 month ago
The current version is too large for an 8 GB GPU; see the FAQ.
Thanks for the reply; I saw the recently updated FAQ. Being curious and creative: is there any way to work around this VRAM limitation, such as using NVIDIA unified memory? I also read that the new Intel Ultra CPUs can use Arc to offload to system memory. A lot of us have limited hardware (or hardware that can almost run it), and it would be great if we could all use a scaled-down version. Thanks!
I cannot think of an easy way to get around this. We have a q4 quantized version that can work on 12 GB or even 8 GB, but I find its quality to be noticeably worse than the original, so I wouldn't recommend going this way. It would certainly be great if alternative implementations emerged in the community and improved on the memory requirements.
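For a rough sense of why q8 is tight on an 8 GB card while q4 may fit, here is a back-of-the-envelope sketch of the memory taken by the weights alone. The parameter count (about 7 billion) is an assumption and is not stated in this thread, and `weight_gib` is a hypothetical helper, not part of moshi-backend. Real quantized formats also store scale factors, and the runtime needs extra room for activations, caches, and the CUDA context, so actual usage will be higher than these figures.

```rust
/// Approximate size in GiB of the weights alone for a model with
/// `params` parameters stored at `bits_per_weight` bits each.
/// Hypothetical helper for illustration only.
fn weight_gib(params: f64, bits_per_weight: f64) -> f64 {
    let bytes = params * bits_per_weight / 8.0;
    bytes / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    // Assumption: roughly 7 billion parameters (not stated in this thread).
    let params = 7.0e9;
    // Nominal capacity of the card in question; usable memory is lower.
    let vram_gib = 8.0;

    for (name, bits) in [("bf16", 16.0), ("q8", 8.0), ("q4", 4.0)] {
        let gib = weight_gib(params, bits);
        let left = vram_gib - gib;
        println!("{name}: ~{gib:.1} GiB of weights, {left:.1} GiB headroom on {vram_gib} GiB");
    }
}
```

Under these assumptions the q8 weights alone leave very little headroom on an 8 GB card, which is consistent with the out-of-memory error reported in this issue; the q4 weights leave more room, at the quality cost described above.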
Due diligence
Topic
The Rust implementation
Question
I am getting a CUDA out-of-memory error. I am running the q8 version on WSL (Ubuntu) with an RTX 4060 with 8 GB of VRAM. I thought this hardware could run the quantized version. Am I doing something wrong? Please help. (I also tried setting CUDA_COMPUTE_CAP to lower values and still hit the same problem.)
CUDA_COMPUTE_CAP=86 cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config-q8.json standalone
Finished `release` profile [optimized + debuginfo] target(s) in 1m 15s
Running `target/release/moshi-backend --config moshi-backend/config-q8.json standalone`
2024-09-29T20:03:12.168129Z INFO moshi_backend: build_info=BuildInfo { build_timestamp: "2024-09-22T23:05:21.856959080Z", build_date: "2024-09-22", git_branch: "main", git_timestamp: "2024-09-21T17:30:23.000000000+02:00", git_date: "2024-09-21", git_hash: "3e3e573b28a1d1d6be084185e1a2e6e550c1ddcf", git_describe: "3e3e573", rustc_host_triple: "x86_64-unknown-linux-gnu", rustc_version: "1.81.0", cargo_target_triple: "x86_64-unknown-linux-gnu" }
2024-09-29T20:03:12.168212Z INFO moshi_backend: starting process with pid 30709
Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")