kyutai-labs / moshi

Apache License 2.0

Getting CUDA out-of-memory error running the Rust q8 build on an RTX 4060 (8 GB VRAM) #125

Open dewrama opened 3 hours ago

dewrama commented 3 hours ago

Due diligence

Topic

The Rust implementation

Question

I am getting a CUDA out-of-memory error. I am running the q8 version under WSL (Ubuntu) on an RTX 4060 with 8 GB of VRAM. I thought this hardware could run the quantized version. Am I doing something wrong? Please help. (I also tried setting CUDA_COMPUTE_CAP to other, lower values and still get the same problem.)

CUDA_COMPUTE_CAP=86 cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config-q8.json standalone

Finished release profile [optimized + debuginfo] target(s) in 1m 15s
Running target/release/moshi-backend --config moshi-backend/config-q8.json standalone
2024-09-29T20:03:12.168129Z INFO moshi_backend: build_info=BuildInfo { build_timestamp: "2024-09-22T23:05:21.856959080Z", build_date: "2024-09-22", git_branch: "main", git_timestamp: "2024-09-21T17:30:23.000000000+02:00", git_date: "2024-09-21", git_hash: "3e3e573b28a1d1d6be084185e1a2e6e550c1ddcf", git_describe: "3e3e573", rustc_host_triple: "x86_64-unknown-linux-gnu", rustc_version: "1.81.0", cargo_target_triple: "x86_64-unknown-linux-gnu" }
2024-09-29T20:03:12.168212Z INFO moshi_backend: starting process with pid 30709

Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")

LaurentMazare commented 3 hours ago

The current version is too large for an 8 GB GPU; see the FAQ.
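The answer comes down to a memory budget: 8-bit quantization halves the weight size compared to 16-bit, but the weights still have to share the card with everything else the backend allocates. The sketch below is a rough back-of-envelope estimate under stated assumptions, not a measurement of moshi. The parameter count (7e9) is hypothetical. The "q8" layout is modelled as a Q8_0-style block format (32 int8 weights plus one f16 scale, 34 bytes per block). The helper names `q8_weight_bytes` and `headroom_gib` are illustrative only and are not part of the moshi codebase.

```rust
// Back-of-envelope VRAM estimate for 8-bit quantized weights.
// ASSUMPTIONS (illustrative, not taken from the moshi codebase):
// - the parameter count used in main() is hypothetical;
// - "q8" is modelled as a Q8_0-style block format: 32 int8 weights plus one
//   f16 scale per block, i.e. 34 bytes per 32 weights (8.5 bits per weight).

const GIB: f64 = 1024.0 * 1024.0 * 1024.0;

/// Bytes needed to store `params` weights in the assumed Q8_0-style layout.
/// A partially filled final block still occupies a full block.
fn q8_weight_bytes(params: f64) -> f64 {
    const WEIGHTS_PER_BLOCK: f64 = 32.0;
    const BYTES_PER_BLOCK: f64 = 34.0; // 32 x int8 + 1 x f16 scale
    (params / WEIGHTS_PER_BLOCK).ceil() * BYTES_PER_BLOCK
}

/// VRAM (in GiB) left after loading the weights; negative means they do not fit.
fn headroom_gib(vram_gib: f64, weight_bytes: f64) -> f64 {
    vram_gib - weight_bytes / GIB
}

fn main() {
    let vram_gib = 8.0; // the RTX 4060 from the issue
    let params = 7.0e9; // HYPOTHETICAL parameter count, for illustration only
    let weights = q8_weight_bytes(params);
    println!("q8 weights: {:.2} GiB", weights / GIB);
    println!(
        "left for CUDA context, caches, codec and activations: {:.2} GiB",
        headroom_gib(vram_gib, weights)
    );
}
```

With these assumed numbers, the weights alone take roughly 6.9 GiB, which leaves only about 1 GiB for the CUDA context, attention caches, the audio codec and intermediate activations. That is a plausible reason an 8 GB card runs out of memory even with q8 weights. Substitute the real parameter count and quantization format to get a meaningful figure.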