kyutai-labs / moshi

Apache License 2.0

Shouldn't q8 work in 3060/12GB? #54

Open jikkuatwork opened 1 day ago

jikkuatwork commented 1 day ago

Due diligence

Topic

The Rust implementation

Question

System Config

Processes:

| GPU | GI ID | CI ID | PID | Type | Process name | GPU Memory Usage |
|---|---|---|---|---|---|---|
| 0 | N/A | N/A | 2039 | G | /usr/lib/xorg/Xorg | 537MiB |
| 0 | N/A | N/A | 2269 | G | /usr/bin/gnome-shell | 67MiB |
| 0 | N/A | N/A | 4224 | G | ...9d0e33034f2368c6ed2015474b1d818a902 | 206MiB |
| 0 | N/A | N/A | 9191 | G | alacritty | 9MiB |
| 0 | N/A | N/A | 26191 | G | /home/HOME/Apps/Telegram/Telegram | 4MiB |


## Observations

Tried: `cargo run --bin moshi-backend -r -- --config moshi-backend/config-q8.json standalone`

- The UI loads but the speed is [unacceptably slow](https://github.com/user-attachments/assets/b60f82d1-f78a-4c51-b789-a356f345b25e) and the voice is distorted
- `nvtop` shows that the model isn't loading

Tried: `cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config-q8.json standalone`

- Loading the model onto the GPU fails! (I thought 12GB was enough to load the 7GB GGUF? The GPU had barely 1GB in use.)

    ❮ cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config-q8.json standalone
    warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
    package:   /home/HOME/Projects/outside_projects/moshi/rust/moshi-core/Cargo.toml
    workspace: /home/HOME/Projects/outside_projects/moshi/rust/Cargo.toml
    warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
    package:   /home/HOME/Projects/outside_projects/moshi/rust/moshi-backend/Cargo.toml
    workspace: /home/HOME/Projects/outside_projects/moshi/rust/Cargo.toml
    warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
    package:   /home/HOME/Projects/outside_projects/moshi/rust/moshi-cli/Cargo.toml
    workspace: /home/HOME/Projects/outside_projects/moshi/rust/Cargo.toml
    Finished release profile [optimized] target(s) in 0.23s
    Running target/release/moshi-backend --config moshi-backend/config-q8.json standalone
    2024-09-18T18:20:02.612428Z INFO moshi_backend: build_info=BuildInfo { build_timestamp: "2024-09-18T16:57:00.763883182Z", build_date: "2024-09-18", git_branch: "main", git_timestamp: "2024-09-18T17:45:09.000000000+02:00", git_date: "2024-09-18", git_hash: "f3218c60a115b745b1848bb8297df5eb404a041a", git_describe: "f3218c6", rustc_host_triple: "x86_64-unknown-linux-gnu", rustc_version: "1.80.1", cargo_target_triple: "x86_64-unknown-linux-gnu" }
    2024-09-18T18:20:02.612441Z INFO moshi_backend: starting process with pid 752759
    2024-09-18T18:20:02.612457Z INFO hf_hub: Token file not found "/home/HOME/.cache/huggingface/token"
    2024-09-18T18:20:02.682964Z INFO hf_hub: Token file not found "/home/HOME/.cache/huggingface/token"
    2024-09-18T18:20:07.910280Z INFO moshi_backend::standalone: warming up the model
    Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
    moshi/rust on main [?] is 📦 v0.2.0 via 🦀 v1.80.1 took 6s

adefossez commented 7 hours ago

That is a good question, would have to look more into it. Maybe @LaurentMazare would have an opinion on this?

jikkuatwork commented 7 hours ago

Thanks a lot! Appreciate your time!

LaurentMazare commented 17 minutes ago

I cannot really test this at the moment, but I think it's somewhat expected. The weights are ~8.17GB, and in q8 mode we pre-allocate a kv-cache for 4096 steps (~5 mins of conversation) in f32, which comes to ~4GB. We should aim at using bf16 instead, but that's likely to require some changes on the candle side. Activations and the mimi parts also have to be stored, but they should be pretty small. So overall we're a bit above 12GB here. One thing you could try is tweaking this line to be something like 1000 and see if it helps. You'll only be able to have short sessions with moshi, but if it works we could consider making this configurable somehow.
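
For anyone wanting to sanity-check the numbers in that estimate, here is a rough back-of-the-envelope sketch. It is not taken from the moshi code base. The model shape (32 layers, model width 4096), the 12.5 Hz step rate, and the layout of one K and one V tensor per layer at full model width are all assumptions made for illustration, and `kv_cache_bytes` and `gib` are hypothetical helper names.

```rust
/// Bytes for a pre-allocated KV cache, assuming one K and one V tensor per
/// layer, each of shape [steps, d_model]. (Assumed layout, not read from moshi.)
fn kv_cache_bytes(layers: u64, steps: u64, d_model: u64, bytes_per_elem: u64) -> u64 {
    2 * layers * steps * d_model * bytes_per_elem
}

/// Convert a byte count to GiB (2^30 bytes).
fn gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

fn main() {
    // Assumed model shape; these values are not stated in the thread.
    let (layers, d_model) = (32, 4096);
    // Assumed step rate in steps per second, used only to estimate session length.
    let steps_per_second = 12.5;

    // (cache steps, bytes per element, dtype label)
    for (steps, bytes_per_elem, label) in [(4096, 4, "f32"), (4096, 2, "bf16"), (1000, 4, "f32")] {
        let bytes = kv_cache_bytes(layers, steps, d_model, bytes_per_elem);
        println!(
            "{steps} steps, {label}: {:.2} GiB, ~{:.0} s of conversation",
            gib(bytes),
            steps as f64 / steps_per_second
        );
    }
}
```

Under these assumptions the 4096-step f32 cache is exactly 4 GiB, which is in line with the ~4GB quoted above, and adding it to ~8.17GB of weights already exceeds 12GB before activations and the mimi parts are counted. The same formula suggests that bf16 would halve the cache, and that 1000 steps in f32 would need roughly a quarter of it, at the cost of much shorter sessions.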