jikkuatwork opened 1 day ago
That is a good question; I would have to look into it more. Maybe @LaurentMazare has an opinion on this?
Thanks a lot! Appreciate your time!
I cannot really test this at the moment, but I think it's somewhat expected. The weights are ~8.17GB, and in q8 mode we also pre-allocate a kv-cache for 4096 steps (~5 mins of conversation) in f32. That kv-cache is ~4GB (we should aim to use bf16 instead, but that's likely to require some changes on the candle side). Activations and the mimi parts also have to be stored, but they should be pretty small. So overall we're a bit above 12GB here. One thing you could try is tweaking this line to use something like 1000 steps instead of 4096 and see if it helps. You'll only be able to have short sessions with moshi, but if it works we could consider making this configurable.
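The memory estimate above can be checked with a little arithmetic. The sketch below is a back-of-the-envelope model, not moshi's actual allocation code: it assumes a 32-layer transformer with model dimension 4096, batch size 1, one cached key and one cached value vector per layer per step, and one step per Mimi frame at 12.5 Hz. None of these constants are read from moshi's config; the helper names are hypothetical.

```rust
// Back-of-the-envelope kv-cache sizing. ASSUMED constants, not taken
// from moshi's config: 32 layers, d_model 4096, batch size 1,
// one step per audio frame at 12.5 Hz.
const LAYERS: u64 = 32;
const D_MODEL: u64 = 4096;
const FRAME_RATE_HZ: f64 = 12.5;

/// Bytes held by a kv-cache: one key and one value tensor per layer,
/// each of shape [steps, d_model].
fn kv_cache_bytes(steps: u64, bytes_per_elem: u64) -> u64 {
    2 * LAYERS * steps * D_MODEL * bytes_per_elem
}

/// Converts a byte count to GiB.
fn to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

/// Length of conversation covered by `steps` frames, in seconds.
fn session_secs(steps: u64) -> f64 {
    steps as f64 / FRAME_RATE_HZ
}

fn main() {
    // (max steps, bytes per element, dtype label)
    for (steps, bytes_per_elem, dtype) in [(4096, 4, "f32"), (4096, 2, "bf16"), (1000, 4, "f32")] {
        let gib = to_gib(kv_cache_bytes(steps, bytes_per_elem));
        println!(
            "{steps:>5} steps, {dtype:<4}: kv-cache {gib:.2} GiB, ~{:.0} s of audio",
            session_secs(steps)
        );
    }
}
```

Under these assumptions the f32 cache for 4096 steps comes out at exactly 4 GiB, in line with the ~4GB quoted above, and 4096 frames cover roughly 328 s (the "~5 mins"). Switching to bf16 halves the cache to 2 GiB, while dropping to 1000 steps in f32 brings it under 1 GiB at the cost of limiting sessions to about 80 s.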
Due diligence
Topic
The Rust implementation
Question
System Config
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2039      G   /usr/lib/xorg/Xorg                            537MiB |
|    0   N/A  N/A      2269      G   /usr/bin/gnome-shell                           67MiB |
|    0   N/A  N/A      4224      G   ...9d0e33034f2368c6ed2015474b1d818a902        206MiB |
|    0   N/A  N/A      9191      G   alacritty                                       9MiB |
|    0   N/A  N/A     26191      G   /home/HOME/Apps/Telegram/Telegram               4MiB |
+-----------------------------------------------------------------------------------------+
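The processes in the nvidia-smi table above already hold some VRAM before moshi starts, which eats into the headroom for the ~12GB the backend needs. A trivial sketch that totals the GPU Memory Usage column (values copied from the table; the helper name is hypothetical):

```rust
/// Sums per-process GPU memory readings, in MiB.
fn total_mib(usages: &[u32]) -> u32 {
    usages.iter().sum()
}

fn main() {
    // "GPU Memory Usage" column from the nvidia-smi table above, in MiB.
    let usages: [u32; 5] = [537, 67, 206, 9, 4];
    let total = total_mib(&usages);
    println!("other processes hold {total} MiB (~{:.2} GiB)", total as f64 / 1024.0);
}
```

That is 823 MiB, about 0.8 GiB, unavailable to moshi-backend on this GPU.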
❮ cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config-q8.json standalone
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/HOME/Projects/outside_projects/moshi/rust/moshi-core/Cargo.toml
workspace: /home/HOME/Projects/outside_projects/moshi/rust/Cargo.toml
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/HOME/Projects/outside_projects/moshi/rust/moshi-backend/Cargo.toml
workspace: /home/HOME/Projects/outside_projects/moshi/rust/Cargo.toml
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/HOME/Projects/outside_projects/moshi/rust/moshi-cli/Cargo.toml
workspace: /home/HOME/Projects/outside_projects/moshi/rust/Cargo.toml
    Finished `release` profile [optimized] target(s) in 0.23s
     Running `target/release/moshi-backend --config moshi-backend/config-q8.json standalone`
2024-09-18T18:20:02.612428Z INFO moshi_backend: build_info=BuildInfo { build_timestamp: "2024-09-18T16:57:00.763883182Z", build_date: "2024-09-18", git_branch: "main", git_timestamp: "2024-09-18T17:45:09.000000000+02:00", git_date: "2024-09-18", git_hash: "f3218c60a115b745b1848bb8297df5eb404a041a", git_describe: "f3218c6", rustc_host_triple: "x86_64-unknown-linux-gnu", rustc_version: "1.80.1", cargo_target_triple: "x86_64-unknown-linux-gnu" }
2024-09-18T18:20:02.612441Z INFO moshi_backend: starting process with pid 752759
2024-09-18T18:20:02.612457Z INFO hf_hub: Token file not found "/home/HOME/.cache/huggingface/token"
2024-09-18T18:20:02.682964Z INFO hf_hub: Token file not found "/home/HOME/.cache/huggingface/token"
2024-09-18T18:20:07.910280Z INFO moshi_backend::standalone: warming up the model
Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
moshi/rust on main [?] is 📦 v0.2.0 via 🦀 v1.80.1 took 6s