kyutai-labs / moshi

Apache License 2.0

Strange feedback / interrupting oscillation sound #85

Open do-me opened 23 hours ago

do-me commented 23 hours ago

Backend impacted

The Rust implementation

Operating system

Mac OS X

Hardware

Metal with MLX

Description

moshi audio (1).webm

Listen to the audio sample: something seems to be oscillating. I am using the default settings:

[two screenshots of the settings]

Extra information

Running on an M3 Max with 128 GB. The logs seem fine, I think:

(py3.12) ➜  rust git:(main) cargo run --features metal --bin moshi-backend -r -- --config moshi-backend/config.json standalone

warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /Users/dome/work/general/moshi/rust/moshi-core/Cargo.toml
workspace: /Users/dome/work/general/moshi/rust/Cargo.toml
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /Users/dome/work/general/moshi/rust/moshi-backend/Cargo.toml
workspace: /Users/dome/work/general/moshi/rust/Cargo.toml
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /Users/dome/work/general/moshi/rust/moshi-cli/Cargo.toml
workspace: /Users/dome/work/general/moshi/rust/Cargo.toml
    Finished `release` profile [optimized] target(s) in 0.38s
     Running `target/release/moshi-backend --config moshi-backend/config.json standalone`
2024-09-20T07:08:46.442663Z  INFO moshi_backend: build_info=BuildInfo { build_timestamp: "2024-09-20T07:03:50.253396000Z", build_date: "2024-09-20", git_branch: "main", git_timestamp: "2024-09-19T20:27:36.000000000-06:00", git_date: "2024-09-19", git_hash: "d828f0bd07638cdfb37c748ec5d743a7e09a9b91", git_describe: "d828f0b", rustc_host_triple: "aarch64-apple-darwin", rustc_version: "1.79.0", cargo_target_triple: "aarch64-apple-darwin" }
2024-09-20T07:08:46.442702Z  INFO moshi_backend: starting process with pid 63659
2024-09-20T07:08:46.442794Z  INFO hf_hub: Token file not found "/Users/dome/.cache/huggingface/token"
2024-09-20T07:08:46.446945Z  INFO hf_hub: Token file not found "/Users/dome/.cache/huggingface/token"
2024-09-20T07:08:57.632417Z  INFO moshi_backend::standalone: warming up the model
2024-09-20T07:08:59.041964Z  INFO moshi_backend::standalone: model is ready to roll!
2024-09-20T07:08:59.042072Z  INFO moshi_backend::standalone: serving static dir /Users/dome/.cache/huggingface/hub/models--kyutai--moshi-artifacts/snapshots/8481e95f73827e4e70ac7311c12b0be099276182/dist
2024-09-20T07:08:59.042243Z  INFO moshi_backend::standalone: standalone worker listening on https://0.0.0.0:8998
2024-09-20T07:09:10.106815Z  INFO moshi_backend::standalone: received connection addr=127.0.0.1:65042
2024-09-20T07:09:10.106926Z  INFO moshi_backend::stream_both: accepted websocket connection
2024-09-20T07:09:10.107062Z  INFO moshi_backend::stream_both: starting streaming
2024-09-20T07:09:10.107675Z  INFO moshi_backend::stream_both: processing loop
2024-09-20T07:09:22.073522Z  INFO moshi_backend::stream_both: socket closed
2024-09-20T07:09:22.073545Z ERROR moshi_backend::stream_both: loop1 ended r=Ok(Ok(()))
2024-09-20T07:09:22.073548Z  INFO moshi_backend::stream_both: decoder closed
2024-09-20T07:09:22.353369Z  INFO moshi_backend::stream_both: sender err err="Trying to work with closed connection"
2024-09-20T07:10:13.031055Z  INFO moshi_backend::standalone: received connection addr=127.0.0.1:65464
2024-09-20T07:10:13.031187Z  INFO moshi_backend::stream_both: accepted websocket connection
2024-09-20T07:10:13.031264Z  INFO moshi_backend::stream_both: starting streaming
2024-09-20T07:10:13.032374Z  INFO moshi_backend::stream_both: processing loop
2024-09-20T07:11:10.764157Z  INFO moshi_backend::stream_both: socket closed
2024-09-20T07:11:10.764181Z ERROR moshi_backend::stream_both: loop1 ended r=Ok(Ok(()))
2024-09-20T07:11:10.764188Z  INFO moshi_backend::stream_both: decoder closed
2024-09-20T07:11:10.907340Z  INFO moshi_backend::stream_both: sender err err="Trying to work with closed connection"
2024-09-20T07:11:46.028502Z  INFO moshi_backend::standalone: received connection addr=127.0.0.1:49524
2024-09-20T07:11:46.028634Z  INFO moshi_backend::stream_both: accepted websocket connection
2024-09-20T07:11:46.028664Z  INFO moshi_backend::stream_both: starting streaming
2024-09-20T07:11:46.029095Z  INFO moshi_backend::stream_both: processing loop
2024-09-20T07:12:05.734397Z  INFO moshi_backend::stream_both: socket closed
2024-09-20T07:12:05.734423Z ERROR moshi_backend::stream_both: loop1 ended r=Ok(Ok(()))
2024-09-20T07:12:05.734431Z  INFO moshi_backend::stream_both: decoder closed
2024-09-20T07:12:05.752561Z  INFO moshi_backend::stream_both: sender err err="Trying to work with closed connection"


LaurentMazare commented 20 hours ago

Thanks for reporting this. On Metal, the Rust implementation of moshi currently falls back to f32 rather than bf16, because the matmul kernel needed for bf16 was only added to candle very recently. We will update this once that kernel is available in a candle release. In the meantime, could you give the q8 version a try? It is likely to work better.
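To get a feel for why the dtype matters, here is a rough back-of-the-envelope sketch (not part of moshi or candle). It assumes that the cost of each decoding step is dominated by reading the model weights, so halving the bytes per weight roughly halves the memory traffic. The `WeightDtype` enum and `weight_gib` function are hypothetical names, and the parameter count in `main` is a placeholder chosen for illustration.

```rust
/// Weight storage formats mentioned in this thread (hypothetical enum).
#[derive(Clone, Copy, Debug)]
enum WeightDtype {
    F32,
    Bf16,
    Q8,
}

impl WeightDtype {
    /// Approximate bytes per weight. Q8 is treated as exactly 1 byte; real
    /// 8-bit quantized formats also store a small per-block scale, which this
    /// sketch ignores (assumption).
    fn bytes_per_weight(self) -> f64 {
        match self {
            Self::F32 => 4.0,
            Self::Bf16 => 2.0,
            Self::Q8 => 1.0,
        }
    }
}

/// Approximate size of `n_params` weights in GiB for the given dtype.
fn weight_gib(n_params: u64, dtype: WeightDtype) -> f64 {
    n_params as f64 * dtype.bytes_per_weight() / (1u64 << 30) as f64
}

fn main() {
    // Placeholder parameter count, for illustration only.
    let n_params: u64 = 7_000_000_000;
    for dtype in [WeightDtype::F32, WeightDtype::Bf16, WeightDtype::Q8] {
        println!("{:?}: {:.1} GiB of weights", dtype, weight_gib(n_params, dtype));
    }
}
```

Under these assumptions the f32 fallback moves twice as many bytes per step as bf16 would, and an 8-bit model roughly a quarter, which is consistent with q8 being suggested as the faster option.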

do-me commented 19 hours ago

Thanks for the quick reply. With this config.json it still sounds like something is wrong, but this time at a different frequency.

{
  "instance_name": "foo",
  "hf_repo": "kyutai/moshiko-candle-q8",
  "lm_model_file": "$HOME/tmp/model.q8.gguf",
  "text_tokenizer_file": "$HOME/tmp/tokenizer_spm_32k_3.model",
  "log_dir": "$HOME/tmp/moshi-logs",
  "encodec_model_file": "$HOME/tmp/tokenizer-e351c8d8-checkpoint125.safetensors",
  "encodec_num_codebooks": 8,
  "static_dir": "../client/dist",
  "addr": "0.0.0.0",
  "port": 8998,
  "cert_dir": "."
}

moshi audio (3).webm moshi audio (2).webm

LaurentMazare commented 18 hours ago

At least it seems to run in real time now, but there is indeed still something weird going on. Maybe you could try connecting with the command-line client rather than the web UI, as described here. You could also try the Python/MLX implementation. It would be interesting to know whether either of these works better, to narrow down where the problem actually is.

do-me commented 18 hours ago
2024-09-20T12:24:07.542783Z  INFO moshi_cli::multistream::client_tui: connecting to wss://localhost:8998/api/chat
Setup audio output stream!
cpal device: MacBook Pro Speakers 44100 StreamConfig { channels: 2, sample_rate: SampleRate(44100), buffer_size: Default }
Setup audio input stream!
cpal device: MacBook Pro Microphone 44100 StreamConfig { channels: 1, sample_rate: SampleRate(44100), buffer_size: Default }
Error: IO error: Connection refused (os error 61)

Caused by:
    Connection refused (os error 61)
LaurentMazare commented 15 hours ago

Thanks, good to know that at least the Python/MLX version works. When you used the CLI client, was moshi-backend still running? (It is the same server that provides the web UI.) It looks like the client wasn't able to connect to it.
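On macOS, `os error 61` is `ECONNREFUSED`: the TCP connection to `localhost:8998` was rejected, which usually means nothing was listening on that address at that moment. A minimal sketch of a pre-flight check, using only the Rust standard library, that tests whether anything accepts TCP connections on the port from the config.json above before you start the client. `port_is_open` is a hypothetical helper, not part of moshi, and it only tests the TCP handshake, not TLS or the WebSocket endpoint.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if at least one address that `host` resolves to accepts a
/// TCP connection on `port` within `timeout`. Only the TCP handshake is
/// tested; no TLS or WebSocket traffic is sent.
fn port_is_open(host: &str, port: u16, timeout: Duration) -> bool {
    match (host, port).to_socket_addrs() {
        // Try every resolved address (e.g. both ::1 and 127.0.0.1 for "localhost").
        Ok(mut addrs) => addrs.any(|addr| TcpStream::connect_timeout(&addr, timeout).is_ok()),
        // Name resolution failed, so there is nothing to connect to.
        Err(_) => false,
    }
}

fn main() {
    // 8998 is the "port" value from the config.json earlier in this thread.
    let (host, port) = ("localhost", 8998);
    if port_is_open(host, port, Duration::from_secs(1)) {
        println!("something is listening on {host}:{port}");
    } else {
        println!("nothing is listening on {host}:{port}; is moshi-backend running?");
    }
}
```

Trying every resolved address matters here because the backend in this thread binds `0.0.0.0` (IPv4 only), while `localhost` may also resolve to the IPv6 loopback.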