do-me opened this issue 23 hours ago (status: Open)
Thanks for reporting this. The Rust implementation of moshi currently falls back to f32 rather than bf16 on Metal, as the necessary matmul kernel was only added to candle very recently. We will update this once it's available in a released candle. In the meantime, could you try the q8 version? It is likely to work better.
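The fallback described above can be sketched roughly as follows. This is a minimal illustration of the logic, not candle's actual Rust API; the device and capability names are assumptions for the example:

```python
# Illustrative sketch of the dtype-fallback logic described above.
# NOT candle's actual API; names are hypothetical.

def pick_dtype(device: str, metal_has_bf16_matmul: bool) -> str:
    """Prefer bf16, but fall back to f32 on Metal until the
    bf16 matmul kernel is available in a released candle."""
    if device == "metal" and not metal_has_bf16_matmul:
        return "f32"
    return "bf16"

print(pick_dtype("metal", False))  # f32: current released candle
print(pick_dtype("metal", True))   # bf16: once the kernel lands
print(pick_dtype("cuda", False))   # bf16: unaffected devices
```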
Thanks for the quick reply. I used this config.json; it still sounds like it has issues, but this time it's a different frequency.
```json
{
  "instance_name": "foo",
  "hf_repo": "kyutai/moshiko-candle-q8",
  "lm_model_file": "$HOME/tmp/model.q8.gguf",
  "text_tokenizer_file": "$HOME/tmp/tokenizer_spm_32k_3.model",
  "log_dir": "$HOME/tmp/moshi-logs",
  "encodec_model_file": "$HOME/tmp/tokenizer-e351c8d8-checkpoint125.safetensors",
  "encodec_num_codebooks": 8,
  "static_dir": "../client/dist",
  "addr": "0.0.0.0",
  "port": 8998,
  "cert_dir": "."
}
```
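One thing worth sanity-checking with a config like the one above is that the `$HOME` placeholders in the path fields actually resolve to files on disk. A small hedged sketch (the field names come from the config above; how the server itself expands variables is an assumption):

```python
import json
import os

# Check that the files referenced by the config's path fields exist
# once $HOME-style variables are expanded.
config_text = '''{
  "lm_model_file": "$HOME/tmp/model.q8.gguf",
  "text_tokenizer_file": "$HOME/tmp/tokenizer_spm_32k_3.model",
  "encodec_model_file": "$HOME/tmp/tokenizer-e351c8d8-checkpoint125.safetensors"
}'''

config = json.loads(config_text)
for key, path in config.items():
    expanded = os.path.expandvars(path)  # "$HOME/..." -> "/Users/you/..."
    status = "ok" if os.path.exists(expanded) else "MISSING"
    print(f"{key}: {expanded} [{status}]")
```

A `MISSING` line points at a download or path problem rather than an audio one.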
At least it seems to be real-time now, but indeed there is still something weird going on. Maybe you could try connecting with the command-line client rather than the web UI, as described here. You could also try out the Python/MLX implementation. It would be interesting to know if either of these works better, to narrow down where the problem actually is.
```
python -m moshi_mlx.local_web --hf-repo kyutai/moshiko-mlx-bf16
```

works fine: moshi audio (4).webm

```
cargo run --bin moshi-cli -r -- tui --host localhost
```
throws a connection error:

```
2024-09-20T12:24:07.542783Z  INFO moshi_cli::multistream::client_tui: connecting to wss://localhost:8998/api/chat
Setup audio output stream!
cpal device: MacBook Pro Speakers 44100 StreamConfig { channels: 2, sample_rate: SampleRate(44100), buffer_size: Default }
Setup audio input stream!
cpal device: MacBook Pro Microphone 44100 StreamConfig { channels: 1, sample_rate: SampleRate(44100), buffer_size: Default }
Error: IO error: Connection refused (os error 61)

Caused by:
    Connection refused (os error 61)
```
Thanks, good to know that at least the Python/MLX version works. When you were using the CLI client, was the moshi-backend still running? (It's the same server as the one that provides the web UI.) It looks like the client wasn't able to connect to it.
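One quick way to check whether the backend is actually listening before starting the CLI client is to probe the port directly. A small sketch (host and port taken from the config earlier in the thread):

```python
import socket

def backend_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds,
    i.e. the moshi-backend (or something else) is listening there."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError (os error 61)
        return False

# "Connection refused" from the CLI client means this returns False:
print(backend_listening("localhost", 8998))
```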
Backend impacted: The Rust implementation
Operating system: Mac OS X
Hardware: Metal with MLX
Description:
moshi audio (1).webm
Listen to the audio sample; it sounds like there is something oscillating. I am using the default settings:
Extra information:
Running on an M3 Max with 128 GB. Logs seem fine, I guess:
Environment:
Fill in the following information on your system.
If the backend impacted is PyTorch: python -c 'import torch; print(torch.version.cuda)'
If the backend is MLX: