Closed panamka closed 9 months ago
LocalMHA currently operates on chunks of about 0.3 seconds, so this latency might not be acceptable for some cases It's kind of a trade-off between lower bitrate (less tokens) and lower latency.
For lower sample rates, I'd suggest lowering the decoder_dim (and encoder), as the current parameter count is pretty high for vocoders. For speech also lower bitrate (less codebooks) should be enough
Yes
btw just so you know, I plan to release a speech codec next week
Thank you very much for the valuable comments. I will keep an eye on new releases.
Hello! Thank you very much for the great work!
I want to try to train a model based on yours, but for speech coding. And I have a few questions.
Thank you in advance