Closed Urammar closed 1 year ago
Oops, this should have been filed against the ooba plugin, but regardless: if you are from the future, just disable token streaming in ooba. Otherwise, yes, the entire incomplete string gets resent and speech is regenerated on every single token.
This is not a problem with bark; it's just doing its job, generating speech from whatever text is thrown at it.
With streaming on, it regenerates on each new token, slowing everything down, and it can't produce the final speech until the last token arrives anyway.
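A minimal sketch of why streaming causes this (illustrative only — `fake_tts` and `on_stream_update` are hypothetical stand-ins, not ooba's or bark's actual API). With streaming enabled, the extension receives the *entire* partial string on each new token and re-synthesizes all of it:

```python
def fake_tts(text):
    """Stand-in for bark: pretend to synthesize speech for `text`."""
    return f"<audio for {len(text)} chars>"

tts_calls = 0

def on_stream_update(partial_text):
    # With streaming on, the callback gets the whole incomplete string
    # every time a token arrives, and regenerates speech for all of it.
    global tts_calls
    tts_calls += 1
    return fake_tts(partial_text)

tokens = ["Hello", ",", " world", "!"]
text = ""
for t in tokens:
    text += t
    on_stream_update(text)  # one full regeneration per token

print(tts_calls)  # one TTS call per token instead of one total
```

With streaming disabled, the extension only ever sees the finished string, so bark runs exactly once per response.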