Closed danielclough closed 8 months ago
On this example, I think running for that long (and continuing) is actually expected, as we reach neither an end-of-stream token nor the two consecutive 187 tokens mentioned in the document you're referring to. So far I haven't found a prompt that triggers such an end of stream, so it's a bit unclear to me whether this should be implemented.
Perhaps the example README should indicate that it will run for many more lines than what is currently shown as a response, so people don't think that their output is buggy?
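For reference, the two stopping conditions discussed above (an end-of-stream token, or token 187 appearing twice in a row) could be checked roughly like this. This is only a minimal sketch, not the code used in the example: `should_stop`, `generate`, and the `next_token` closure are hypothetical names, and it assumes token ids are `u32` and that 187 is the newline token as described in the linked docs.

```rust
/// Assumed newline token id; two in a row is the stopping pattern
/// mentioned in the linked documentation.
const NEWLINE_TOKEN: u32 = 187;

/// Returns true when generation should stop: either the last token is the
/// (optional) end-of-stream token, or the last two tokens are both 187.
fn should_stop(generated: &[u32], eos_token: Option<u32>) -> bool {
    let Some(&last) = generated.last() else {
        return false; // nothing generated yet
    };
    if eos_token == Some(last) {
        return true;
    }
    generated.ends_with(&[NEWLINE_TOKEN, NEWLINE_TOKEN])
}

/// Hypothetical generation loop: `next_token` stands in for the model plus
/// sampler and receives the tokens generated so far.
fn generate<F: FnMut(&[u32]) -> u32>(
    mut next_token: F,
    eos_token: Option<u32>,
    max_len: usize,
) -> Vec<u32> {
    let mut out = Vec::new();
    while out.len() < max_len {
        let token = next_token(&out);
        out.push(token);
        if should_stop(&out, eos_token) {
            break;
        }
    }
    out
}
```

With a check like this, a run only ends early if the model actually emits one of those patterns; otherwise it keeps going until `max_len`, which matches the behaviour described above.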
Closing as the semantics should now be similar to the Python ones.
On Ubuntu 22.04 using CUDA. Repo is in sync with main.
I spent a few seconds to see if I could fix it and found these docs that mention stopping criteria: https://huggingface.co/docs/transformers/model_doc/rwkv
I can probably spend more time on this later, but @LaurentMazare can probably fix it real quick. :superhero:
I'm excited about RWKV! Thanks again!