Closed: averypelle closed this pull request 1 year ago.
Great work! I was actually thinking about bringing that logic out so that I could use it for llmcord.

Do you think you'd be able to move the `inference_callback` to `llm-base`, name it something like `conversation_inference_callback`, and update both `llm-cli` and the `vicuna-chat` example to use it? (You might need to parameterise over `print_token`.)

That would allow for this logic to be used across both, as well as elsewhere (the aforementioned llmcord).

I'd suggest passing the stop sequence in from the CLI (i.e. maybe replace `message_prompt` with `message_prompt_prefix` and use that as the stop sequence).
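The suggested refactor can be sketched roughly as follows. This is an illustrative, simplified version, not the actual `llm-base` code: the function name comes from the suggestion above, the buffering logic is deliberately naive (it only holds back a pending stop sequence when the buffer is a prefix of it), and the real crate's callback signature may differ.

```rust
// Hypothetical sketch of a `conversation_inference_callback` for llm-base:
// stops generation once `stop_sequence` appears in the output, and is
// parameterised over `print_token` so llm-cli and the vicuna-chat example
// can each supply their own output handling.
fn conversation_inference_callback<F>(
    stop_sequence: String,
    mut print_token: F,
) -> impl FnMut(String) -> bool
where
    F: FnMut(&str),
{
    let mut buffer = String::new();
    move |token: String| {
        buffer.push_str(&token);
        if buffer.contains(stop_sequence.as_str()) {
            // Stop sequence reached: halt generation without printing it.
            return false;
        }
        // Simplified flush rule: hold tokens back only while the buffer
        // could still be the start of the stop sequence.
        if !stop_sequence.starts_with(buffer.as_str()) {
            print_token(&buffer);
            buffer.clear();
        }
        true // keep generating
    }
}

fn main() {
    let mut cb =
        conversation_inference_callback("### Human:".to_string(), |t| print!("{t}"));
    for tok in ["Sure", ", here you go.", "\n", "### Human:"] {
        if !cb(tok.to_string()) {
            break;
        }
    }
    println!();
}
```

Returning `false` from the callback is how the loop is told to stop; the key design point is that tokens forming a potential stop sequence are buffered rather than printed, so the stop marker never leaks into the visible output.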
I didn't touch REPL mode - is the desired behavior to continue generating tokens in that mode? Otherwise, happy to change.
Yup, that's for a back and forth where no state is preserved and the model can produce as much output as it wants. I'd suggest splitting the code paths so that they use entirely different inference callbacks - they share the readline, but their inference behaviour is pretty different.
Great work once again, let me know if you need a hand with any of this! 🙂
Thanks @philpax! Just pushed an update where I moved the function to `llm-base`.

For the `message_prompt`, is there any case where someone would want a template that includes a postfix? If so, maybe a new option is needed? Otherwise, I can rename it to a prefix - in this case, would it still make sense for the prefix to include the string `{{PROMPT}}`?
Great work! Looking forward to merging this soon 🚀
> For the `message_prompt`, is there any case where someone would want a template that includes a postfix? If so, maybe a new option is needed? Otherwise, I can rename it to a prefix - in this case, would it still make sense for the prefix to include the string `{{PROMPT}}`?
I don't think the logic would work if it was a postfix anyway - we should make it clear that you need to pass in a prefix. I'd say you can leave out the `{{PROMPT}}` in that case, because it should always be implied that the prompt will be suffixed.
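A minimal sketch of what "the prompt will be suffixed" implies. The function name and signature here are illustrative only, not the actual llm API: with a prefix-only template there is no `{{PROMPT}}` placeholder to substitute, because the user's message is always appended after the prefix.

```rust
// Illustrative only: build the full prompt by appending the user's
// message directly after `message_prompt_prefix`. No template
// substitution is needed, since suffixing is implied.
fn build_prompt(message_prompt_prefix: &str, user_message: &str) -> String {
    format!("{message_prompt_prefix}{user_message}")
}

fn main() {
    println!("{}", build_prompt("### Human: ", "How are you?"));
}
```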
Okay @philpax, I have updated the CLI to take a `message_prompt_prefix` instead. I also tried running locally with several models, and it is working as expected for a multi-prompt chat.
Heya! ...apologies for hijacking the PR. I went to test it and all of your changes worked as expected, but I realised that there were quite a few latent bugs with the stuff not covered by your PR and that the whole chat/REPL logic just wasn't working how I wanted it to work. I ended up revising way more than I intended 😅
The upshot is that it should now work consistently, and there shouldn't be any surprise discrepancies between REPL and chat mode. Sorry once again for the complete hijack 😅
Feel free to ask about any of the changes I made! Most of them were unrelated to the code you introduced (I mostly addressed issues that were already present before your changes), but I'm happy to explain them nonetheless. You might be interested in the "simplify `message_prompt_prefix`" commit, in which I replaced the `if let`s with a `match`.
Closes https://github.com/rustformers/llm/issues/363
Stops token generation after reaching a specified `stop_sequence` in chat mode. I am still new to Rust, so please let me know how I can improve my code!