Closed: averypelle closed this pull request 1 year ago.
Great work! I was actually thinking about bringing that logic out so that I could use it for llmcord.

Do you think you'd be able to move the `inference_callback` to `llm-base`, name it something like `conversation_inference_callback`, and update both `llm-cli` and the `vicuna-chat` example to use it? (You might need to parameterise over `print_token`.)

That would allow for this logic to be used across both, as well as elsewhere (the aforementioned llmcord).

I'd suggest passing the stop sequence in from the CLI (i.e. maybe replace `message_prompt` with `message_prompt_prefix` and use that as the stop sequence).
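The suggested refactor can be sketched roughly as follows. This is an illustrative, simplified version, not the actual `llm-base` code: the function name comes from the suggestion above, the buffering logic is deliberately naive (it only holds back a pending stop sequence when the buffer is a prefix of it), and the real crate's callback signature may differ.

```rust
// Hypothetical sketch of a `conversation_inference_callback` for llm-base:
// stops generation once `stop_sequence` appears in the output, and is
// parameterised over `print_token` so llm-cli and the vicuna-chat example
// can each supply their own output handling.
fn conversation_inference_callback<F>(
    stop_sequence: String,
    mut print_token: F,
) -> impl FnMut(String) -> bool
where
    F: FnMut(&str),
{
    let mut buffer = String::new();
    move |token: String| {
        buffer.push_str(&token);
        if buffer.contains(stop_sequence.as_str()) {
            // Stop sequence reached: halt generation without printing it.
            return false;
        }
        // Simplified flush rule: hold tokens back only while the buffer
        // could still be the start of the stop sequence.
        if !stop_sequence.starts_with(buffer.as_str()) {
            print_token(&buffer);
            buffer.clear();
        }
        true // keep generating
    }
}

fn main() {
    let mut cb =
        conversation_inference_callback("### Human:".to_string(), |t| print!("{t}"));
    for tok in ["Sure", ", here you go.", "\n", "### Human:"] {
        if !cb(tok.to_string()) {
            break;
        }
    }
    println!();
}
```

Returning `false` from the callback is how the loop is told to stop; the key design point is that tokens forming a potential stop sequence are buffered rather than printed, so the stop marker never leaks into the visible output.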
I didn't touch REPL mode - is the desired behavior to continue generating tokens in that mode? Otherwise, happy to change.
Yup, that's for a back and forth where no state is preserved and the model can produce as much output as it wants. I'd suggest splitting the code paths so that they use entirely different inference callbacks - they share the readline, but their inference behaviour is pretty different.
Great work once again, let me know if you need a hand with any of this! 🙂
Thanks @philpax! Just pushed an update where I moved the function to `llm-base`.

For the `message_prompt`, is there any case where someone would want a template that includes a postfix? If so, maybe a new option is needed? Otherwise, I can rename it to a prefix - in this case, would it still make sense for the prefix to include the string `{{PROMPT}}`?
Great work! Looking forward to merging this soon 🚀
> For the `message_prompt`, is there any case where someone would want a template that includes a postfix? If so, maybe a new option is needed? Otherwise, I can rename it to a prefix - in this case, would it still make sense for the prefix to include the string `{{PROMPT}}`?
I don't think the logic would work if it was a postfix anyway - we should make it clear that you need to pass in a prefix. I'd say you can leave out the `{{PROMPT}}` in that case, because it should always be implied that the prompt will be suffixed.
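A minimal sketch of what "the prompt will be suffixed" implies. The function name and signature here are illustrative only, not the actual llm API: with a prefix-only template there is no `{{PROMPT}}` placeholder to substitute, because the user's message is always appended after the prefix.

```rust
// Illustrative only: build the full prompt by appending the user's
// message directly after `message_prompt_prefix`. No template
// substitution is needed, since suffixing is implied.
fn build_prompt(message_prompt_prefix: &str, user_message: &str) -> String {
    format!("{message_prompt_prefix}{user_message}")
}

fn main() {
    println!("{}", build_prompt("### Human: ", "How are you?"));
}
```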
Okay @philpax, I have updated the CLI to take a `message_prompt_prefix` instead. I also tried running locally with several models, and it is working as expected for a multi-prompt chat.
Heya! ...apologies for hijacking the PR. I went to test it and all of your changes worked as expected, but I realised that there were quite a few latent bugs with the stuff not covered by your PR and that the whole chat/REPL logic just wasn't working how I wanted it to work. I ended up revising way more than I intended 😅
The upshot is that it should now work consistently, and there shouldn't be any surprise discrepancies between REPL and chat mode. Sorry once again for the complete hijack 😅
Feel free to ask about any of the changes I made! Most of them were unrelated to the code you introduced (I mostly addressed issues that were already present before your changes), but I'm happy to explain them nonetheless. You might be interested in the "simplify `message_prompt_prefix`" commit, in which I replaced the `if let`s with a `match`.
Closes https://github.com/rustformers/llm/issues/363
Stops token generation after reaching a specified `stop_sequence` in chat mode. I am still new to Rust, so please let me know how I can improve my code!