Just so I understand the impact -- it sounds like our llama extension doesn't support multi-turn messages (i.e. chats)?
I'm not sure whether the prompt template below is equivalent to a list of `ChatCompletionRequestMessage` objects:

```
CONTEXT:
Q: {q_1}
A: {a_1}
...
Q: {q_n}
A: {a_n}
QUESTION:
{resolved_prompt}
```
If yes, then we support multi-turn. If not, there could be weird things going on. In either case, we're not saving each message as its own individual prompt. Instead we're storing the Q&A in the model parser object itself (under the `self.qa` field), not in the prompts. Rough sketch of the difference below.
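All names here are hypothetical, just to illustrate the two storage strategies:

```python
# Hypothetical illustration, not the actual extension code.

class HistoryOnParser:
    """Current behavior: Q&A history lives on the parser instance,
    so it is not serialized with the aiconfig's prompts."""
    def __init__(self):
        self.qa = []  # [(question, answer), ...]

    def add_turn(self, question: str, answer: str) -> None:
        self.qa.append((question, answer))


class HistoryAsPrompts:
    """Alternative: each turn becomes its own prompt entry, so the
    chat history round-trips through the saved config."""
    def __init__(self):
        self.prompts = []  # stand-in for the aiconfig prompts list

    def add_turn(self, question: str, answer: str) -> None:
        self.prompts.append({"input": question, "output": answer})
```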
I'm not quite sure; I haven't been able to run the cookbook myself to test because of https://github.com/lastmile-ai/aiconfig/issues/606
It's a bit hard for me to debug without being able to run it, but from reading through the code, this is my understanding:
Just a note on #2: in our other chat-based model parsers we use completion params as well, and we parse a response object like `ChatCompletionResponseMessage` (https://github.com/abetlen/llama-cpp-python/blob/f952d45c2cd0ccb63b117130c1b1bf4897987e4c/llama_cpp/llama_types.py#L57-L75), which Llama also accepts:
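Something along these lines (a sketch against llama-cpp-python's API; the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf")  # placeholder path

# create_chat_completion takes the structured messages list directly;
# each choice in the (non-streaming) response carries a
# ChatCompletionResponseMessage under the "message" key.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}]
)
for choice in response["choices"]:
    message = choice["message"]  # ChatCompletionResponseMessage
    print(message["role"], message["content"])
```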
Also, I notice that we seem to add individual prompts for each message in the TypeScript implementation (https://github.com/lastmile-ai/aiconfig/blob/v1.1.8/extensions/llama/typescript/llama.ts#L131-L154), so this might only apply to Python? Will sync with @jonathanlastmileai on this later
See comments in https://github.com/lastmile-ai/aiconfig/pull/605#discussion_r1436703044
Right now we're only storing the last message from the response instead of the full response (if multiple texts are returned).
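For example, something like this (hypothetical helper) would keep every returned text rather than just the last one:

```python
def extract_all_texts(response: dict) -> list[str]:
    """Hypothetical helper: collect the content from every choice in a
    chat completion response, instead of keeping only the last one."""
    return [choice["message"]["content"] for choice in response["choices"]]

# vs. the current behavior, roughly:
# last_text = response["choices"][-1]["message"]["content"]
```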