lastmile-ai / aiconfig

AIConfig is a config-based framework to build generative AI applications.
https://aiconfig.lastmileai.dev
MIT License

Refactor LLama extension + cookbook to map chat messages --> Prompt 1-1 #630

Open rossdanlm opened 10 months ago

rossdanlm commented 10 months ago

See comments in https://github.com/lastmile-ai/aiconfig/pull/605#discussion_r1436703044

Right now we'll only be storing the last message from the response instead of the full response (if there are multiple texts returned).
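To illustrate, here is a hedged sketch of the problem; the response shape below is assumed to be an OpenAI-style `choices` list, not the exact schema the extension receives:

```python
# Assumed OpenAI-style response shape; field names are illustrative only.
response = {"choices": [{"text": "first answer"}, {"text": "second answer"}]}

# What happens today (per the linked discussion): only one text survives.
last_text = response["choices"][-1]["text"]

# What we'd want instead: keep every returned text as its own output.
all_texts = [choice["text"] for choice in response["choices"]]
```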

saqadri commented 10 months ago

Just so I understand the impact -- it sounds like our llama extension doesn't support multi-turn messages (i.e. chats)?

rossdanlm commented 10 months ago

TLDR

I am not sure if the string below is an equivalent mapping to a ChatCompletionRequestMessage object:

"
CONTEXT:
Q: {q_1}
A: {a_1}
...
Q: {q_n}
A: {a_n}

QUESTION:
{resolved_prompt}
"

If yes, then we support multi-turn. If no, then there could be weird things going on. In either case, we're not saving each message as its own individual prompt. Instead, we're storing the Q&A pairs in the model parser object itself (under the self.qa field), not in the prompts.
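For comparison, a minimal sketch of re-expressing that flat CONTEXT/QUESTION string as role-based chat messages (this assumes an OpenAI-style schema; the Llama API may expect something different):

```python
# Hypothetical mapping of the flat string above into role-based messages.
qa_history = [("q_1", "a_1"), ("q_n", "a_n")]  # placeholder turns
resolved_prompt = "{resolved_prompt}"  # placeholder for the current question

messages = []
for question, answer in qa_history:
    messages.append({"role": "user", "content": question})
    messages.append({"role": "assistant", "content": answer})
messages.append({"role": "user", "content": resolved_prompt})
```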

Details

I'm not quite sure; I haven't been able to run the cookbook myself to test because of https://github.com/lastmile-ai/aiconfig/issues/606

It's a bit hard for me to debug without being able to run it, but from reading through the code, this is my understanding:

  1. We build chat history in a string format (see the sketch after this list): https://github.com/lastmile-ai/aiconfig/blob/main/extensions/llama/python/llama.py#L41-L46
  2. This gets passed into the Llama call: https://github.com/lastmile-ai/aiconfig/blob/main/extensions/llama/python/llama.py#L88. Perhaps this is shorthand for representing a messages object, but I couldn't find an example of this in the API docs: https://docs.llama-api.com/api-reference/endpoint/create. Jonathan probably knows more details since he built it.
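Here is a rough sketch of what I believe steps 1-2 amount to; the names and structure are approximations of the linked llama.py code, not copied verbatim from the extension:

```python
# Approximation of the linked llama.py flow; illustrative only.
class LlamaParserSketch:
    def __init__(self):
        # Q&A history lives on the parser itself (cf. self.qa in the issue),
        # not as individual prompts in the aiconfig.
        self.qa = []

    def build_prompt(self, resolved_prompt: str) -> str:
        # Flatten all prior turns into a single string before the model call.
        context = "\n".join(f"Q: {q}\nA: {a}" for q, a in self.qa)
        return f"CONTEXT:\n{context}\n\nQUESTION:\n{resolved_prompt}"
```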

Just a note on #2: in our other chat-based model parsers, we use completion params instead, and we parse a response object like ChatCompletionResponseMessage (https://github.com/abetlen/llama-cpp-python/blob/f952d45c2cd0ccb63b117130c1b1bf4897987e4c/llama_cpp/llama_types.py#L57-L75), which Llama also accepts.
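For reference, a minimal sketch of the messages-based call in llama-cpp-python (the model path is a placeholder, not part of the repo):

```python
from llama_cpp import Llama

# model_path is a placeholder; point it at a real GGUF chat model locally.
llm = Llama(model_path="./models/llama-2-7b-chat.gguf")

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ]
)
# The return value follows the chat-completion response types linked above.
print(response["choices"][0]["message"]["content"])
```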

rossdanlm commented 10 months ago

Also, I notice that we do seem to add individual prompts for each message in the TypeScript implementation (https://github.com/lastmile-ai/aiconfig/blob/v1.1.8/extensions/llama/typescript/llama.ts#L131-L154), so this seems like it might only apply to Python? Will sync with @jonathanlastmileai on this later.
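A rough Python sketch of that 1-1 message-to-Prompt mapping, mirroring what the TypeScript version appears to do; the exact aiconfig schema fields here are assumptions, not verified:

```python
# Rough Python analogue of the linked TypeScript logic: one aiconfig Prompt
# per chat message. Exact schema fields are assumptions, not verified.
from aiconfig.schema import Prompt

def messages_to_prompts(messages: list[dict]) -> list[Prompt]:
    prompts = []
    for i, message in enumerate(messages):
        # Only user turns become prompts here; assistant turns would be
        # attached as outputs of the preceding prompt in a fuller version.
        if message["role"] == "user":
            prompts.append(Prompt(name=f"prompt_{i}", input=message["content"]))
    return prompts
```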