mlx-chat / mlx-chat-app

Chat with MLX is a high-performance macOS application that connects your local documents to a personalized large language model (LLM).
MIT License

Save and Utilize Chat History #36

Closed namp closed 6 months ago

namp commented 6 months ago

The current implementation lacks chat history: session messages and responses are not stored in any form of cache so that they can be appended each time the user sends a message. Doing so would make the conversation feel natural and keep it focused on the current context.

For example, in Streamlit this can be done easily as follows:

import streamlit as st
from openai import OpenAI

client = OpenAI()

# Initialize the chat history once per session
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the stored history on every rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("What is up?"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        # Send the full history so the model keeps conversational context
        stream = client.chat.completions.create(
            model=st.session_state["model"],
            messages=[
                {"role": m["role"], "content": m["content"]}
                for m in st.session_state.messages
            ],
            stream=True,
        )
        response = st.write_stream(stream)
    st.session_state.messages.append({"role": "assistant", "content": response})
stockeh commented 6 months ago

@namp, chat history is indeed cached in the app; cf. the current implementation in Chat.tsx. Specifically,

const newHistory = [
  ...chatHistory,
  { role: 'user' as const, content: message },
];
setChatHistory(newHistory);
// ...
body: JSON.stringify({
  messages: selectedDirectory
    ? [{ role: 'user', content: message }]
    : newHistory.filter((chat) => chat.role !== 'system'),
// ...
setChatHistory([
  ...newHistory,
  { role: 'assistant', content: assistantResponse },
]);

In converse mode, the request body we send to the server includes the full history, e.g.,

messages = [ 
  {'role': 'user', 'content': 'hi'}, 
  {'role': 'assistant', 'content': "Hello! I'm here to assist you. How can I help?"}, 
  {'role': 'user', 'content': 'what color is the sky?'}, 
  {'role': 'assistant', 'content': 'The sky is usually blue during the day and black at night.'}, 
  ...
]

In assist mode we send only the most recent user message, to (a) search for relevant documents pertaining only to that message, and (b) generate a response that is independent of the chat history, depending only on the document context and that single prompt.
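For clarity, here is a hypothetical Python sketch mirroring the payload logic from the Chat.tsx excerpt above; the function name build_request_messages is illustrative and not part of the app:

```python
def build_request_messages(chat_history, message, selected_directory):
    """Illustrative mirror of the Chat.tsx payload logic:
    assist mode (a directory is indexed) sends only the latest user
    message; converse mode sends the full non-system history."""
    new_history = chat_history + [{"role": "user", "content": message}]
    if selected_directory:
        # assist mode: document search and generation use only this prompt
        payload = [{"role": "user", "content": message}]
    else:
        # converse mode: full conversational context, minus system messages
        payload = [m for m in new_history if m["role"] != "system"]
    return new_history, payload
```

Note that in both modes the local history still grows, so switching back to converse mode picks up the whole conversation.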

Maintaining (a) is simple, as it is already implemented using the maximal marginal relevance of the most recent user message, i.e., messages[-1]['content']. However, changing (b) to use chat history may require some clever prompt engineering. It would be interesting to experiment with and advance, and we invite collaboration, but this issue is not as relevant right now, so I'll be closing it.
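For readers unfamiliar with maximal marginal relevance: here is a minimal, self-contained Python sketch of MMR selection over document embeddings. Names like mmr_select and lambda_mult are illustrative assumptions; the app's actual retrieval code may differ:

```python
def cosine(a, b):
    """Cosine similarity between two vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k document indices, balancing relevance to the
    query against redundancy with documents already selected."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best_idx, best_score = None, float("-inf")
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected

# Toy example: docs 0 and 1 are exact duplicates, so with a low
# lambda_mult MMR skips the duplicate in favor of a diverse document.
query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5]]
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # [0, 2]
```

Here the query would be the embedding of messages[-1]['content'], and lambda_mult trades off relevance (high values) against diversity (low values).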

namp commented 6 months ago

Interesting to know about this. But how do you select modes? Is there a keybinding?

Thanks

stockeh commented 6 months ago

@namp, definitely! By default, the app is in converse mode (conversational context) and only switches to assist mode (document-specific context) after a directory has been indexed. One can return to converse mode by removing the indexed directory (the x to the left of the directory name in the bottom right). The figure below marks with a red line the point in the conversation where the mode switched.

[Screenshot 2024-03-12: conversation illustrating the mode switch]