jupyterlab / jupyter-ai

A generative AI extension for JupyterLab
https://jupyter-ai.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Allow for infinite/compressed chat history #970

Closed krassowski closed 4 days ago

krassowski commented 2 weeks ago

Problem

Previously the chat kept only two messages, and that number was hard-coded; with https://github.com/jupyterlab/jupyter-ai/pull/943 merged we now have an AiExtension.default_max_chat_history setting, which is great as it allows increasing the number from two to, say, 10. However, context from a longer history is still lost altogether. It is impossible to set the memory to infinite, even if the model caches tokens.

Proposed Solution

Additional context

Langchain has a dedicated example on how to implement summarization for chat history here:

https://python.langchain.com/v0.2/docs/how_to/chatbots_memory/#summary-memory
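The LangChain guide's approach can be sketched framework-agnostically: once the history grows past a threshold, the older messages are condensed into a single summary message and only the most recent exchanges are kept verbatim. Here `summarize` is a hypothetical stand-in for the LLM call that would produce the summary:

```python
# Sketch of summary-style memory compression (framework-agnostic).
# `summarize` stands in for an LLM call that condenses old messages.

def summarize(messages):
    # Hypothetical placeholder for an actual LLM summarization call.
    return "Summary of %d earlier messages" % len(messages)

def compress_history(messages, keep_last=4):
    """Replace all but the last `keep_last` messages with one summary."""
    if len(messages) <= keep_last:
        return list(messages)
    summary = summarize(messages[:-keep_last])
    return [("system", summary)] + list(messages[-keep_last:])

history = [("human", f"q{i}") for i in range(10)]
compressed = compress_history(history, keep_last=4)
```

The real LangChain example wires this into a chain so the summary is regenerated as the conversation grows; the shape of the transformation is the same.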

krassowski commented 2 weeks ago

To spell it out, default_max_chat_history cannot be set to infinite as of today because it is defined as an Integer (and math.inf is a float), and even if it were a Float it would later fail in BoundedChatHistory, which expects an int. I think the solution here could be to treat None as a special value. Thoughts?
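A minimal sketch of the None-as-special-value idea, assuming a BoundedChatHistory-style buffer where each exchange is a human/AI message pair (the class here is illustrative, not the actual jupyter-ai implementation):

```python
# Sketch: a bounded history buffer where k=None means "unlimited".
# Illustrative only; mirrors the idea, not jupyter-ai's actual class.
class BoundedChatHistorySketch:
    def __init__(self, k=None):
        self.k = k  # None -> keep everything
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)
        if self.k is not None:
            # One exchange = 2 messages (human + AI); keep last k exchanges.
            self.messages = self.messages[-2 * self.k:]

bounded = BoundedChatHistorySketch(k=2)
for i in range(10):
    bounded.add_message(f"msg{i}")  # keeps only the last 4 messages

unlimited = BoundedChatHistorySketch(k=None)
for i in range(10):
    unlimited.add_message(f"msg{i}")  # keeps all 10
```

On the config side, traitlets supports `Integer(allow_none=True)`, so the trait itself could accept None without changing its type.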

dlqqq commented 2 weeks ago

I think the solution here could be to treat None as a special value. Thoughts?

That makes sense to me. 👍

srdas commented 2 weeks ago
  1. Setting to None is a good idea.
  2. One thing to keep in mind: with very long chat memory you may exceed the LLM's input context window, in which case the memory will be truncated. Truncation keeps the earliest chat exchanges, not the most recent ones, so this will need to be handled if we want the most recent ones (absent the summarization idea).
  3. For the idea of summarizing chunks of the history rather than carrying all of it (to reduce the number of tokens and save costs), we need to extend the default setting instructions at startup (jupyter lab --AiExtension.default_max_chat_history=2) to also include a parameter for the size of the memory trail to summarize (or is it everything beyond default_max_chat_history?).
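Keeping the most recent exchanges under a token budget (point 2 above) can be sketched by walking the history newest-first and dropping the oldest messages once the budget is exhausted. `count_tokens` is a hypothetical stand-in for a real tokenizer:

```python
# Sketch: trim history to a token budget, keeping the MOST RECENT messages
# (oldest dropped first). `count_tokens` stands in for a real tokenizer.

def count_tokens(message):
    # Crude placeholder: one token per whitespace-separated word.
    return len(message.split())

def trim_to_budget(messages, max_tokens):
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest -> oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = ["one two", "three four five", "six", "seven eight"]
trimmed = trim_to_budget(history, max_tokens=4)
```

With a budget of 4, only "six" and "seven eight" fit; "three four five" and everything earlier is dropped, which is the opposite of truncation that keeps the earliest exchanges.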
krassowski commented 2 weeks ago

For the idea to summarize chunks of the history rather than carrying all of it [...] we need to extend the default setting instructions at startup [...] to also include a parameter for size of memory trail to summarize

Yes, that was my thinking too, because there are a couple of ways to implement compression, e.g.:

And each of these would have a different set of parameters. I do not want to put too much compression logic into jupyter-ai to avoid making it hard to maintain; maybe let's have some simple default and allow swapping it out for something more advanced in extensions?
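The "simple default, swappable in extensions" idea above could take the shape of a small strategy interface: jupyter-ai ships a trivial default, and extensions register their own implementation. All names here are illustrative, not jupyter-ai API:

```python
# Sketch: a pluggable history-compression interface. Names are
# illustrative; this is not the jupyter-ai extension API.
from abc import ABC, abstractmethod

class HistoryCompressor(ABC):
    @abstractmethod
    def compress(self, messages: list) -> list:
        """Return a (possibly shorter) history to send to the model."""

class KeepLastN(HistoryCompressor):
    """Simple default: keep only the last n messages."""
    def __init__(self, n: int):
        self.n = n
    def compress(self, messages):
        return messages[-self.n:]

class NoCompression(HistoryCompressor):
    """What a None setting could map to: keep everything."""
    def compress(self, messages):
        return list(messages)

compressor: HistoryCompressor = KeepLastN(3)
result = compressor.compress(["a", "b", "c", "d", "e"])
```

An extension wanting summarization would then only need to subclass HistoryCompressor, keeping the compression-specific parameters out of jupyter-ai itself.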