nomic-ai / gpt4all

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
https://nomic.ai/gpt4all
MIT License

[Feature] Ability to populate previous chat history when using chat_session() #2360

Open simonw opened 6 months ago

simonw commented 6 months ago

Feature Request

I'd like to be able to start a new chat_session() but then populate its history from my own recorded logs, rather than having to use that context manager for an entire chat. Basically I want to do this:

from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # any local model
with model.chat_session():
    # seed the session's private history directly with past turns
    model._history.append({"role": "user", "content": "2 names for a pet pelican"})
    model._history.append({"role": "assistant", "content": "Charlie, and Polly"})
    print(model.generate("3 more"))

This looks like it should work - I added some debug code and the output_collector used inside the model looks like this just before the prompt is executed:

[
  {'role': 'system', 'content': ''},
  {'role': 'user', 'content': '2 names for a pet pelican'},
  {'role': 'assistant', 'content': 'Charlie, and Polly'},
  {'role': 'user', 'content': '3 more'},
  {'role': 'assistant', 'content': ''}
]

But the model says things like:

It seems like your request is incomplete. Could you please provide additional information or clarify what "3 more" refers to? If it's related to a specific task, quantity, sequence, or something else numerical in nature, I would be happy to assist further!

So clearly the trick of adding things to _history directly like that doesn't work!

I'd love it if there was an official, documented way to do this. I need it for my https://github.com/simonw/llm-gpt4all/ project.

simonw commented 6 months ago

Relevant code: https://github.com/nomic-ai/gpt4all/blob/2025d2d15b8571643241d145085d7cc6cd1d331b/gpt4all-bindings/python/gpt4all/gpt4all.py#L534-L603

My best guess is that self.model.prompt_model_streaming has its own internal state, which is why my attempts to manipulate the state in the outer layer are having no effect.

simonw commented 6 months ago

Maybe the previous tokens are accumulated in this low-level tokens C array, and that's the thing that isn't updated if you add stuff to _history? https://github.com/nomic-ai/gpt4all/blob/2025d2d15b8571643241d145085d7cc6cd1d331b/gpt4all-bindings/python/gpt4all/_pyllmodel.py#L53-L70
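
For reference, the struct at those lines looks approximately like this (abridged and reconstructed from memory of that commit, so details may differ):

import ctypes

class LLModelPromptContext(ctypes.Structure):
    _fields_ = [
        ("logits", ctypes.POINTER(ctypes.c_float)),
        ("logits_size", ctypes.c_size_t),
        ("tokens", ctypes.POINTER(ctypes.c_int32)),  # past tokens live here...
        ("tokens_size", ctypes.c_size_t),
        ("n_past", ctypes.c_int32),                  # ...and this tracks how much is cached
        ("n_ctx", ctypes.c_int32),
        ("n_predict", ctypes.c_int32),
        # ... sampling fields (top_k, top_p, temp, ...) omitted ...
    ]

If that's right, appending to _history never touches tokens or n_past, which would explain the behaviour above.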

woheller69 commented 6 months ago

I asked for something similar today in #2358. I tried to clear() current_chat_session to start a new chat without leaving the context manager, but that is also ignored.

My simple GUI: https://github.com/woheller69/gpt4all-TK-CHAT

simonw commented 6 months ago

Aha: I spotted this, which happens only if self._history is None:

https://github.com/nomic-ai/gpt4all/blob/2025d2d15b8571643241d145085d7cc6cd1d331b/gpt4all-bindings/python/gpt4all/gpt4all.py#L562

That must be the mechanism that resets the internal token state.
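
If that reading is right, the gist is something like this (my paraphrase, not the actual source):

# Paraphrase, not the real code: outside a chat session (_history is None)
# each call starts from a clean slate; inside one, the C-side token cache
# just keeps accumulating, so Python-level edits to _history are never seen.
if self._history is None:
    generate_kwargs["reset_context"] = True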

simonw commented 6 months ago

More details on why I need this here:

My LLM tool works by logging messages and responses to a SQLite database, so you can do things like this:

llm "three names for a pet pelican"
# Outputs three names
llm -c "2 more" # -c means continue previous thread
# Outputs two more names

To get GPT4All working correctly as a plugin for my tool, I need to be able to instantiate a new model and then start a chat session with the previous context populated from my persisted SQLite logs, but I can't figure out a way to do that.
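
Concretely, what I need is something like this hypothetical API (the history= parameter and the SQLite helper are made up here to illustrate the shape of the feature):

from gpt4all import GPT4All

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # placeholder model name
previous = load_messages_from_sqlite(conversation_id)  # hypothetical helper
# e.g. [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]

with model.chat_session(history=previous):  # history= does not exist today
    print(model.generate("2 more"))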

woheller69 commented 6 months ago

You might use llama-cpp-agent (https://github.com/Maximilian-Winter/llama-cpp-agent) and llama-cpp-python instead of gpt4all. I am also experimenting with it: https://github.com/woheller69/LLAMA_TK_CHAT/blob/main/LLAMA_TK_GUI.py

There you can do things like: self.llama_cpp_agent.chat_history.get_message_store().add_assistant_message(...)
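
Based on that snippet, restoring a conversation there might look roughly like this (everything except add_assistant_message is an assumption inferred by symmetry, not checked against the library):

# Sketch only; agent is a configured LlamaCppAgent instance (setup omitted).
store = agent.chat_history.get_message_store()
store.add_user_message("2 names for a pet pelican")  # assumed counterpart
store.add_assistant_message("Charlie, and Polly")    # quoted above
print(agent.get_chat_response("3 more"))             # assumed entry point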

iimez commented 5 months ago

The way we accomplished support for initial chat session messages in the Node bindings is using fake_reply. But I think it's not exposed/documented as a user-facing parameter in the Python bindings. That looks intentional, but I don't know the exact reasoning. It may be worth exposing it, or adding some other way to allow for that "conversation restore" functionality that encapsulates fake_reply. I believe it was initially added to allow for similar functionality in gpt4all-chat.
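
If the Python bindings' low-level prompt_model accepts fake_reply the way the Node bindings do (an assumption; the attribute and parameter plumbing below is guesswork, since none of this is documented), a restore could look something like:

# Unverified sketch: relies on the undocumented fake_reply parameter of the
# low-level prompt_model call; attribute names are guesses.
def noop_callback(token_id: int, response: str) -> bool:
    return True  # accept everything; we only want the side effect on context

with model.chat_session():
    # Replay a past turn: the prompt is ingested and fake_reply is recorded
    # as the assistant's answer, so no new tokens are generated.
    model.model.prompt_model(
        "2 names for a pet pelican",
        model._current_prompt_template,  # attribute name is a guess
        noop_callback,
        fake_reply="Charlie, and Polly",
    )
    model._history += [
        {"role": "user", "content": "2 names for a pet pelican"},
        {"role": "assistant", "content": "Charlie, and Polly"},
    ]
    print(model.generate("3 more"))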

There might also be an alternative way to hack around it: use the prompt template parameter plus special=true and send in whole turns "pre-templated", including the assistant response, with n_predict=0.
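
That hack might look something like this (untested: the pass-through template, the placeholder syntax, and max_tokens=0 as a pure-ingestion call are all assumptions, and the Python generate() may filter special tokens anyway):

# Untested sketch of the "pre-templated turns" idea; the chat tags are just
# an example format, not the model's actual template.
with model.chat_session(prompt_template="%1"):  # pass text through untouched
    past = (
        "<|im_start|>user\n2 names for a pet pelican<|im_end|>\n"
        "<|im_start|>assistant\nCharlie, and Polly<|im_end|>\n"
    )
    model.generate(past, max_tokens=0)  # ingest history, generate nothing
    print(model.generate("3 more"))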

riebers-m commented 4 months ago

Hi, is there any progress on this? I need exactly this feature :D