Open simonw opened 6 months ago
Relevant code: https://github.com/nomic-ai/gpt4all/blob/2025d2d15b8571643241d145085d7cc6cd1d331b/gpt4all-bindings/python/gpt4all/gpt4all.py#L534-L603
My best guess is that `self.model.prompt_model_streaming` has its own internal state, which is why my attempts to manipulate the state in the outer layer are having no effect.
Maybe the previous tokens are accumulated in this low-level `tokens` C array, and that's the thing that isn't updated if you add stuff to `_history`? https://github.com/nomic-ai/gpt4all/blob/2025d2d15b8571643241d145085d7cc6cd1d331b/gpt4all-bindings/python/gpt4all/_pyllmodel.py#L53-L70
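To make that suspicion concrete, here is a schematic mock (not gpt4all's actual code) of the suspected failure mode: the wrapper keeps a Python-level history list, but the thing the model actually conditions on is a separate low-level token buffer that only grows through prompt calls.

```python
# Schematic mock of the suspected failure mode; none of this is gpt4all code.
class ToyModel:
    def __init__(self):
        self.history = []        # Python-level chat history (like _history)
        self._token_buffer = []  # stands in for the low-level C token array

    def prompt(self, text: str) -> None:
        # Only a real prompt call feeds the low-level context
        self._token_buffer.extend(text.split())
        self.history.append({"role": "user", "content": text})


m = ToyModel()
# Injecting a turn into the Python-level history...
m.history.append({"role": "assistant", "content": "Percy, Splash, Captain Beak"})
# ...never touches the buffer the "model" actually reads from:
print(m._token_buffer)  # []
```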
I asked for something similar today in #2358. I tried to `clear()` `current_chat_session` for a new chat without leaving the context manager, but that is also being ignored.
My simple GUI: https://github.com/woheller69/gpt4all-TK-CHAT
Aha: spotted a code path in there that runs only if `self._history` is None. That must be the mechanism that resets the internal token state.
More details on why I need this here:
My LLM tool works by logging messages and responses to a SQLite database, so you can do things like this:
llm "three names for a pet pelican"
# Outputs three names
llm -c "2 more" # -c means continue previous thread
# Outputs two more names
In order to get GPT4All working correctly as a plugin for my tool I need the ability to instantiate a new model and then start a chat session with the previous context populated from my persisted SQLite version - but I can't figure out a way to do that.
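For concreteness, a minimal sketch of the plugin flow I'm after, assuming the existing public `GPT4All` / `chat_session()` / `generate()` API; the model name is just an example, `previous_turns` stands in for rows loaded from my SQLite log, and the seeding step is exactly the piece that has no supported API today:

```python
from gpt4all import GPT4All

# Stand-in for rows loaded back out of my tool's SQLite log
previous_turns = [
    {"role": "user", "content": "three names for a pet pelican"},
    {"role": "assistant", "content": "Percy, Splash, Captain Beak"},
]

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # example model name
with model.chat_session():
    # The missing piece: an official way to seed this session with the
    # persisted turns, e.g. a hypothetical model.restore_history(previous_turns)
    response = model.generate("2 more")  # should continue the restored thread
```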
You might use llama-cpp-agent (https://github.com/Maximilian-Winter/llama-cpp-agent) and llama-cpp-python instead of gpt4all. I am also experimenting with it: https://github.com/woheller69/LLAMA_TK_CHAT/blob/main/LLAMA_TK_GUI.py
There you can do things like `self.llama_cpp_agent.chat_history.get_message_store().add_assistant_message(...)`.
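For illustration, a rough sketch of that pattern; the exact imports, constructor arguments, and the `add_user_message` counterpart are assumptions based on the call quoted above and may differ between llama-cpp-agent versions:

```python
from llama_cpp import Llama
from llama_cpp_agent import LlamaCppAgent
from llama_cpp_agent.providers import LlamaCppPythonProvider

# Assumed setup: a local GGUF model served through llama-cpp-python
llama = Llama(model_path="model.Q4_0.gguf", n_ctx=4096)
agent = LlamaCppAgent(LlamaCppPythonProvider(llama))

# Restore a persisted conversation by writing turns straight into the
# agent's message store (add_user_message is assumed, mirroring the
# add_assistant_message call quoted above).
store = agent.chat_history.get_message_store()
store.add_user_message("three names for a pet pelican")
store.add_assistant_message("Percy, Splash, Captain Beak")

# Then continue the conversation as normal
print(agent.get_chat_response("2 more"))
```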
The way we accomplished support for initial chat session messages in the Node bindings is using `fake_reply`. But I think it's not exposed/documented as a user-facing parameter in the py bindings. It looks intentional, but I don't know the exact reasoning. We may want to expose it, or add some other way to allow for that "conversation restore" functionality that encapsulates `fake_reply`. I believe it was initially added to allow for similar functionality in gpt4all-chat.
There might also be an alternative way to hack around it using the prompt template parameter + `special=true`, sending in the whole turns "pre-templated", including the assistant responses, with `n_predict=0`.
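To make that second idea concrete, here is a rough, untested sketch of the "pre-templated turns" hack, assuming the public `GPT4All.generate()` with its `max_tokens` knob standing in for `n_predict`, and ChatML purely as an example template; whether this works through `generate()` at all, or needs the lower-level prompt call with `special=true`, is exactly the open question, and none of it is an official API.

```python
from gpt4all import GPT4All

# Persisted turns to replay into the model's context (placeholder data)
turns = [
    ("three names for a pet pelican", "Percy, Splash, Captain Beak"),
]

# Build one "pre-templated" string containing both prompts and replies,
# using whatever chat template the model expects (ChatML shown as an example).
pretemplated = ""
for user, assistant in turns:
    pretemplated += (
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant}<|im_end|>\n"
    )

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # example model name
with model.chat_session():
    # Replay the history with generation effectively disabled, so the tokens
    # land in the context without producing new text (the n_predict=0 idea).
    model.generate(pretemplated, max_tokens=1)
    # Then continue the conversation normally.
    print(model.generate("2 more"))
```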
Hi, is there any progress on this? I need exactly this feature :D
Feature Request
I'd like to be able to start a new `chat_session()` but then populate its history from my own recorded logs, rather than having to use that context manager for an entire chat. Basically I want to do this:
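A minimal sketch of that attempt, assuming the public `GPT4All` API; the model name and messages are placeholders, and appending to the private `_history` list is exactly the hack in question:

```python
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # placeholder model name
with model.chat_session():
    # Inject previously recorded turns straight into the session history
    model._history.append({"role": "user", "content": "three names for a pet pelican"})
    model._history.append({"role": "assistant", "content": "Percy, Splash, Captain Beak"})
    # ...then ask a follow-up that depends on that injected context
    print(model.generate("2 more"))
```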
This looks like it should work: I added some debug code, and the `output_collector` used inside the model contained the injected history just before the prompt was executed. But the model's replies showed no awareness of that history, so clearly the trick of adding things to `_history` directly like that doesn't work!
I'd love it if there was an official, documented way to do this. I need it for my https://github.com/simonw/llm-gpt4all/ project.