Issue with Memory Enhancement Tool with Ollama models

I really like the concept of what you have built.

I'm reaching out to report an issue I've encountered while using Memory Enhancement Tool (MET). I've been trying to integrate MET with my local OLLAMA models, but unfortunately, it's not working as expected.

Specifically, I'm experiencing difficulties getting MET to function correctly with LLMs like llama3.1-8b. Despite reviewing the documentation and setup instructions for MET, I was unable to resolve the issue.

Could you please provide guidance on the following:

Are there any known compatibility issues between MET and local OLLAMA models?
Is there a specific model format or loading mechanism that MET requires?
Are there any LLM version limitations or restrictions that I should be aware of?

I would greatly appreciate any information or assistance you can provide to help me resolve this issue.

Thanks

Hi Thanks for your feedback! heading to your questions:

as I tested it with gpt-4o-2024 (the last released one) model, which is capable of function calling , it does good at handling complicated orders, like adding memory and detecting whether user wants to add or is it its own that decides to add those memories ; although still needs some enhancements; AND local medium/small LLMs like llama3.2:latest doesn't handle this function (overall,complex function calling) properly. So , today I will try to release two commits: 1)would be a simple one ONLY for llama models(simple memory adding) 2)an update for the gpt4o MET, which is an update where memories could be downloaded. I'm so sorry that time limitation doesn't let me to push upgrades everyday, I try to do so
no; The memories are saved in a folder called memories_jsons in the running folder (root folder of) open webui docker container (/app/backend i think); and i haven't tested this MET tool on installations like pip or etc.
As I said, the MET tool works best with models that are better capable of function calling; i will release a simple tool(actually the same as this, but much simpler for the LLM to work with) , that would be for tiny LLMs like llama3.2:latest and llama3.1 .

The problem is i just wanted to complete the functionality of the tool and I hadn't enough time to also complete the documentation. Sorry for that and I will do this very task today.

Thanks again for leaving a feedback!

EDIT:

I think would be a better idea to put effort on the memories table @dnl13 mentioned instead of creating another one for llama models; i think the issue is function calling of the llama3.2 , which since it is a small model, would be normal that function calling lacks some powers!
if you want to use the old versions , you can clone the repo and e.g use the commit "5309320e41c706fb95e9a2a65c126353553b4fb7" which has the initial state of the tool and supposed to be lightweight for the LLM! I know the prompts are awful , sorry...

Hey everyone, Just sharing some thoughts on this approach.

I’ve been testing this with LLama3.2, but I’m not entirely sure what you mean by MET, @smcnaught1. Are you referring to the usual Personalized Memory Manager?

I really appreciate @mhioi's concept of enhancing the memory functionality. However, instead of writing files on the client side, I think it might be even more effective to explore the built-in Memories, MemoriesTable, and Files classes further to push them to the cache. In my view, a good approach could be to send these to a pipeline server and bind the memories to a RAG (Retrieval-Augmented Generation) pipeline instead of using a tool. I do understand, though, that tools are easier to implement, while pipelines require an external server. Ideally, memories should be user-based and optional. If enabled, they should be added to the LLM context in a compressed form to minimize token usage or be integrated into the RAG pipeline.

In my early tests using Memories and MemoriesTable, I noticed that tools seem to be more of a model "recommendation" than a "mandatory option." The model decides when to read/write/update/delete from memory. However, I believe reading from memory should be mandatory. Also, due to limitations in the UI, such as confirming the deletion of a memory, it would be helpful if we could interact with memories directly within the chat window.

For LLama3.2 and LLava, it seems that the function prompts need to be more precise.

For example, when recalling from memory, the model often ignores it with the default prompt that @mhioi used.

In my case, the prompt needed to look more like this:


...
    async def recall_memories(
        self, __user__: dict, __event_emitter__: Callable[[dict], Any] = None
    ) -> str:
        """
        Retrieve all stored memories from the user's memory vault and provide them to the user. 
        Be accurate and precise. Do not add any additional information. Always use the function to access memory or memories. 
        If the user asks about what is currently stored, only return the exact details from the function. Do not invent or omit any information.

        :return: A numeric list of all memories. You MUST present the memories to the user as text. It is important that all memories are displayed without omissions. Please show each memory entry in full!
        """

        # get user id
        self.user_id = __user__.get("id")
...

This doesn’t fully prevent the model from hallucinating memories that were never written to the memory "vault" (whether that’s JSON files, the built-in memory class, etc.). Also, retrieving memories is often hallucinated by the model. However, I observed that when the memories are pushed to the open-webui Memory, they are presented to the model, so it remembers, in a new chat window, what was previously stored. (I’m guessing they’re added to the context somewhere). When using @mhioi's variation, you need to ensure memory reading is added to the system prompt or the initial chat message.

I’m still using an older version, @mhioi, so I haven’t been able to keep up with your rapid developments 😅.

Additionally, when a memory is already added to the personalized built-in memory, sometimes the memories are duplicated in this tool, and it becomes tricky to delete them properly. Even after deleting the JSON object, the personalized memory remains. It might be helpful to use different wording/naming to keep both "memories" separate when still writing to JSON files.

These are just some of my observations and conclusions. Keep up the great work, and thanks for all the effort you’re putting into this!

Hey @dnl13

I’ve been testing this with LLama3.2, but I’m not entirely sure what you mean by MET, @smcnaught1. Are you referring to the usual Personalized Memory Manager? I think what @smcnaught1 is talking about the gpt4 memory mimic tool;which is the reason I'm considering changing the name to MET or MEET (doesn't matter by now,we're focusing to more important things)...

I really appreciate @mhioi's concept of enhancing the memory functionality. However, instead of writing files on the client side, I think it might be even more effective to explore the built-in Memories, MemoriesTable, and Files classes further to push them to the cache. In my view, a good approach could be to send these to a pipeline server and bind the memories to a RAG (Retrieval-Augmented Generation) pipeline instead of using a tool. I do understand, though, that tools are easier to implement, while pipelines require an external server. Ideally, memories should be user-based and optional. If enabled, they should be added to the LLM context in a compressed form to minimize token usage or be integrated into the RAG pipeline.

Thanks for your helps ! I really appreciate your observations!

Yes! There are much more enhancements for this tool ! as you mentioned, the crucial one is for merging it with built in memories. However I don't really know how to do that ( by now I mean).

However, I really wanna know where is the memories functionality in open webui? cause I have dev branch and still couldn't see any memories section in any place of the UI.Would be thankful if you guide me about that so we can implement this into MemoriesTable.

And for the pipelines concept, I never had a chance before to use pipelines. so actually don't know whether how to use it.

In my early tests using Memories and MemoriesTable, I noticed that tools seem to be more of a model "recommendation" than a "mandatory option." The model decides when to read/write/update/delete from memory. However, I believe reading from memory should be mandatory. Aleso, due to limitations in the UI, such as confirming the deletion of a memory, it would be helpful if we could interact with memories directly within the chat window.

yeah! that's why I used tools for that concept, cause if the LLM would be enough smart, it would decide if there is a need to add user's queryies to memories or not; if so, in how many indexes would it be? in what tags? etc...

For LLama3.2 and LLava, it seems that the function prompts need to be more precise.

😅yep; I was too hinged and excited to enriching the llm's tools , so haven't focused on prompts. sorry

For example, when recalling from memory, the model often ignores it with the default prompt that @mhioi used.

In my case, the prompt needed to look more like this:
...
    async def recall_memories(
        self, __user__: dict, __event_emitter__: Callable[[dict], Any] = None
    ) -> str:
        """
        Retrieve all stored memories from the user's memory vault and provide them to the user. 
        Be accurate and precise. Do not add any additional information. Always use the function to access memory or memories. 
        If the user asks about what is currently stored, only return the exact details from the function. Do not invent or omit any information.

        :return: A numeric list of all memories. You MUST present the memories to the user as text. It is important that all memories are displayed without omissions. Please show each memory entry in full!
        """

        # get user id
        self.user_id = __user__.get("id")
...
may I use your prompt and commit a change on prompts in the future? If you mind ! or we can use another LLMs like chatgpt to enrich full prompts!(future versions actually...)

This doesn’t fully prevent the model from hallucinating memories that were never written to the memory "vault" (whether that’s JSON files, the built-in memory class, etc.). Also, retrieving memories is often hallucinated by the model. However, I observed that when the memories are pushed to the open-webui Memory, they are presented to the model, so it remembers, in a new chat window, what was previously stored. (I’m guessing they’re added to the context somewhere). When using @mhioi's variation, you need to ensure memory reading is added to the system prompt or the initial chat message.

I’m still using an older version, @mhioi, so I haven’t been able to keep up with your rapid developments 😅.

😂 just published a way to download any memory files user created ! hope you would give feedback to enhance or remove that feature! see, am toooo excited 😂

Additionally, when a memory is already added to the personalized built-in memory, sometimes the memories are duplicated in this tool, and it becomes tricky to delete them properly. Even after deleting the JSON object, the personalized memory remains. It might be helpful to use different wording/naming to keep both "memories" separate when still writing to JSON files.

yeah! if the LLM would be smart enough, it can merge duplicates !

These are just some of my observations and conclusions. Keep up the great work, and thanks for all the effort you’re putting into this!

Thanks ALOT! would be considering these in the future !

The most important thing is the MemoriesTable ! I would appreciate if you let me know how to use this PR as my open webui doesn't have that...

Hey @mhioi and @dnl13

I was refering MET as the GPT4 function that @mhioi created.

I recommend leveraging OpenWebUI API, if possible, to store and retrieve memories. Here's a high-level overview of how you can approach this:

Store Memories : When the chatbot recalls a memory, send a GET request to POST /memories/add with the user ID and key phrase from the conversation as JSON data. This will create or update a memory entry in OpenWebUI.
Retrieve Memories for Recalls : For each recall action (e.g., generating a response), use the GET /memories/ endpoint to fetch all memories associated with the current user. Then, filter and sort these memories based on relevance and importance.
Update Memory Entries : When updating or deleting memories, you can use the corresponding API endpoints (POST /memories/update and DELETE * /memories/delete/user, respectively).

import` requests

# Initialize API endpoint URL
base_url = "https://api.openwebui.com"

def store_memory(user_id, key_phrase):
    # Send a POST request to add a new memory entry
    url = f"{base_url}/memories/add"
    data = {
        "user_id": user_id,
        "key_phrase": key_phrase,
        "content": ""
    }

    response = requests.post(url, json=data)
    if response.status_code == 200:
        print(f"Memory added successfully: {response.json()['id']}")
    else:
        print(f"Failed to add memory: {response.text}")

def retrieve_memories(user_id):
    # Send a GET request to fetch memories for recalls
    url = f"{base_url}/memories/"
    params = {"user_id": user_id}

    response = requests.get(url, params=params)
    if response.status_code == 200:
        return response.json()
    else:
        print(f"Failed to retrieve memories: {response.text}")

I think there would be some issues with the constant API calling, and we would want to look at caching capabilities.

Just my thoughts at the moment, I have not had time to dig into this much further but I am looking to add automated memory into my OpenWebUI instance.

Thanks for the work!

yeah! That sounds a good idea too! the thing is can we implement memories both into openwbui itself(like the memoriestable @dnl13 said before) and into the API itself by using/developing one base code?

I think that's possible if we :

1st use memories class inside of the current code
2nd extend the results into API itself

@mhioi fyi The Memories class is already writing to the WebUI database, so perhaps using files might not be necessary anymore. However, we’ll need to be careful about how we can extend the root Memory class to support more diverse types of memories.

Additionally: That’s why we should seriously consider generating it as a pipeline, so that it can later be integrated into an existing RAG model. An API endpoint for the memories could be easily implemented using the existing MemoryTables. MHO

mhioi / open-webui-stuff

Issue with Memory Enhancement Tool with Ollama models #2