mem0ai / mem0

The Memory layer for your AI apps
https://mem0.ai
Apache License 2.0

Graph-based relation saving has an extremely high token cost #2066

Open ChenMa2017 opened 5 hours ago

ChenMa2017 commented 5 hours ago

🚀 The feature

The graph-based relationship storage mechanism consumes a large number of LLM tokens. The LLM needs to analyze entities and their relationships more efficiently, to reduce both time and token usage.

Motivation, pitch

The first time I tried:

I used ChatGPT-4 to generate descriptions of photos, then saved the textual descriptions into Mem0. However, I found that the cost of saving one photo description was 15 times higher than the cost of generating the description itself. The saving process was also extremely slow: saving 62 descriptions took over an hour.

The ChatGPT token cost is shown in the figure below.

[Figure: openai_money_cost_cpmparison_1121]
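As a rough back-of-the-envelope check of the figures from the first attempt, the numbers can be worked through as below. This is only a sketch: it reads "15 times higher" as a 15:1 ratio of saving cost to generation cost, treats "over an hour" as exactly one hour (so the per-item time is a lower bound), and uses a placeholder unit cost rather than any real price.

```python
# Figures reported in the issue; generate_cost is a hypothetical unit, not a real price.
descriptions = 62
total_seconds = 60 * 60        # "over an hour", so this is a lower bound
save_to_generate_ratio = 15    # assumed reading of "15 times higher"

# Lower bound on the wall-clock time spent saving one description.
seconds_per_save = total_seconds / descriptions

generate_cost = 1.0            # hypothetical cost unit per generated description
save_cost = generate_cost * save_to_generate_ratio
total_cost = descriptions * (generate_cost + save_cost)

# Fraction of the total spend that goes to saving rather than generating.
save_share = save_cost / (generate_cost + save_cost)

print(f"at least {seconds_per_save:.0f} s per saved description")
print(f"saving accounts for {save_share:.0%} of total spend")
```

Under these assumptions, nearly all of the spend goes to the save step rather than to generating the descriptions.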

The second time I tried:

I used ChatGPT-4 to generate a description for each photo (e.g., image_name.jpg), saved each description to a file (e.g., image_name.json), and stored only the correspondence [image_name.jpg, image_name.json] in Mem0. However, after extensive testing, I found that Mem0 does not actually record the file correspondence.

Here is a sample dialogue:

User: The file './result/test_polly.json' is the json/JSON file has the text description of the './image/test_polly.jpg'.

Assistant: If you have a file at './result/test_polly.json', it likely contains the text description for the image of Polly, Jim Green's parrot, which is stored at './image/test_polly.jpg'. If you need any assistance regarding this file, feel free to ask!

User: do I have any json/JSON file?

Assistant: I don't have information about any specific JSON files you may have saved. If you recall storing any JSON files, you might want to check your project directories or other locations where you typically save such files. Let me know if there's anything else I can assist you with!

You: the file './result/test_polly.json' is the json/JSON file has the text description of the image './image/test_polly.jpg'.

Assistant: It sounds like you have a JSON file located at ./result/test_polly.json which contains the text description of the image of Polly the parrot, located at ./image/test_polly.jpg. If you need any assistance with these files, feel free to let me know!

You: do you know waht is the file './result/test_polly.json' is?

Assistant: I don't have information on a file named './result/test_polly.json'. However, I do know about a photo of Polly, Jim Green's parrot, which is saved at './image/test_polly.jpg'. If the .json file is related to Polly, it might contain metadata or some form of structured data related to the photo or Polly, but I can't provide specific details without further context.

ChenMa2017 commented 4 hours ago

I carefully reviewed the code for Mem0's memory mechanism.

During the memory search phase:

Excluding the embedding phase, the memory mechanism makes seven calls to llm.generate_response(). Are there other calls to llm.generate_response() that I may have missed?
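One way to verify the call count empirically, rather than by reading the code, is to wrap generate_response with a counter before running a search. The sketch below is an assumption-laden illustration: StubLLM is a hypothetical stand-in and not Mem0's real LLM class, CallCounter is a helper invented here, and it assumes the method takes a list of message dicts with a "content" key. Character counts are only a crude proxy for tokens.

```python
import functools


class CallCounter:
    """Counts calls to a wrapped method and the prompt characters sent to it."""

    def __init__(self):
        self.calls = 0
        self.prompt_chars = 0

    def wrap(self, obj, method_name):
        original = getattr(obj, method_name)  # bound method of obj

        @functools.wraps(original)
        def counted(*args, **kwargs):
            self.calls += 1
            # Accept messages either as a keyword or as the first positional argument.
            messages = kwargs.get("messages") or (args[0] if args else [])
            self.prompt_chars += sum(len(str(m.get("content", ""))) for m in messages)
            return original(*args, **kwargs)

        # Shadow the method on this instance only; the class is left untouched.
        setattr(obj, method_name, counted)
        return self


class StubLLM:
    """Hypothetical stand-in for an LLM client (NOT Mem0's actual class)."""

    def generate_response(self, messages):
        return "ok"


llm = StubLLM()
counter = CallCounter().wrap(llm, "generate_response")

llm.generate_response(messages=[{"role": "user", "content": "hello"}])
llm.generate_response([{"role": "user", "content": "hi"}])

print(counter.calls, counter.prompt_chars)
```

Applied to the real LLM object used by the memory instance (wherever that object lives in the installed version), the same wrapper would report how many calls one search or add operation actually triggers.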

This results in significant token consumption, which might explain the token usage issue mentioned earlier. Is there a more efficient memory mechanism available?