kingjulio8238 / Memary

Making Agents Reliable In Production.
https://www.memarylabs.com
MIT License

Occasional final response outputting entities #35

Closed kevinl424 closed 2 weeks ago

kevinl424 commented 1 month ago
kingjulio8238 commented 1 month ago

Could this bug arise from the order in which we pass the top-K entities into the model? Not placing them first or last in the prompt could potentially help with this.
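A minimal sketch of the ordering idea: place the retrieved entities in the middle of the prompt, between the instructions and the question, so the entity dump is neither the first nor the last thing the model reads. The prompt template and entity format here are hypothetical, not memary's actual ones.

```python
def build_prompt(question: str, entities: list[str]) -> str:
    """Put the top-K entities between the instructions and the question,
    rather than at the very start or very end of the prompt."""
    entity_block = "\n".join(f"- {e}" for e in entities)
    return (
        "You are a helpful assistant. Use the context below if relevant.\n\n"
        f"Context entities:\n{entity_block}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "Where does Alice work?",
    ["Alice - works at Acme", "Acme - based in Berlin"],
)
```

Whether this helps is an empirical question; swapping the position of `entity_block` in the template makes it cheap to A/B test.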

kevinl424 commented 1 month ago

Update: using gpt-3.5-preview or any more advanced model resolves the issue, and the final response works quite well. Running Llama 3 8B, the initial response comes out fine, but when the model is fed all the persona and knowledge entity store information it tends to get confused and output the wrong thing.

Seems to be largely due to model ability. A potential solution is some method to detect which Ollama models the user has pulled and let them select from those. This would allow users with much larger models (Llama 3 70B, etc.) to get valid responses more consistently while still running locally. We should also probably include a note that Llama 3 8B does not always produce a valid final response.
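One way to sketch the model-detection step: Ollama exposes the locally pulled models at `GET /api/tags`, which returns a JSON body like `{"models": [{"name": "llama3:8b", ...}, ...]}`. The parsing is kept in a pure function so the HTTP call stays separate; the default host is Ollama's standard port, and the selection UI is left out.

```python
import json
from urllib.request import urlopen


def parse_model_names(tags: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in tags.get("models", [])]


def list_pulled_models(host: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama server for the models the user has pulled."""
    with urlopen(f"{host}/api/tags") as resp:
        return parse_model_names(json.load(resp))
```

From there the app could present `list_pulled_models()` as a dropdown and warn when only llama3:8b is available.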

kevinl424 commented 1 month ago

Further testing using the requests library to get the response (which allows isolating the number of tokens used in the prompt) suggests that the context is not overflowing. Would like to try 70B, but as noted in the Ollama repo, you need "32 GB to run the 33B models" and more for 70B.
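For reference, the overflow check above can be done without a separate tokenizer: a non-streaming `/api/generate` response from Ollama includes a `prompt_eval_count` field reporting how many tokens the prompt consumed, which can be compared against the model's context window. The helpers below are a sketch; the 8192-token default is an assumption about the configured window, not a measured value.

```python
def prompt_tokens(resp: dict) -> int:
    """Tokens consumed by the prompt, per a non-streaming
    /api/generate response's prompt_eval_count field."""
    return resp.get("prompt_eval_count", 0)


def is_context_overflowing(resp: dict, context_window: int = 8192) -> bool:
    """True if the prompt alone fills (or exceeds) the context window."""
    return prompt_tokens(resp) >= context_window
```

Logging `prompt_eval_count` alongside each failed final response would make it easy to confirm the issue is model ability rather than truncation.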