yoheinakajima / babyagi


Fix for silent llama #279

Closed dibrale closed 1 year ago

dibrale commented 1 year ago

Fix for the prioritization agent deleting the priority list when the model returns empty output. Prompt engineering and parameter improvements to reduce the number of empty responses from llama models. More debug output.

@jmtatsch - This should fix the issue you experienced with my prior PR.
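Roughly, the guard works like this (a minimal sketch rather than the literal diff; `openai_call` is a stub and `task_list` / `prioritization_agent` are stand-ins for babyagi's actual structures):

```python
from collections import deque

task_list = deque()  # running list of pending tasks

def openai_call(prompt: str, max_tokens: int = 200) -> str:
    """Stub standing in for the real LLM wrapper (OpenAI or llama)."""
    raise NotImplementedError

def prioritization_agent(this_task_id: int, objective: str) -> None:
    task_names = [t["task_name"] for t in task_list]
    prompt = (
        f"Clean up and reprioritize these tasks for the objective: {objective}\n"
        + "\n".join(task_names)
        + f"\nReturn a numbered list starting with task {this_task_id + 1}."
    )
    response = openai_call(prompt)

    # Key fix: an empty response no longer clears the existing priority list.
    if not response or not response.strip():
        print("Prioritization agent returned empty output; keeping current task list.")
        return

    new_tasks = []
    for line in response.strip().split("\n"):
        parts = line.strip().split(".", 1)
        if len(parts) == 2 and parts[1].strip():
            new_tasks.append({"task_id": parts[0].strip(), "task_name": parts[1].strip()})

    if new_tasks:  # replace the list only if parsing produced at least one task
        task_list.clear()
        task_list.extend(new_tasks)
```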

dibrale commented 1 year ago

Attached is a rather lengthy example of the sort of output I get with this PR. More than anything, it illustrates the difficulties that my local model has staying on task and how these issues get propagated and/or handled internally with subsequent iteration. The propensity to produce empty output from time to time may be a context length issue, and I believe that further optimization in this regard is possible. Nonetheless, I feel that output has gotten much better. As ever, comments, criticisms and suggestions are greatly appreciated!

sample_output.txt

cornpo commented 1 year ago

This fixed my mlock issue. Currently running it for the first time.

jmtatsch commented 1 year ago

@dibrale Thanks, that solves the issue for me. However, there are a couple of changes that don't seem necessary to me. Maybe you can explain a bit why you chose to make them. Why did you disable memlock? Why did you halve CTX_MAX to 1024? Why did you reduce max_tokens to 200?

dibrale commented 1 year ago

@jmtatsch - To answer your questions:

1. mlock was not working properly for me anyway, and I believe it causes crashes on some machines. It is disabled by default in llama_cpp, and keeping it disabled in my PR is probably what fixed the issue @cornpo was having. Perhaps enabling mlock can become a .env option at some point.
2. At least some of the empty output problems seem to have been caused by prompt input exceeding what a llama model can work with. Reducing CTX_MAX to a fixed value is a quick and dirty way to fit more new input into the prompt without triggering this issue. Prompt allowance utilization could probably be improved later on by dynamically adjusting the context length.
3. A modest max_tokens keeps the amount of context the LLM can add in any one step manageable. Since context_agent returns the top 5 results for the execution agent to work with, keeping each of those results small prevents the execution agent from ingesting an overwhelmingly large prompt.

I hope this clarifies my reasoning a bit. Please let me know if you have any other questions or suggestions.
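To make points 1-3 concrete, here is a minimal sketch of the llama-side settings, assuming the llama-cpp-python bindings; the model path, temperature, and stop sequence are placeholders rather than values from this PR:

```python
from llama_cpp import Llama

CTX_MAX = 1024  # fixed context budget so prompts stay within what the model handles

llm = Llama(
    model_path="models/llama-7b.ggml.bin",  # placeholder path
    n_ctx=CTX_MAX,
    use_mlock=False,  # point 1: mlock stays disabled (the llama_cpp default)
)

def llama_call(prompt: str) -> str:
    result = llm(
        prompt[:CTX_MAX],   # point 2: crude character-level truncation; a dynamic
                            # token budget would use the context allowance better
        max_tokens=200,     # point 3: keep each result small for downstream prompts
        temperature=0.7,
        stop=["###"],       # placeholder stop sequence
    )
    return result["choices"][0]["text"].strip()
```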

francip commented 1 year ago

Can we change the CTX_MAX and max_tokens only for Llama? Also, there are a bunch of changes to the prompts, which I'd like to split into a separate PR if possible, as I am also about to merge another prompt refactor/change from BabyBeeAGI.

dibrale commented 1 year ago

> Can we change the CTX_MAX and max_tokens only for Llama? Also, there are a bunch of changes to the prompts, which I'd like to split into a separate PR if possible, as I am also about to merge another prompt refactor/change from BabyBeeAGI.

CTX_MAX only gets set for Llama models anyway, unless I am mistaken. Its assignment occurs behind a branch that checks for llama, and this behavior does not change in my PR. max_tokens is likewise only set to 200 when calling a llama model. Please correct me if I'm wrong on either of these points, and I will adjust the code if required. I'll work on reverting the prompt changes in the meantime.
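For clarity, the branching I am describing looks roughly like this (a sketch, not the literal code; `LLM_MODEL`, `llama_call`, and `openai_call` are stand-ins for babyagi's actual configuration and call wrappers):

```python
import os

LLM_MODEL = os.getenv("LLM_MODEL", "gpt-3.5-turbo").lower()

# Stubs standing in for the real call wrappers.
def llama_call(prompt: str, max_tokens: int) -> str: ...
def openai_call(prompt: str, max_tokens: int) -> str: ...

if LLM_MODEL.startswith("llama"):
    CTX_MAX = 1024  # only assigned on the llama branch

def model_call(prompt: str, max_tokens: int = 2000) -> str:
    if LLM_MODEL.startswith("llama"):
        return llama_call(prompt, max_tokens=200)       # llama-only cap
    return openai_call(prompt, max_tokens=max_tokens)   # OpenAI path unchanged
```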

dibrale commented 1 year ago

@francip - I've reverted the prompt changes, so hopefully this PR meets your needs. I'll submit prompt changes for separate consideration.

dibrale commented 1 year ago

Prompt changes are now in a separate PR as requested. Please let me know if there are any further issues.