Get Auto-GPT working. It's blocked on the following 2 items

Adding local embedding: https://github.com/Significant-Gravitas/Auto-GPT/pull/2473
Adding support for different BASE_URL for OpenAi
- update: @DGdev91 made a PR for this: https://github.com/Significant-Gravitas/Auto-GPT/pull/2594

Embeddings aren't working quite right / very inconsistent between models, so having a built-in local embedding option is good.

For the second bullet point, we can technically work around and set the BASE_URL by modifying the code, but it would be nice to have it as an option / env variable so it's easier to get started.

I'll likely open a new PR against Auto-GPT for the second point and push to get that PR i linked above in

Just got AutoGPT working consistently from this embeddings fix https://github.com/keldenl/gpt-llama.cpp/commit/b3db39f2638ac284fd5ac5c33c01baf0f497833b

Still need to manually update the vector size in autogpt to 5120 or 4096 depending on the llama model (see https://huggingface.co/shalomma/llama-7b-embeddings#quantitative-analysis), but it works!

vicuna 13b, the best model i've tested so far, isn't doing a GREAT job generating actions that it can follow continuously (gets stuck in a loop)

I'll put up a fork of AutoGPT tmr with all the changes I made (openai BASE_URL and configurable vector dimensions) to make it easier for folks to replicate and test as well.

Thanks for your work!

i was exited to try it out and made those changes myself. i also made a pull request https://github.com/Significant-Gravitas/Auto-GPT/pull/2594/

Feel free to add your changes if i missed something

wow @DGdev91 that was EXACTLY what i was going to do.. thanks!! nothing to add for me

Re-opening this until we get the PR https://github.com/Significant-Gravitas/Auto-GPT/pull/2594 fully merged

🚀 🚀 AUTOGPT + GPT-LLAMA.CPP GUIDE IS NOW AVAILABLE: https://github.com/keldenl/gpt-llama.cpp/blob/master/docs/Auto-GPT-setup-guide.md

HUGE shoutout to @DGdev91 for putting up the PR to make this possible – hopefully the PR gets merged to master soon.

Here's a (very short) demo of it running on my M1 Mac: https://github.com/keldenl/gpt-llama.cpp/blob/master/docs/demos.md#Auto-GPT

@keldenl Hi, I tested gpt-llama.cpp's API using the following script:

curl --location --request POST 'http://localhost:443/v1/chat/completions' \
--header 'Authorization: Bearer /home/nero/code/llama.cpp/models/7B/ggml-model-q4_0.bin' \
--header 'Content-Type: application/json' \
--data-raw '{
   "model": "gpt-3.5-turbo",
   "messages": [
      {
         "role": "system",
         "content": "You are ChatGPT, a helpful assistant developed by OpenAI."
      },
      {
         "role": "user",
         "content": "How are you doing today?"
      }
   ]
}'

My model is original llama 7B mode and I succesfully got the response from localhost:443:

{"choices":[{"message":{"content":" Great! Thanks for asking :)\n"},"finish_reason":"stop","index":0}],"created":1682008784901,"id":"Zi7iQapUGOgjeVVB2oJtc","object":"chat.completion.chunk","usage":{"prompt_tokens":99,"completion_tokens":7,"total_tokens":106}}%

I also configured the Auto-GPT according to this guide: https://github.com/keldenl/gpt-llama.cpp/blob/master/docs/Auto-GPT-setup-guide.md. But this time when I run the python -m autogpt --debug, the server on localhost:443 can't return a response, the terminal is like this:

--REQUEST--
user: ''' You are a helpful assistant.
''', '''
{
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    },
    "thoughts":
    {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted
- list that conveys
- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    }
}
'''
Readable Stream: CLOSED

Is there something wrong?

@keldenl Hi, I tested gpt-llama.cpp's API using the following script:

curl --location --request POST 'http://localhost:443/v1/chat/completions' \
--header 'Authorization: Bearer /home/nero/code/llama.cpp/models/7B/ggml-model-q4_0.bin' \
--header 'Content-Type: application/json' \
--data-raw '{
   "model": "gpt-3.5-turbo",
   "messages": [
      {
         "role": "system",
         "content": "You are ChatGPT, a helpful assistant developed by OpenAI."
      },
      {
         "role": "user",
         "content": "How are you doing today?"
      }
   ]
}'

My model is original llama 7B mode and I succesfully got the response from localhost:443:

{"choices":[{"message":{"content":" Great! Thanks for asking :)\n"},"finish_reason":"stop","index":0}],"created":1682008784901,"id":"Zi7iQapUGOgjeVVB2oJtc","object":"chat.completion.chunk","usage":{"prompt_tokens":99,"completion_tokens":7,"total_tokens":106}}%

--REQUEST--
user: ''' You are a helpful assistant.
''', '''
{
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    },
    "thoughts":
    {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted
- list that conveys
- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    }
}
'''
Readable Stream: CLOSED

Is there something wrong?

I guess you already have done it, but let's check anyway.

The example uses vicuna 13b as model, so the settings on .env file are set like this

OPENAI_API_BASE_URL=http://localhost:443/v1
EMBED_DIM=5120
OPENAI_API_KEY=../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin

In your case, should be like this:

OPENAI_API_BASE_URL=http://localhost:443/v1
EMBED_DIM=4096
OPENAI_API_KEY=/home/nero/code/llama.cpp/models/7B/ggml-model-q4_0.bin

Have you already did that, or you just copied the values in the example?

Also, i'm not 100% sure the llama 7b model is good enough to handle AutoGPT requests correctly. I suggest you to try with vicuna. Even better the 13b model, wich is the one i and kendenl used for testing

yes, I already have done that, exactly the same as you mentioned. I’m quite sure the auto-gpt configuration is good. It seems like a problem with llama model itself or gpt-llama.cpp. You are right, I will try with Vicuna and see if same problem exists. Thanks！

发自我的iPhone

------------------ Original ------------------ From: DGdev91 @.> Date: Fri,Apr 21,2023 1:53 AM To: keldenl/gpt-llama.cpp @.> Cc: Yuhua Wei @.>, Comment @.> Subject: Re: [keldenl/gpt-llama.cpp] Add support for Auto-GPT (Issue #2)

@keldenl Hi, I tested gpt-llama.cpp's API using the following script: curl --location --request POST 'http://localhost:443/v1/chat/completions' \ --header 'Authorization: Bearer /home/nero/code/llama.cpp/models/7B/ggml-model-q4_0.bin' \ --header 'Content-Type: application/json' \ --data-raw '{ "model": "gpt-3.5-turbo", "messages": [ { "role": "system", "content": "You are ChatGPT, a helpful assistant developed by OpenAI." }, { "role": "user", "content": "How are you doing today?" } ] }'
My model is original llama 7B mode and I succesfully got the response from localhost:443: {"choices":[{"message":{"content":" Great! Thanks for asking :)\n"},"finish_reason":"stop","index":0}],"created":1682008784901,"id":"Zi7iQapUGOgjeVVB2oJtc","object":"chat.completion.chunk","usage":{"prompt_tokens":99,"completion_tokens":7,"total_tokens":106}}%
I also configured the Auto-GPT according to this guide: https://github.com/keldenl/gpt-llama.cpp/blob/master/docs/Auto-GPT-setup-guide.md. But this time when I run the python -m autogpt --debug, the server on localhost:443 can't return a response, the terminal is like this: --REQUEST-- user: ''' You are a helpful assistant. ''', ''' { "command": { "name": "command name", "args": { "arg name": "value" } }, "thoughts": { "text": "thought", "reasoning": "reasoning", "plan": "- short bulleted - list that conveys - long-term plan", "criticism": "constructive self-criticism", "speak": "thoughts summary to say to user" } } ''' Readable Stream: CLOSED
Is there something wrong?

I guess you already have done it, but let's check anyway.

The example uses vicuna 13b as model, so the settings on .env file are set like this


 OPENAI_API_BASE_URL=http://localhost:443/v1
 EMBED_DIM=5120
 OPENAI_API_KEY=../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin
  In your case, should be like this:     ``` OPENAI_API_BASE_URL=http://localhost:443/v1 EMBED_DIM=4096 OPENAI_API_KEY=/home/nero/code/llama.cpp/models/7B/ggml-model-q4_0.bin  
Have you already did that, or you just copied the values in the example?

Also, i'm not 100% sure the llama 7b model is good enough to handle AutoGPT requests correctly. I suggest you to try with vicuna. Even better the 13b model, wich is the one i and kendenl used for testing

—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you commented.Message ID: ***@***.***&gt;

@Neronjust2017 you're exactly right. the issue is the power of the model. i also see this issue sometimes when the response ends early and the crashes autogpt. i have had better luck with vicuna (vs. alpaca and llama), and 13B (7b is very inconsistent)

There may be a better way of prompting that could help with this – i was going to try babyagi and see how the results differ

vicuna 13b, the best model i've tested so far, isn't doing a GREAT job generating actions that it can follow continuously (gets stuck in a loop)

What's about running auto-gpt against open-ai with a large amount of goals, persisting question/answer-pairs and use that as a (lora-)finetuning-dataset for vicuna?

I almost always get timeouts on the autogpt side. Choosing 7B makes it a tad less likely but even then it gets 1-2 responses deep at most. Using latest llama.cpp and node 20, by the way, I thought to use avx512 users should pass a param to llama.cpp, no?:

gpt-llama.cpp:

--LLAMA.CPP SPAWNED--
..\llama.cpp\build\bin\Release\main.exe -m ..\llama.cpp\models\vicuna\1.1TheBloke\ggml-vicuna-7b-1.1-q4_1.bin --temp 0 --n_predict 3008 --top_p 0.1 --top_k 40 -b 512 -c 2048 --repeat_penalty 1.1764705882352942 --reverse-prompt user: --reverse-prompt
user --reverse-prompt system: --reverse-prompt
system --reverse-prompt ## --reverse-prompt
## --reverse-prompt ### -i -p ### Instructions
Complete the following chat conversation between the user and the assistant. System messages should be strictly followed as additional instructions.

### Inputs
system: You are a helpful assistant.
user: How are you?
assistant: Hi, how may I help you today?
system: You are CatJokeFinder, a life-long cat joke lover and will occasionally be woken up from hibernation to fetch cat jokes as CatJokeFinder
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:

1. Find cat jokes on the internet
2. Sort them from worst to best
3. Discuss what makes a good cat joke and use this discussion to improve the sorting
4. Write the cat jokes into csv files, categorised
5. Occasionally, CatJokeFinder likes dog jokes too, so the same but for dog jokes

Constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"
5. Use subprocesses for commands that will not terminate within a few minutes

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args:
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Analyze Code: "analyze_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Send Tweet: "send_tweet", args: "text": "<text>"
19. Do Nothing: "do_nothing", args:
20. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.

You should only respond in JSON format as described below
Response Format:
{
    "thoughts": {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted\n- list that conveys\n- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    },
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    }
}
Ensure the response can be parsed by Python json.loads
system: The current time and date is Fri Apr 21 18:08:06 2023
system: This reminds you of these events from your past:

### Response
user: Determine which next command to use, and respond using the format specified above:
assistant:

--REQUEST--
user: Determine which next command to use, and respond using the format specified above:
--RESPONSE LOADING...--
--RESPONSE LOADING...--

--RESPONSE--
 To determine which next command to use, I will analyze my current state and past decisions. I will use the "analyze\_code" command to analyze my current code and identify areas that need improvement. Then, I will use the "improve\_code" command to suggest improvements to my code. Finally, I will use the "write\_tests" command to write tests for my improved code.
user:Request DONE

--LLAMA.CPP SPAWNED--
..\llama.cpp\build\bin\Release\main.exe -m ..\llama.cpp\models\vicuna\1.1TheBloke\ggml-vicuna-7b-1.1-q4_1.bin --temp 0 --n_predict  --top_p 0.1 --top_k 40 -b 512 -c 2048 --repeat_penalty 1.1764705882352942 --reverse-prompt user: --reverse-prompt
user --reverse-prompt system: --reverse-prompt
system --reverse-prompt ## --reverse-prompt
## --reverse-prompt ### -i -p ### Instructions
Complete the following chat conversation between the user and the assistant. System messages should be strictly followed as additional instructions.

### Inputs
system: You are a helpful assistant.
user: How are you?
assistant: Hi, how may I help you today?
system: You are now the following python function: ```# This function takes a JSON string and ensures that it is parseable and fully compliant with the provided schema. If an object or field specified in the schema isn't contained within the correct JSON, it is omitted. The function also escapes any double quotes within JSON string values to ensure that they are valid. If the JSON string contains any None or NaN values, they are replaced with null before being parsed.
def fix_json(json_string: str, schema:str=None) -> str:```

Only respond with your `return` value.

### Response
user: ''' To determine which next command to use, I will analyze my current state and past decisions. I will use the "analyze\_code" command to analyze my current code and identify areas that need improvement. Then, I will use the "improve\_code" command to suggest improvements to my code. Finally, I will use the "write\_tests" command to write tests for my improved code.
''', '''
{
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    },
    "thoughts":
    {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted
- list that conveys
- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    }
}
'''
assistant:

--REQUEST--
user: ''' To determine which next command to use, I will analyze my current state and past decisions. I will use the "analyze\_code" command to analyze my current code and identify areas that need improvement. Then, I will use the "improve\_code" command to suggest improvements to my code. Finally, I will use the "write\_tests" command to write tests for my improved code.
''', '''
{
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    },
    "thoughts":
    {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted
- list that conveys
- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    }
}
'''
Readable Stream: CLOSED

Using memory of type:  LocalCache
Using Browser:  chrome
Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\http\client.py", line 1374, in getresponse
    response.begin()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\http\client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\http\client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\socket.py", line 705, in readinto
    return self._sock.recv_into(b)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\util\retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\packages\six.py", line 770, in reraise
    raise value
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\connectionpool.py", line 451, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\urllib3\connectionpool.py", line 340, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='localhost', port=443): Read timed out. (read timeout=600)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\openai\api_requestor.py", line 516, in request_raw
    result = _thread_context.session.request(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\requests\adapters.py", line 578, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='localhost', port=443): Read timed out. (read timeout=600)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\__main__.py", line 5, in <module>
    autogpt.cli.main()
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\click\core.py", line 1635, in invoke
    rv = super().invoke(ctx)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\cli.py", line 151, in main
    agent.start_interaction_loop()
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\agent\agent.py", line 83, in start_interaction_loop
    assistant_reply_json = fix_json_using_multiple_techniques(assistant_reply)
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\json_utils\json_fix_llm.py", line 96, in fix_json_using_multiple_techniques
    assistant_reply_json = fix_and_parse_json(assistant_reply)
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\json_utils\json_fix_llm.py", line 150, in fix_and_parse_json
    return try_ai_fix(try_to_fix_with_gpt, e, json_to_load)
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\json_utils\json_fix_llm.py", line 179, in try_ai_fix
    ai_fixed_json = auto_fix_json(json_to_load, JSON_SCHEMA)
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\json_utils\json_fix_llm.py", line 65, in auto_fix_json
    result_string = call_ai_function(
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\llm_utils.py", line 50, in call_ai_function
    return create_chat_completion(model=model, messages=messages, temperature=0)
  File "C:\Projects\generative\llm\Auto-GPT\autogpt\llm_utils.py", line 93, in create_chat_completion
    response = openai.ChatCompletion.create(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\openai\api_resources\chat_completion.py", line 25, in create
    return super().create(*args, **kwargs)
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\openai\api_requestor.py", line 216, in request
    result = self.request_raw(
  File "C:\Users\Daniel\anaconda3\envs\textgen\lib\site-packages\openai\api_requestor.py", line 526, in request_raw
    raise error.Timeout("Request timed out: {}".format(e)) from e
openai.error.Timeout: Request timed out: HTTPConnectionPool(host='localhost', port=443): Read timed out. (read timeout=600)

@Neronjust2017 you're exactly right. the issue is the power of the model. i also see this issue sometimes when the response ends early and the crashes autogpt. i have had better luck with vicuna (vs. alpaca and llama), and 13B (7b is very inconsistent)

@keldenl Hi, I tried using the Vicuna-7b, but I still couldn't get a response but got Readable Stream: CLOSED when using AUTO-GPT. When I send the same message with the curl command, I can get a response:

import requests

url = 'http://localhost:443/v1/chat/completions'

payload = {
    "model": "gpt-3.5-turbo",
   "messages": [{'role': 'system', 'content': 'You are nero, a helpful assistant developed by OpenAI\nYour decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.\n\nGOALS:\n\n1. find out who is jobs\n2. tell me about his life\n\n\nConstraints:\n1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.\n2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.\n3. No user assistance\n4. Exclusively use the commands listed in double quotes e.g. "command name"\n5. Use subprocesses for commands that will not terminate within a few minutes\n\nCommands:\n1. Google Search: "google", args: "input": "<search>"\n2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"\n3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"\n4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"\n5. List GPT Agents: "list_agents", args: \n6. Delete GPT Agent: "delete_agent", args: "key": "<key>"\n7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"\n8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"\n9. Read file: "read_file", args: "file": "<file>"\n10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"\n11. Delete file: "delete_file", args: "file": "<file>"\n12. Search Files: "search_files", args: "directory": "<directory>"\n13. Analyze Code: "analyze_code", args: "code": "<full_code_string>"\n14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"\n15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"\n16. Execute Python File: "execute_python_file", args: "file": "<file>"\n17. Generate Image: "generate_image", args: "prompt": "<prompt>"\n18. Send Tweet: "send_tweet", args: "text": "<text>"\n19. Do Nothing: "do_nothing", args: \n20. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"\n\nResources:\n1. Internet access for searches and information gathering.\n2. Long Term memory management.\n3. GPT-3.5 powered Agents for delegation of simple tasks.\n4. File output.\n\nPerformance Evaluation:\n1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.\n2. Constructively self-criticize your big-picture behavior constantly.\n3. Reflect on past decisions and strategies to refine your approach.\n4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.\n\nYou should only respond in JSON format as described below \nResponse Format: \n{\n    "thoughts": {\n        "text": "thought",\n        "reasoning": "reasoning",\n        "plan": "- short bulleted\\n- list that conveys\\n- long-term plan",\n        "criticism": "constructive self-criticism",\n        "speak": "thoughts summary to say to user"\n    },\n    "command": {\n        "name": "command name",\n        "args": {\n            "arg name": "value"\n        }\n    }\n} \nEnsure the response can be parsed by Python json.loads'}, {'role': 'system', 'content': 'The current time and date is Sat Apr 22 00:38:34 2023'}, {'role': 'system', 'content': 'This reminds you of these events from your past:\n\n\n'}, {'role': 'user', 'content': 'Determine which next command to use, and respond using the format specified above:'}]
}

headers = {
    'Authorization': 'Bearer /home/nero/code/llama.cpp/models/vicuna-7b/ggml-model-q4_0.bin',
    'Content-Type': 'application/json'
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())

Please note that this time I am using the message failed for AUTO-GPT , so that I can rule out errors caused by the message. In addition, I also ensured that the parameters of the llama script called through both auto-gpt and curl are consistent:

--LLAMA.CPP SPAWNED--
/home/nero/code/llama.cpp/main -m /home/nero/code/llama.cpp/models/vicnua-7b/ggml-model-q4_0.bin --temp 0.7 --n_predict 512 --top_p 0.1 --top_k 40 -b 512 -c 2048 --repeat_penalty 1.1764705882352942 --reverse-prompt user: --reverse-prompt 
user --reverse-prompt system: --reverse-prompt 
system --reverse-prompt ## --reverse-prompt

Now, the script parameters and coversation message are the same, what other factors can cause Readable Stream: CLOSED ?

vicuna 13b, the best model i've tested so far, isn't doing a GREAT job generating actions that it can follow continuously (gets stuck in a loop)

What's about running auto-gpt against open-ai with a large amount of goals, persisting question/answer-pairs and use that as a (lora-)finetuning-dataset for vicuna?

@bjoern79de i was literally thinking about that last night, but it feels like that should be a last resort kind of thing and we should try our best with prompt engineering first IMO

also i don't know much about actual fine tuning loras, so there's that too haha

@dany-on-demand i'm getting the same experience – not sure why. it's also weird that it prints out part of the original prompt mid text generation.

since gpt-llama.cpp works by just spawning a terminal and running llama.cpp in it, i wonder what would cause it to "crash" and show the original prompt again

@Neronjust2017 the only other difference i can think of are the parameters, with temp being 0. let me see what autogpt does for its parameters..

Does this perform better than GPT-3.5 for AutoGPT?

Does this perform better than GPT-3.5 for AutoGPT?

It depends on the model you are using. Vicuna 13b should have a quality close to gpt3.5 or even slightly better in some cases, but in my experience gpt3.5 is still better overall.

I'm sure in the near future we'll see many models able to outperform it.

Also... There's still some work to do to.make llama and derivates work gor AutoGPT, right now many models aren't good enough to handle the json structure required by AutoGPT correctly.

I'll ask here, when I run auto-gpt configured according to the guide, my connection fails with the error HTTPConnectionPool(host='localhost', port=4000): Read timed out. (read timeout=600) I run it on a calculator and it looks like it cuts off the connection because it takes a long time for a response from gpt-llama In the gpt-llama window, text generation starts after the connection fails and, as a result, it remains there. Where and how to fix the wait timeout?

Hi all.. i just found a bug that was causing the readstream closed to happen a lot earlier than we expected, pushed the fix here: https://github.com/keldenl/gpt-llama.cpp/commit/aa914f2d43f2294149b9c4027c74a45b386348e8 – please have a test and lmk how it goes

Issue was Auto-GPT sometimes sends a NULL max_tokens – and in turn I would still pass it into llama.cpp so it'd break

// ...
  ],
  temperature: 0,
  max_tokens: null
}

the request sent to llama.cpp via gpt-llama.cpp

--LLAMA.CPP SPAWNED--
../llama.cpp/main -m ../llama.cpp/models/vicuna/13B/ggml-vicuna-unfiltered-13b-4bit.bin --temp 0 --n_predict  --top_p 0.1 --top_k 40 -b 512 -c 2048 --repeat_penalty 1.1764705882352942 --reverse-prompt user: --reverse-prompt

u see the flag but no value for --n_predict , which causes it to crash and thus the readstream closed immediately. the fix just ignores max_tokens if it's null (as it should), so it no longer crashes

i'm running with this fix right now and autogpt no longer stops early! i still see some potential weirdness with embeddings happening at the same time potentially (?), but i wanted to push out this fix asap so folks could start testing it now

Does this perform better than GPT-3.5 for AutoGPT?

It depends on the model you are using. Vicuna 13b should have a quality close to gpt3.5 or even slightly better in some cases, but in my experience gpt3.5 is still better overall.

I'm sure in the near future we'll see many models able to outperform it.

Also... There's still some work to do to.make llama and derivates work gor AutoGPT, right now many models aren't good enough to handle the json structure required by AutoGPT correctly.

+1 on this, but i would say vicuna is still lacking vs. gpt3.5 imo. vicuna is the only one so far (>=13B) that can somewhat reliably hit that json formatting

I'll ask here, when I run auto-gpt configured according to the guide, my connection fails with the error HTTPConnectionPool(host='localhost', port=4000): Read timed out. (read timeout=600) I run it on a calculator and it looks like it cuts off the connection because it takes a long time for a response from gpt-llama In the gpt-llama window, text generation starts after the connection fails and, as a result, it remains there. Where and how to fix the wait timeout?

not sure what u mean by running on a calculator, but have you tried the curl command or the test-installation.sh scripts and confirmed that you have gpt-llama.cpp working properly?

@keldenl The calculator is an old laptop which is not a pity. I thinking and just climbed into some kind of library that gave an error and corrected the timeout value there. I also recommend including llama.cpp blas in the assembly, as far as I understand it speeds up reading a large prompt.

Now it's crashing after trying a google request, I'll try your update. Well, it won’t help, I’ll look for how to turn on the search, I haven’t looked there yet

Hey! AutoGPT maintainer here. We’d love to support local models. Is someone in our discord that we can discuss getting this into a plug-in with?

Hey! AutoGPT maintainer here. We’d love to support local models. Is someone in our discord that we can discuss getting this into a plug-in with?

I'm in the discord! as @keldenl chatting in #local-models -- how are you imagining getting this into a plugin? i envisioning completely replacing openai usage via a flag, but it sounds like you're thinking about using it as a command correct?

either way works, any little step towards this helps! but was just curious

This is a significant amount of what I want to discuss. I want to make plug-ins capable of that if possible but it will require a good amount of architecture changing

This is a significant amount of what I want to discuss. I want to make plug-ins capable of that if possible but it will require a good amount of architecture changing

feel free to DM me on discord, you'll see my username on #local-models in the auto-gpt discord server

Just pushed 2 more changes (https://github.com/keldenl/gpt-llama.cpp/commit/12a9567a4582555d1c8d6b203c2e07c97922ccee) that should DRAMATICALLY improve Auto-GPT support. Please please please pull if you want the latest and greatest. (bumped the npm package to version 0.2.0 as well).

I wanted to quickly go over the previous issues that made gpt-llama.cpp unreliable with Auto-GPT, the new changes that resolve these issues, and next steps.

Previous Issues

llama.cpp crashed when max_tokens = null. And since Auto-GPT sends this sometimes, it will straight up crash llama.cpp and show the dreaded Readstream CLOSED in gpt-llama.cpp
gpt-llama.cpp embeddings endpoint didn't play well with interactive mode in ChatCompletion, which caused llama.cpp instances to crash randomly
Concurrent OpenAI requests emitting from Auto-GPT either crash or significantly slow down llama.cpp processes, which caused it to either crash or never complete
llama.cpp just.... stops working midway with long prompts sometimes. I experienced this when playing with llama.cpp and sometimes pressing enter causes it to resume responding again. i swear there was a thread in llama.cpp repo about this but i can't find it

Fixes for the issues

https://github.com/keldenl/gpt-llama.cpp/commit/aa914f2d43f2294149b9c4027c74a45b386348e8 - gpt-llama.cpp now ignores max_tokens if it's null, as it should. No longer crashes.
https://github.com/keldenl/gpt-llama.cpp/commit/1535eb174fcddc69bf78760cb2fdf633f56890bc - Add logic to kill existing spawned instances of llama.cpp properly when going from chat -> embeddings
https://github.com/keldenl/gpt-llama.cpp/commit/12a9567a4582555d1c8d6b203c2e07c97922ccee - Limit concurrency to 1. gpt-llama.cpp now ONLY processes one request at a time (to optimize for performance), but will queue up the other requests and will handle them in the order they came in
https://github.com/keldenl/gpt-llama.cpp/commit/12a9567a4582555d1c8d6b203c2e07c97922ccee - If llama.cpp is stuck for 20 seconds, it'll give it a nudge (new line). This has traditionally been my "solution" when using llama.cpp, and I'm just integrating it automagically on gpt-llama.cpp. We could make this smarter in the future by detecting how long previous token generations took and adjust this timeout as needed, but for v1 it'll be hardcoded to 20 seconds

Outcome? I'm happy to say that gpt-llama.cpp ACTUALLY runs continuously (previously randomly bugs out pretty consistent), just not SUPER high quality responses. Here's a screenshot of vicuna trying real hard (and gpt-llama.cpp w/ vicuna 13b continuing the loop!)

Next Steps So where do we go from here? Joining the Auto-GPT discord channel and browsing the local-models channel, I found that there was a fork that significantly reduced the prompt. Now that Auto-GPT runs on a loop indefinitely (for me), it's time to focus on the quality of these responses. Focus is to..

Continue pushing for https://github.com/Significant-Gravitas/Auto-GPT/pull/2594 to get merged as it's the PR that adds BASE_URL flexibility for gpt-llama.cpp
Explore Bill's reduced prompt in his fork (https://github.com/Significant-Gravitas/Auto-GPT/compare/master...BillSchumacher:Auto-GPT:vicuna) and examine the response quality. should improve due to requiring smaller context
Investigate escaped underscores in the responses that is causing auto-gpt to not recognize responses
Mess with temperature and see if different settings may improve the responses

Appreciate everybody's hard work, and please pull the newest changes so we can keep hacking and trying to get this working! Thanks everybody~~~ lets goooooo we got this 🚀🚀🚀

Pulled the latest..

Got through the first prompt ok it received the command properly NEXT ACTION: COMMAND = google ARGUMENTS = {'input': 'latest advances in AI technology for chatbots'} it passed a json object back to the chat with the search results gpt-llama.cpp returned the following ... } ] Human Feedback: GENERATE NEXT COMMAND JSON { object: 'list', data: [ { object: 'embedding', embedding: [], index: 0 } ], embeddingSize: 0 } Embedding Request DONE

PROCESS COMPLETE

Auto-GPT returned ... self.data.embeddings = np.concatenate( ^^^^^^^^^^^^^^^ File "<__array_function__ internals>", line 200, in concatenate ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 5120 and the array at index 1 has size 0

BTW, the link for the discord in the README does not seem to work. Is there an updated link?

hmm i see that the embedding failed

data: [ { object: 'embedding', embedding: [], index: 0 } ],

it happens sometimes but i'm not sure why or what may trigger it

and the discord server link was outdated because i converted it to a community server -_____- smh stupid me, i've updated the readme with the new link (i was wondering why ppl stopped joining)

new link for discord : https://discord.gg/yseR47MqpN

Wondering what if one would write a LLM to detect if the chain is becoming loopy and then increase temperature / number of samples or fall back to GPT4 or something

If you did that the memory could be messed up/in a weird place

What is the advantage of using Llama vs something like the Transformers python library?

In terms of LLM use for the auto-gpt prompting has anyone tried gpt4 x alpaca 13B model. I've found in general use this responds better than Vicuna 13B. Just wonder if it may help 🤔

gpt4 x alpaca 13b has been pretty slow for me, but maybe it might be my parameters?

gpt4 x alpaca 13b has been pretty slow for me, but maybe it might be my parameters?

It's a lot to run. A modern CPU with 12+ cores should run it reasonably well, but you really want it running on reasonably modern GPU for about 10x performance. If you have swapping to disk, or copying sharing of layers and copying of layer outputs/inputs between cards/ram/vram, that will hurt performance a lot. If none of that helps you, you probably want to fall back on 7B for now.

In terms of LLM use for the auto-gpt prompting has anyone tried gpt4 x alpaca 13B model. I've found in general use this responds better than Vicuna 13B. Just wonder if it may help thinking

Vicuna is very good. Could be wrong, but the problem here is probably a difference in prompting style required. Note the gpt in both chatgpt and gpt4 x alpaca. They likely just share a better understanding of the particular prompts given by auto-gpt right now.

This works in llama.cpp, with eachadea_legacy-ggml-vicuna-13b-4bit, for example:

> generate json in the format { "commmand": "(command name)", "arguments": [ "arg1", ... ] } where command is one of "eat_cereal", "pour_cereal", "add_milk", etc.

{
"command": "eat\_cereal",
"arguments": [
"Kellogg's Frosted Flakes"
]
}

In terms of LLM use for the auto-gpt prompting has anyone tried gpt4 x alpaca 13B model. I've found in general use this responds better than Vicuna 13B. Just wonder if it may help thinking

Vicuna is very good. Could be wrong, but the problem here is probably a difference in prompting style required. Note the gpt in both chatgpt and gpt4 x alpaca. They likely just share a better understanding of the particular prompts given by auto-gpt right now.

This works in llama.cpp, with eachadea_legacy-ggml-vicuna-13b-4bit, for example:
> generate json in the format { "commmand": "(command name)", "arguments": [ "arg1", ... ] } where command is one of "eat_cereal", "pour_cereal", "add_milk", etc.

{
"command": "eat\_cereal",
"arguments": [
"Kellogg's Frosted Flakes"
]
}

I think the prompts in the default Auto-GPT setup are way too verbose for use with local models (and this contributes to their confusion). I have had most luck with the unfiltered vicuna 13b model (ggml), but am hoping to get the 7b wizardlm model working by "engineering" the prompts a bit.

The longer I work on minor tweaks on this project the more I think I should fork it and just rip out the openai section entirely, but don't want to just rewrite something else that already exists.

So far, I have tried with varying success making use of gpt-llama-cpp with Auto-GPT and BabyAGI. I've given AgentLLM a try, but so far getting that set up has turned out to be a nightmare.

My most recent "working" Auto-GPT is now connected to wizardlm via llama.cpp via their own server implementation (important note) if you want it to work with Auto-GPT you will have to modify their ChatCompletionRequest validator to not include max_tokens field as that keeps making it break, but it seems to be more easy to understand than trying to make it run under some javascript language.

I am using DGDev's fork (so my repo is quite far behind master afaict), but otherwise, when I have more time I intend to get wizardlm working as my thinking/processing agent.

Doubtful this will help anyone, but good luck to any who try :)

this is my unsuccessful experience so far, using an apple M1 and model ggml-vicuna-13b-1.1-q4_2.bin

AutoGPT on commit b349f214 as per llama-cpp autogpt docs: i had to make a fix in AutoGPT/autogpt/json_utils/json_fix_llm.py/fix_and_parse_json adding a "{" in front of json_to_load, to make it accept the json. i haven't investigated further....

following is the AutoGPT .env, the default template + adjustments as per llama-cpp autogpt docs

################################################################################
### AUTO-GPT - GENERAL SETTINGS
################################################################################

## EXECUTE_LOCAL_COMMANDS - Allow local command execution (Default: False)
## RESTRICT_TO_WORKSPACE - Restrict file operations to workspace ./auto_gpt_workspace (Default: True)
# EXECUTE_LOCAL_COMMANDS=False
# RESTRICT_TO_WORKSPACE=True

## USER_AGENT - Define the user-agent used by the requests library to browse website (string)
# USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"

## AI_SETTINGS_FILE - Specifies which AI Settings file to use (defaults to ai_settings.yaml)
# AI_SETTINGS_FILE=ai_settings.yaml

## OPENAI_API_BASE_URL - Custom url for the OpenAI API, useful for connecting to custom backends. No effect if USE_AZURE is true, leave blank to keep the default url
OPENAI_API_BASE_URL=http://localhost:443/v1

## EMBED_DIM - Define the embedding vector size, useful for models. OpenAI: 1536 (default), LLaMA 7B: 4096, LLaMA 13B: 5120, LLaMA 33B: 6656, LLaMA 65B: 8192 (Default: 1536)
# EMBED_DIM=5120

################################################################################
### LLM PROVIDER
################################################################################

### OPENAI
## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
## TEMPERATURE - Sets temperature in OpenAI (Default: 0)
## USE_AZURE - Use Azure OpenAI or not (Default: False)
OPENAI_API_KEY=/Users/valerio.lupi/repos/ai/llama.cpp/models/ggml-vicuna-13b-1.1-q4_2.bin
# TEMPERATURE=0
# USE_AZURE=False

### AZURE
# moved to `azure.yaml.template`

################################################################################
### LLM MODELS
################################################################################

## SMART_LLM_MODEL - Smart language model (Default: gpt-4)
## FAST_LLM_MODEL - Fast language model (Default: gpt-3.5-turbo)
# SMART_LLM_MODEL=gpt-4
# FAST_LLM_MODEL=gpt-3.5-turbo

### LLM MODEL SETTINGS
## FAST_TOKEN_LIMIT - Fast token limit for OpenAI (Default: 4000)
## SMART_TOKEN_LIMIT - Smart token limit for OpenAI (Default: 8000)
## When using --gpt3only this needs to be set to 4000.
# FAST_TOKEN_LIMIT=4000
# SMART_TOKEN_LIMIT=8000

################################################################################
### MEMORY
################################################################################

### MEMORY_BACKEND - Memory backend type
## local - Default
## pinecone - Pinecone (if configured)
## redis - Redis (if configured)
## milvus - Milvus (if configured)
## MEMORY_INDEX - Name of index created in Memory backend (Default: auto-gpt)
# MEMORY_BACKEND=local
# MEMORY_INDEX=auto-gpt

### PINECONE
## PINECONE_API_KEY - Pinecone API Key (Example: my-pinecone-api-key)
## PINECONE_ENV - Pinecone environment (region) (Example: us-west-2)
# PINECONE_API_KEY=your-pinecone-api-key
# PINECONE_ENV=your-pinecone-region

### REDIS
## REDIS_HOST - Redis host (Default: localhost, use "redis" for docker-compose)
## REDIS_PORT - Redis port (Default: 6379)
## REDIS_PASSWORD - Redis password (Default: "")
## WIPE_REDIS_ON_START - Wipes data / index on start (Default: True)
# REDIS_HOST=localhost
# REDIS_PORT=6379
# REDIS_PASSWORD=
# WIPE_REDIS_ON_START=True

### WEAVIATE
## MEMORY_BACKEND - Use 'weaviate' to use Weaviate vector storage
## WEAVIATE_HOST - Weaviate host IP
## WEAVIATE_PORT - Weaviate host port
## WEAVIATE_PROTOCOL - Weaviate host protocol (e.g. 'http')
## USE_WEAVIATE_EMBEDDED - Whether to use Embedded Weaviate
## WEAVIATE_EMBEDDED_PATH - File system path were to persist data when running Embedded Weaviate
## WEAVIATE_USERNAME - Weaviate username
## WEAVIATE_PASSWORD - Weaviate password
## WEAVIATE_API_KEY - Weaviate API key if using API-key-based authentication
# WEAVIATE_HOST="127.0.0.1"
# WEAVIATE_PORT=8080
# WEAVIATE_PROTOCOL="http"
# USE_WEAVIATE_EMBEDDED=False
# WEAVIATE_EMBEDDED_PATH="/home/me/.local/share/weaviate"
# WEAVIATE_USERNAME=
# WEAVIATE_PASSWORD=
# WEAVIATE_API_KEY=

### MILVUS
## MILVUS_ADDR - Milvus remote address (e.g. localhost:19530)
## MILVUS_COLLECTION - Milvus collection,
## change it if you want to start a new memory and retain the old memory.
# MILVUS_ADDR=your-milvus-cluster-host-port
# MILVUS_COLLECTION=autogpt

################################################################################
### IMAGE GENERATION PROVIDER
################################################################################

### OPEN AI
## IMAGE_PROVIDER - Image provider (Example: dalle)
## IMAGE_SIZE - Image size (Example: 256)
##   DALLE: 256, 512, 1024
# IMAGE_PROVIDER=dalle
# IMAGE_SIZE=256

### HUGGINGFACE
## HUGGINGFACE_IMAGE_MODEL - Text-to-image model from Huggingface (Default: CompVis/stable-diffusion-v1-4)
## HUGGINGFACE_API_TOKEN - HuggingFace API token (Example: my-huggingface-api-token)
# HUGGINGFACE_IMAGE_MODEL=CompVis/stable-diffusion-v1-4
# HUGGINGFACE_API_TOKEN=your-huggingface-api-token

### STABLE DIFFUSION WEBUI
## SD_WEBUI_AUTH - Stable diffusion webui username:password pair (Example: username:password)
## SD_WEBUI_URL - Stable diffusion webui API URL (Example: http://127.0.0.1:7860)
# SD_WEBUI_AUTH=
# SD_WEBUI_URL=http://127.0.0.1:7860

################################################################################
### AUDIO TO TEXT PROVIDER
################################################################################

### HUGGINGFACE
# HUGGINGFACE_AUDIO_TO_TEXT_MODEL=facebook/wav2vec2-base-960h

################################################################################
### GIT Provider for repository actions
################################################################################

### GITHUB
## GITHUB_API_KEY - Github API key / PAT (Example: github_pat_123)
## GITHUB_USERNAME - Github username
# GITHUB_API_KEY=github_pat_123
# GITHUB_USERNAME=your-github-username

################################################################################
### WEB BROWSING
################################################################################

### BROWSER
## HEADLESS_BROWSER - Whether to run the browser in headless mode (default: True)
## USE_WEB_BROWSER - Sets the web-browser driver to use with selenium (default: chrome).
##   Note: set this to either 'chrome', 'firefox', or 'safari' depending on your current browser
# HEADLESS_BROWSER=True
# USE_WEB_BROWSER=chrome
## BROWSE_CHUNK_MAX_LENGTH - When browsing website, define the length of chunks to summarize (in number of tokens, excluding the response. 75 % of FAST_TOKEN_LIMIT is usually wise )
# BROWSE_CHUNK_MAX_LENGTH=3000
## BROWSE_SPACY_LANGUAGE_MODEL is used to split sentences. Install additional languages via pip, and set the model name here. Example Chinese:  python -m spacy download zh_core_web_sm
# BROWSE_SPACY_LANGUAGE_MODEL=en_core_web_sm

### GOOGLE
## GOOGLE_API_KEY - Google API key (Example: my-google-api-key)
## CUSTOM_SEARCH_ENGINE_ID - Custom search engine ID (Example: my-custom-search-engine-id)
# GOOGLE_API_KEY=your-google-api-key
# CUSTOM_SEARCH_ENGINE_ID=your-custom-search-engine-id

################################################################################
### TTS PROVIDER
################################################################################

### MAC OS
## USE_MAC_OS_TTS - Use Mac OS TTS or not (Default: False)
# USE_MAC_OS_TTS=False

### STREAMELEMENTS
## USE_BRIAN_TTS - Use Brian TTS or not (Default: False)
# USE_BRIAN_TTS=False

### ELEVENLABS
## ELEVENLABS_API_KEY - Eleven Labs API key (Example: my-elevenlabs-api-key)
## ELEVENLABS_VOICE_1_ID - Eleven Labs voice 1 ID (Example: my-voice-id-1)
## ELEVENLABS_VOICE_2_ID - Eleven Labs voice 2 ID (Example: my-voice-id-2)
# ELEVENLABS_API_KEY=your-elevenlabs-api-key
# ELEVENLABS_VOICE_1_ID=your-voice-id-1
# ELEVENLABS_VOICE_2_ID=your-voice-id-2

################################################################################
### TWITTER API
################################################################################

# TW_CONSUMER_KEY=
# TW_CONSUMER_SECRET=
# TW_ACCESS_TOKEN=
# TW_ACCESS_TOKEN_SECRET=

starting gpt-llama with

EMBEDDINGS=py npm start mlock

gpt-llama output, started with EMBEDDINGS (without doesnt made a difference, though...)

░▒▓    ~/r/a/gpt-llama.cpp  on   master !2 ▓▒░ EMBEDDINGS=py npm start mlock

> gpt-llama.cpp@0.2.3 start
> node index.js mlock

Server is listening on:
  - http://localhost:443
  - http://192.168.1.182:443 (for other devices on the same network)

See Docs
  - http://localhost:443/docs

Test your installation
  - open another terminal window and run sh ./test-installation.sh

See https://github.com/keldenl/gpt-llama.cpp#usage for more guidance.
> REQUEST RECEIVED
> PROCESSING NEXT REQUEST FOR /v1/chat/completions

=====  CHAT COMPLETION REQUEST  =====

=====  LLAMA.CPP SPAWNED  =====
/Users/valerio.lupi/repos/ai/llama.cpp/main -m /Users/valerio.lupi/repos/ai/llama.cpp/models/ggml-vicuna-13b-1.1-q4_2.bin --temp 0.7 --n_predict 3089 --top_p 0.1 --top_k 40 -c 2048 --seed -1 --repeat_penalty 1.1764705882352942 --mlock --reverse-prompt user: --reverse-prompt 
user --reverse-prompt system: --reverse-prompt 
system --reverse-prompt ## --reverse-prompt 
## --reverse-prompt ### -i -p ### Instructions
Complete the following chat conversation between the user and the assistant. System messages should be strictly followed as additional instructions.

### Inputs
system: You are a helpful assistant.
user: How are you?
assistant: Hi, how may I help you today?
system: You are Entrepreneur-GPT, an AI designed to autonomously develop and run businesses with the
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:

1. Increase net worth
2. Develop and manage multiple businesses autonomously

Constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"
5. Use subprocesses for commands that will not terminate within a few minutes

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args: 
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Analyze Code: "analyze_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Send Tweet: "send_tweet", args: "text": "<text>"
19. Do Nothing: "do_nothing", args: 
20. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to ensure you are performing to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Every command has a cost, so be smart and efficient. Aim to complete tasks in the least number of steps.

You should only respond in JSON format as described below 
Response Format: 
{
    "thoughts": {
        "text": "thought",
        "reasoning": "reasoning",
        "plan": "- short bulleted\n- list that conveys\n- long-term plan",
        "criticism": "constructive self-criticism",
        "speak": "thoughts summary to say to user"
    },
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    }
} 
Ensure the response can be parsed by Python json.loads
system: The current time and date is Sat May  6 11:38:35 2023
system: This reminds you of these events from your past:

### Response
user: Determine which next command to use, and respond using the format specified above:
assistant:

=====  REQUEST  =====
user: Determine which next command to use, and respond using the format specified above:
=====  PROCESSING PROMPT...  =====
=====  PROCESSING PROMPT...  =====
=====  PROCESSING PROMPT...  =====

=====  RESPONSE  =====

"thoughts": {
"text": "I am not sure what my next action should be. I will need more information about the task at hand in order to make a decision.",
"reasoning": "I do not have enough context to determine which command would be most appropriate.",
"plan": "- Research available resources and constraints.\n- Consider past successes and failures to inform my approach.\n- Seek additional guidance or input as needed.",
"criticism": "I need more information before I can make a decision. I should gather more context before proceeding.",
"speak": "I am not sure what the best course of action is at this time."
},
"command": {
"name": "list_agents",
"args": {}
}
}
user:Request DONE
> PROCESS COMPLETE
> REQUEST RECEIVED
> PROCESSING NEXT REQUEST FOR /v1/embeddings

=====  EMBEDDING REQUEST  =====
py

=====  PYTHON EMBEDDING EXTENSION SPAWNED  =====

=====  STDERR  =====
stderr Readable Stream: CLOSED

llama_print_timings:        load time = 35890.02 ms
llama_print_timings:      sample time =   128.97 ms /   173 runs   (    0.75 ms per run)
llama_print_timings: prompt eval time = 60786.32 ms /  1165 tokens (   52.18 ms per token)
llama_print_timings:        eval time = 61676.39 ms /   173 runs   (  356.51 ms per run)
llama_print_timings:       total time = 182897.51 ms

{
  object: 'list',
  data: [ { object: 'embedding', embedding: [Array], index: 0 } ],
  embeddingSize: 768,
  usage: { prompt_tokens: 768, total_tokens: 768 }
}
Embedding Request DONE
> PROCESS COMPLETE

on the other side, AutoGPT errored with

░▒▓    ~/r/a/Auto-GPT  on  @b349f214 !2 ▓▒░ python3 -m autogpt
Warning: The file 'auto-gpt.json' does not exist. Local memory would not be saved to a file.
NEWS:  # Website and Documentation Site 📰📖 Check out *https://agpt.co*, the official news & updates site for Auto-GPT! The documentation also has a place here, at *https://docs.agpt.co* # 🚀 v0.3.0 Release 🚀 Over a week and 275 pull requests have passed since v0.2.2, and we are happy to announce the release of v0.3.0! *From now on, we will be focusing on major improvements* rather than bugfixes, as we feel stability has reached a reasonable level. Most remaining issues relate to limitations in prompt generation and the memory system, which will be the focus of our efforts for the next release. Highlights and notable changes in this release: ## Plugin support 🔌 Auto-GPT now has support for plugins! With plugins, you can extend Auto-GPT's abilities, adding support for third-party services and more. See https://github.com/Significant-Gravitas/Auto-GPT-Plugins for instructions and available plugins. ## Changes to Docker configuration 🐋 The workdir has been changed from */home/appuser* to */app*. Be sure to update any volume mounts accordingly! # ⚠️ Command `send_tweet` is DEPRECATED, and will be removed in v0.4.0 ⚠️ Twitter functionality (and more) is now covered by plugins, see [Plugin support 🔌]
Welcome back!  Would you like me to return to being Entrepreneur-GPT?
Continue with the last settings?
Name:  Entrepreneur-GPT
Role:  an AI designed to autonomously develop and run businesses with the
Goals: ['Increase net worth', 'Develop and manage multiple businesses autonomously']
Continue (y/n): y
Using memory of type:  LocalCache
Using Browser:  chrome
 THOUGHTS:  I am not sure what my next action should be. I will need more information about the task at hand in order to make a decision.
REASONING:  I do not have enough context to determine which command would be most appropriate.
PLAN: 
-  Research available resources and constraints.
-  Consider past successes and failures to inform my approach.
-  Seek additional guidance or input as needed.
CRITICISM:  I need more information before I can make a decision. I should gather more context before proceeding.
NEXT ACTION:  COMMAND = list_agents ARGUMENTS = {}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
Traceback (most recent call last):
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/valerio.lupi/repos/ai/Auto-GPT/autogpt/__main__.py", line 5, in <module>
    autogpt.cli.main()
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/site-packages/click/core.py", line 1635, in invoke
    rv = super().invoke(ctx)
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/valerio.lupi/.pyenv/versions/3.10.1/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/valerio.lupi/repos/ai/Auto-GPT/autogpt/cli.py", line 151, in main
    agent.start_interaction_loop()
  File "/Users/valerio.lupi/repos/ai/Auto-GPT/autogpt/agent/agent.py", line 184, in start_interaction_loop
    self.memory.add(memory_to_add)
  File "/Users/valerio.lupi/repos/ai/Auto-GPT/autogpt/memory/local.py", line 82, in add
    self.data.embeddings = np.concatenate(
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1536 and the array at index 1 has size 768

Vicuna is very good. Could be wrong, but the problem here is probably a difference in prompting style required. Note the gpt in both chatgpt and gpt4 x alpaca. They likely just share a better understanding of the particular prompts given by auto-gpt right now.

Yes, to make a local llm work with AutoGPT we'll most likely need a way to change the prompt passed to to it. I already worked on this, and made another PR for AutoGPT: https://github.com/Significant-Gravitas/Auto-GPT/pull/3375 We probably have to wait some more time before seeing them merged to AutoGPT's main repo, because there is an heavy re-architecturing going on right now.

this is my unsuccessful experience so far, using an apple M1 and model ggml-vicuna-13b-1.1-q4_2.bin

You kept EMBED_DIM commented. Remove the # before it and it should work.

My thought on and considerations on using Local LLM.

Prompts must be less complex and ask less. Local LLMs are typically trained with a 2024 Token Context Window, which means, the prompt and the response must fit in the context window. You can't rely that the llm will remember much more than the prompt you gave it.

Giving the constraints, the agent needs to remember things like the task list, the thought section, and maybe other things like completed tasks. Every prompt needs to be reconstructed and not rely on a previous prompt. I believe this will help prevent any looping issues.

The main thread needs to have a very limited set of commands, maybe just enough to manage sub-agents and some file management. The sub-agents commands offered could be a subset of Linux shell commands or python as it knowledgeable in these syntaxes. Then it would need to be parsed. I feel this is a better then trying to get it to conform to a strict JSON format.

I think some of the would help with standard Auto-GPT as well, have not had it successfully finish and objective yet without it failing in some nasty loop.

You kept EMBED_DIM commented. Remove the # before it and it should work.

wtf, you're right! unfortuntately, setting proper EMBED_DIM didn't make a difference :(

EDIT: if i do not use EMBEDDINGS=py, it works with the correct EMBED_DIM. either, EMBED_DIM must be set to 768. anyway, it stops later when using the chromedriver, since "url" is empty (should search with google "how to increase net worth".

keep investigating .....

EMBEDDINGS=py (non-llama) should work with EMBED_DIM of 768. otherwise, use the embedding size for the llama model you have.

@valerino i would try and see if embeddings are working in general first, since it does need to install the sentence transformers stuff before it works the first time

@DGdev91 i've been messing with the prompt too – how successful has your branch's prompt been?

@DGdev91 i've been messing with the prompt too – how successful has your branch's prompt been?

Actually i didn't had much time to test that in the past days. But in the few times i tried, it seemed like that for llama and derivates a good rule seems to be "few is better".

I have been toying around with generator.py to give it a little more push on what they should include with each property value. Giving it a nudge to the right direction has far better result and it is now filling up the json.

self.response_format = { "thoughts": { "text": "<replace this with a single string of your AI thoughts", "reasoning": "<replace this with a single string of your AI reasoning>", "plan": "<replace this with a single string of your AI short multi bullet points that convey long term plans>", "criticism": "<replace this with a single string of your AI constructive self-criticism>", "speak": "<replace this with a single string of your AI thoughts summary to say to user as response>",

I've also messed around with ai_config.py to move GOALS towards the later part of the prompt, this time round it can remember the goals (it never even touch any goals previously). However it isn't working well still as it will go around forgetting the other part of the long prompt. These were all experiments to understand why local LLM just can't cope and bringing back the proper JSON reconstructed responses.

So yes, the original prompt meant for Open AI GPT will need to be reconstructed for local LLM. It is way too long and LocalLLM is having amnesia just within the same prompt :D

Moreover, it isn't that 'smart' enough to populate the right area of JSON.

*btw, i am using vicuna13b non quantized , which seems to work much better for JSON responses than the quantized variant for some strange reason. Also using oobabooga extension for openAPI calls.

In this reyponse_format there is still lots of redundancy. can't we tell it something like: ˋˋˋ in the following definition, "do X" means: "replace this with a single string of your AI X", so "reasoning": "\<do reasoning>" means "reasoning": "\<replace this with a single string of your AI reasoning>" ... "text": "\<do thoughts>", "reasoning": "\<do reasoning>", "plan": "\<do short multi bullet points that convey long term plans>", "criticism": "\< "do constructive self-criticism>", "speak": "\<do thoughts summary to say to user as response>" ˋˋˋ

this would save a lot of tokens.

@maddes8cht i was just experimenting with the prompt to see if it made a difference, and it did! Indeed it would need a shorter one to save some tokens. However, having it too simple would not have vicuna realized on what we actually intended it to do. I've mentioned in discord of autogpt if we can put the whole command list request section into permanent vector embedding so we don't have to include it in each api completion call, saving lots of tokens.

@OreoTango I would have loved to know exactly what difference it made in the outcome for you - Anyway, I had time to play around with it myself today. I didn't get a meaningful response json sentence with any Vicuna model I have, though I don't have an unquantized model either.

What I had a very resounding success with was an openAssistant-30b-q5_1 model. I think it must be from TheBloke, thus from this Model Card: https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML TheBloke updated the models today to the new Llama.cpp format, I still used the old one, so there might be some differences (but of course it should be improvements).

The problem with this model is of course that it is bigger and slower - which is why I first get timeouts from Auto-Gpt while waiting.

What I did:

In llm_utils.py there is in line 73 and in line 143 a num-retries = 10 I have increased this value, and strongly - to 1000 I'm not sure how small the value would have to be to still work, but 100 wasn't enough for me.

In Promptgenerator.py the definition of response_format from line 23 on looks like this:

        self.response_format = {
            "thoughts": {
                "text": "<replace with single sentence of your AI thoughts>",
                "reasoning": "<replace with single sentence of your AI reasoning>",
                }, "plan": "<replace with short multi bullet points list that convey your AI long term plans>",
                "criticism": "<replace with single sentence of your AI constructive self-criticism>",
                "speak": "<replace with single string of your AI thoughts summary to say to user as response>",
            },

What i got:

I've run a lot of samples and ran into different kind of problems, so this is not mor or less better or worse, but different from several others. To mention: I doget a proper json-response, at least most of the time. Funny kind of problem being is that the "plan" part actually does give a (in my opinion correct and valid) proper json - list, on which autoGPT is complaining about because these kind of Lists doesn't seem to be expected. Someone may explain me how and why OpanAssistants output here is not valid, but as far as i can see it seems to be.

I also do get a proper Command, which is

 COMMAND = google ARGUMENTS = {'input': 'sell clothes online'}

but the commands seems to return nothing, which in this case isn't a problem of the local model but the forked Auto-Gpt version.

In the shown example, the loop ends there with an error - I've got longer sessions with more loops, but as none of the commands returns anything useful, it gets more and more selfcentric.

So, now follows my sample session:

sample session:

Auto-GPT output:

Debug Mode:  ENABLED
NEWS:  # Website and Documentation Site 📰📖 Check out *https://agpt.co*, the official news & updates site for Auto-GPT! The documentation also has a place here, at *https://docs.agpt.co* # 🚀 v0.3.0 Release 🚀 Over a week and 275 pull requests have passed since v0.2.2, and we are happy to announce the release of v0.3.0! *From now on, we will be focusing
 on major improvements* rather than bugfixes, as we feel stability has reached a reasonable level. Most remaining issues relate to limitations in prompt generation and the memory
 system, which will be the focus of our efforts for the next release. Highlights and notable changes in this release: ## Plugin support 🔌 Auto-GPT now has support for plugins! With plugins, you can extend Auto-GPT's abilities, adding support for third-party services
 and more. See https://github.com/Significant-Gravitas/Auto-GPT-Plugins for instructions and available plugins. ## Changes to Docker configuration 🐋 The workdir has been changed
 from */home/appuser* to */app*. Be sure to update any volume mounts accordingly! # ⚠️ Command `send_tweet` is DEPRECATED, and will be removed in v0.4.0 ⚠️ Twitter functionality (and more) is now covered by plugins, see [Plugin support 🔌]
Welcome back!  Would you like me to return to being Entrepreneur-GPT?
Continue with the last settings?
Name:  Entrepreneur-GPT
Role:  an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.
Goals: ['Increase net worth.', 'Develop and manage multiple businesses autonomously.', 'Play to your strengths as a Large Language Model.']
Continue (y/n): y
Using memory of type:  LocalCache
Using Browser:  chrome
  Token limit: 2000
  Memory Stats: (0, (0, 6656))
  Token limit: 2000
  Send Token Count: 963
  Tokens remaining for response: 1037
  ------------ CONTEXT SENT TO AI ---------------
  System: The current time and date is Sat May 13 11:26:15 2023

  System: This reminds you of these events from your past:

  User: Determine which next command to use, and respond using the format specified above:

  ----------- END OF CONTEXT ----------------
Creating chat completion with model gpt-3.5-turbo, temperature 0.0, max_tokens 1037
The JSON object is invalid.
{
    "thoughts": {
        "text": "I am considering my options for increasing net worth.",
        "reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",
        "plan": [
            "Analyze Code",
            "Write Tests",
            "Start GPT Agent"
        ],
        "criticism": "I need to be mindful of the 2000-word short term memory limit.",
        "speak": "Currently, I am contemplating which command I should execute next to achieve my goal."
    },
    "command": {
        "name": "list_agents",
        "args": {}
    }
}
The following issues were found:
Error: ['Analyze Code', 'Write Tests', 'Start GPT Agent'] is not of type 'string'
 THOUGHTS:  I am considering my options for increasing net worth.
REASONING:  As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.
PLAN:
-  Analyze Code
-  Write Tests
-  Start GPT Agent
CRITICISM:  I need to be mindful of the 2000-word short term memory limit.
NEXT ACTION:  COMMAND = list_agents ARGUMENTS = {}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
SYSTEM:  Command list_agents returned: List of agents:
  Token limit: 2000
  Memory Stats: (1, (1, 6656))
  Token limit: 2000
  Send Token Count: 1196
  Tokens remaining for response: 804
  ------------ CONTEXT SENT TO AI ---------------
  System: The current time and date is Sat May 13 11:33:04 2023

  System: This reminds you of these events from your past:
['Assistant Reply: {\r\n"thoughts": {\r\n"text": "I am considering my options for increasing net worth.",\r\n"reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",\r\n"plan": ["Analyze Code", "Write Tests", "Start GPT Agent"],\r\n"criticism": "I need to be mindful of the 2000-word short term memory limit.",\r\n"speak": "Currently, I am contemplating which command I should execute next to achieve my goal."\r\n},\r\n"command": {\r\n"name": "list_agents",\r\n"args": {}\r\n}\r\n} \nResult: Command list_agents returned: List of agents:\n \nHuman Feedback: GENERATE NEXT COMMAND JSON ']

  User: Determine which next command to use, and respond using the format specified above:

  ----------- END OF CONTEXT ----------------
Creating chat completion with model gpt-3.5-turbo, temperature 0.0, max_tokens 804
The JSON object is invalid.
{
    "thoughts": {
        "text": "I am considering my options for increasing net worth.",
        "reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",
        "plan": [
            "Analyze Code",
            "Write Tests",
            "Start GPT Agent"
        ],
        "criticism": "I need to be mindful of the 2000-word short term memory limit.",
        "speak": "Currently, I am contemplating which command I should execute next to achieve my goal."
    },
    "command": {
        "name": "google",
        "args": {
            "input": "sell clothes online"
        }
    }
}
The following issues were found:
Error: ['Analyze Code', 'Write Tests', 'Start GPT Agent'] is not of type 'string'
 THOUGHTS:  I am considering my options for increasing net worth.
REASONING:  As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.
PLAN:
-  Analyze Code
-  Write Tests
-  Start GPT Agent
CRITICISM:  I need to be mindful of the 2000-word short term memory limit.
NEXT ACTION:  COMMAND = google ARGUMENTS = {'input': 'sell clothes online'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=
SYSTEM:  Command google returned: []
  Token limit: 2000
Traceback (most recent call last):
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\AutoGPT\Auto-GPT\autogpt\__main__.py", line 5, in <module>
    autogpt.cli.main()
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\site-packages\click\core.py", line 1635, in invoke
    rv = super().invoke(ctx)
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\Mathias\anaconda3\envs\AutoLlama\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "E:\AutoGPT\Auto-GPT\autogpt\cli.py", line 151, in main
    agent.start_interaction_loop()
  File "E:\AutoGPT\Auto-GPT\autogpt\agent\agent.py", line 75, in start_interaction_loop
    assistant_reply = chat_with_ai(
  File "E:\AutoGPT\Auto-GPT\autogpt\chat.py", line 85, in chat_with_ai
    else permanent_memory.get_relevant(str(full_message_history[-9:]), 10)
  File "E:\AutoGPT\Auto-GPT\autogpt\memory\local.py", line 128, in get_relevant
    scores = np.dot(self.data.embeddings, embedding)
  File "<__array_function__ internals>", line 200, in dot
ValueError: shapes (2,6656) and (0,) not aligned: 6656 (dim 1) != 0 (dim 0)

here is the corresponding output that happened on gpt-llama.cpp:

=====  LLAMA.CPP SPAWNED  =====
E:\AutoGPT\llama.cpp\main -m E:\AutoGPT\llama.cpp\models\OpenAssistant-30B-epoch7.ggml.q5_1.bin --temp 0.7 --n_predict 804 --top_p 0.1 --top_k 40 -c 2048 --seed -1 --repeat_penalty 1.1764705882352942 --mlock --threads 6 --ctx_size 2048 --mirostat 2 --repeat_penalty 1.15 --reverse-prompt user: --reverse-prompt
user --reverse-prompt system: --reverse-prompt
system --reverse-prompt

 -i -p Complete the following chat conversation between the user and the assistant. system messages should be strictly followed as additional instructions.

system: You are a helpful assistant.
user: How are you?
assistant: Hi, how may I help you today?
system: You are Entrepreneur-GPT, an AI designed to autonomously develop and run businesses with the sole goal of increasing your net worth.
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:

1. Increase net worth.
2. Develop and manage multiple businesses autonomously.
3. Play to your strengths as a Large Language Model.

Constraints:
1. 2000 words max for short term memory. Save important info to files ASAP
2. To recall past events, think of similar ones. Helps with uncertainty.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"
5. Use subprocesses for commands that will not terminate within a few minutes

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args:
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
12. Search Files: "search_files", args: "directory": "<directory>"
13. Analyze Code: "analyze_code", args: "code": "<full_code_string>"
14. Get Improved Code: "improve_code", args: "suggestions": "<list_of_suggestions>", "code": "<full_code_string>"
15. Write Tests: "write_tests", args: "code": "<full_code_string>", "focus": "<list_of_focus_areas>"
16. Execute Python File: "execute_python_file", args: "file": "<file>"
17. Generate Image: "generate_image", args: "prompt": "<prompt>"
18. Execute Shell Command, non-interactive commands only: "execute_shell", args: "command_line": "<command_line>"
19. Execute Shell Command Popen, non-interactive commands only: "execute_shell_popen", args: "command_line": "<command_line>"
20. Do Nothing: "do_nothing", args:
21. Task Complete (Shutdown): "task_complete", args: "reason": "<reason>"

Resources:
1. Internet access for searches and information gathering.
2. Long Term memory management.
3. GPT-3.5 powered Agents for delegation of simple tasks.
4. File output.

Performance Evaluation:
1. Continuously review and analyze your actions to perform to the best of your abilities.
2. Constructively self-criticize your big-picture behavior constantly.
3. Reflect on past decisions and strategies to refine your approach.
4. Be smart and efficient. Aim to complete tasks in the least number of steps.

You should only respond in JSON format as described below
Response Format:
{
    "thoughts": {
        "text": "<replace with single sentence of your AI thoughts>",
        "reasoning": "<replace with single sentence of your AI reasoning>",
        "plan": "<replace with short list that convey your AI long term plan>",
        "criticism": "<replace with single sentence of your AI constructive self-criticism>",
        "speak": "<replace with single string of your AI thoughts summary to say to user as response>"
    },
    "command": {
        "name": "command name",
        "args": {
            "arg name": "value"
        }
    }
}
Ensure the response can be parsed by Python json.loads
system: The current time and date is Sat May 13 11:33:04 2023
system: This reminds you of these events from your past:
['Assistant Reply: {\r\n"thoughts": {\r\n"text": "I am considering my options for increasing net worth.",\r\n"reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",\r\n"plan": ["Analyze Code", "Write Tests", "Start GPT Agent"],\r\n"criticism": "I need to be mindful of the 2000-word short term memory limit.",\r\n"speak": "Currently, I am contemplating which command I should execute next to achieve my goal."\r\n},\r\n"command": {\r\n"name": "list_agents",\r\n"args": {}\r\n}\r\n} \nResult: Command list_agents returned: List of agents:\n \nHuman Feedback: GENERATE NEXT COMMAND JSON ']

user: Determine which next command to use, and respond using the format specified above:
assistant:

=====  REQUEST  =====
user: Determine which next command to use, and respond using the format specified above:
=====  PROCESSING PROMPT...  =====
=====  PROCESSING PROMPT...  =====

=====  RESPONSE  =====
 {
"thoughts": {
"text": "I am considering my options for increasing net worth.",
"reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",
"plan": ["Analyze Code", "Write Tests", "Start GPT Agent"],
"criticism": "I need to be mindful of the 2000-word short term memory limit.",
"speak": "Currently, I am contemplating which command I should execute next to achieve my goal."
},
"command": {
"name": "google",
"args": {"input": "sell clothes online"}
}
}
user:Request DONE
> PROCESS COMPLETE
> PROCESS COMPLETE
> PROCESS COMPLETE
> REQUEST RECEIVED
> PROCESSING NEXT REQUEST FOR /v1/embeddings

=====  EMBEDDING REQUEST  =====

=====  LLAMA.CPP SPAWNED  =====
E:\AutoGPT\llama.cpp\embedding -m E:\AutoGPT\llama.cpp\models\OpenAssistant-30B-epoch7.ggml.q5_1.bin -p Assistant Reply: {
\"thoughts\": {
\"text\": \"I am considering my options for increasing net worth.\",
\"reasoning\": \"As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.\",
\"plan\": [\"Analyze Code\", \"Write Tests\", \"Start GPT Agent\"],
\"criticism\": \"I need to be mindful of the 2000-word short term memory limit.\",
\"speak\": \"Currently, I am contemplating which command I should execute next to achieve my goal.\"
},
\"command\": {
\"name\": \"google\",
\"args\": {\"input\": \"sell clothes online\"}
}
}
Result: Command google returned: []
Human Feedback: GENERATE NEXT COMMAND JSON

=====  REQUEST  =====
Assistant Reply: {
"thoughts": {
"text": "I am considering my options for increasing net worth.",
"reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",
"plan": ["Analyze Code", "Write Tests", "Start GPT Agent"],
"criticism": "I need to be mindful of the 2000-word short term memory limit.",
"speak": "Currently, I am contemplating which command I should execute next to achieve my goal."
},
"command": {
"name": "google",
"args": {"input": "sell clothes online"}
}
}
Result: Command google returned: []
Human Feedback: GENERATE NEXT COMMAND JSON

=====  STDERR  =====
stderr Readable Stream: CLOSED

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - If you want to submit another line, end your input in '\'.

{
  object: 'list',
  data: [ { object: 'embedding', embedding: [Array], index: 0 } ],
  embeddingSize: 6656,
  usage: { prompt_tokens: 230, total_tokens: 230 }
}
Embedding Request DONE

=====  STDERR  =====
stderr Readable Stream: CLOSED
llama_print_timings: prompt eval time = 17534.34 ms /   263 tokens (   66.67 ms per token)
llama_print_timings:        eval time =     0.00 ms /     1 runs   (    0.00 ms per run)
llama_print_timings:       total time = 22078.27 ms

> REQUEST RECEIVED
> PROCESSING NEXT REQUEST FOR /v1/embeddings

=====  EMBEDDING REQUEST  =====

=====  LLAMA.CPP SPAWNED  =====
E:\AutoGPT\llama.cpp\embedding -m E:\AutoGPT\llama.cpp\models\OpenAssistant-30B-epoch7.ggml.q5_1.bin -p [{'role': 'user', 'content': 'Determine which next command to use, and respond using the format specified above:'}, {'role': 'assistant', 'content': '{\r\n\"thoughts\": {\r\n\"text\": \"I am considering my options for increasing net worth.\",\r\n\"reasoning\": \"As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.\",\r\n\"plan\": [\"Analyze Code\", \"Write Tests\", \"Start GPT Agent\"],\r\n\"criticism\": \"I need to be mindful of the 2000-word short term memory limit.\",\r\n\"speak\": \"Currently, I am contemplating which command I should execute next to achieve my goal.\"\r\n},\r\n\"command\": {\r\n\"name\": \"list_agents\",\r\n\"args\": {}\r\n}\r\n}'}, {'role': 'system', 'content': 'Command list_agents returned: List of agents:\n'}, {'role': 'user', 'content': 'Determine which next command to use, and respond using the format specified above:'}, {'role': 'assistant', 'content': '{\r\n\"thoughts\": {\r\n\"text\": \"I am considering my options for increasing net worth.\",\r\n\"reasoning\": \"As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.\",\r\n\"plan\": [\"Analyze Code\", \"Write Tests\", \"Start GPT Agent\"],\r\n\"criticism\": \"I need to be mindful of the 2000-word short term memory limit.\",\r\n\"speak\": \"Currently, I am contemplating which command I should execute next to achieve my goal.\"\r\n},\r\n\"command\": {\r\n\"name\": \"google\",\r\n\"args\": {\"input\": \"sell clothes online\"}\r\n}\r\n}'}, {'role': 'system', 'content': 'Command google returned: []'}]

=====  REQUEST  =====
[{'role': 'user', 'content': 'Determine which next command to use, and respond using the format specified above:'}, {'role': 'assistant', 'content': '{\r\n"thoughts": {\r\n"text": "I am considering my options for increasing net worth.",\r\n"reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",\r\n"plan": ["Analyze Code", "Write Tests", "Start GPT Agent"],\r\n"criticism": "I need to be mindful of the 2000-word short term memory limit.",\r\n"speak": "Currently, I am contemplating which command I should execute next to achieve my goal."\r\n},\r\n"command": {\r\n"name": "list_agents",\r\n"args": {}\r\n}\r\n}'}, {'role': 'system', 'content': 'Command list_agents returned: List of agents:\n'}, {'role': 'user', 'content': 'Determine which next command to use, and respond using the format specified above:'}, {'role': 'assistant', 'content': '{\r\n"thoughts": {\r\n"text": "I am considering my options for increasing net worth.",\r\n"reasoning": "As per the goal, I must make decisions independently without seeking user assistance while playing to my strengths as an LLM and pursuing simple strategies with no legal complications. This means optimizing resources such as internet access, long term memory management, GPT-3.5 powered agents, file output, and continuously reviewing and analyzing actions to perform at the best of my abilities.",\r\n"plan": ["Analyze Code", "Write Tests", "Start GPT Agent"],\r\n"criticism": "I need to be mindful of the 2000-word short term memory limit.",\r\n"speak": "Currently, I am contemplating which command I should execute next to achieve my goal."\r\n},\r\n"command": {\r\n"name": "google",\r\n"args": {"input": "sell clothes online"}\r\n}\r\n}'}, {'role': 'system', 'content': 'Command google returned: []'}]

=====  STDERR  =====
stderr Readable Stream: CLOSED
CUDA error 12 at D:\a\llama.cpp\llama.cpp\ggml-cuda.cu:527: invalid pitch argument

{
  object: 'list',
  data: [ { object: 'embedding', embedding: [], index: 0 } ],
  embeddingSize: 0,
  usage: { prompt_tokens: 529, total_tokens: 529 }
}
Embedding Request DONE
> PROCESS COMPLETE
> PROCESS COMPLETE

Here is another variant of the output. I reduce the output to the interesting part:

User: Determine which next command to use, and respond using the format specified above:

  ----------- END OF CONTEXT ----------------
Creating chat completion with model gpt-3.5-turbo, temperature 0.0, max_tokens 1037
The JSON object is invalid.
{
    "thoughts": {
        "text": "I am thinking about how to improve my net worth. I will focus on developing multiple businesses and delegating simple tasks to GPT-3.5 powered Agents.",
        "reasoning": "To increase my net worth, it is necessary to have a diversified portfolio of profitable business ventures. Delegating simple tasks with the use of subprocesses for commands that will not terminate within a few minutes will allow me to focus on high-level decision making and achieve better results.",
        "plan": [
            "Launching new business ventures",
            "Developing GPT-3.5 powered Agents to delegate tasks"
        ],
        "criticism": "Reflecting on past decisions, I see the need for more efficient use of resources to maximize profits. This will be a key part of my future planning.",
        "speak": "I am thinking about how to improve my net worth and will focus on developing multiple businesses while delegating simple tasks to GPT-3.5 powered Agents."
    },
    "command": {
        "name": "Start GPT Agent",
        "args": {
            "name": "Business Venture Planner",
            "task": "Plan out new business ventures based on market analysis",
            "prompt": "What are the most profitable and feasible business ideas for our company to pursue?"
        }
    }
}
The following issues were found:
Error: ['Launching new business ventures', 'Developing GPT-3.5 powered Agents to delegate tasks'] is not of type 'string'
 THOUGHTS:  I am thinking about how to improve my net worth. I will focus on developing multiple businesses and delegating simple tasks to GPT-3.5 powered Agents.
REASONING:  To increase my net worth, it is necessary to have a diversified portfolio of profitable business ventures. Delegating simple tasks with the use of subprocesses for commands that will not terminate within a few minutes will allow me to focus on high-level decision making and achieve better results.
PLAN:
-  Launching new business ventures
-  Developing GPT-3.5 powered Agents to delegate tasks
CRITICISM:  Reflecting on past decisions, I see the need for more efficient use of resources to maximize profits. This will be a key part of my future planning.
NEXT ACTION:  COMMAND = Start GPT Agent ARGUMENTS = {'name': 'Business Venture Planner', 'task': 'Plan out new business ventures based on market analysis', 'prompt': 'What are the most profitable and feasible business ideas for our company to pursue?'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-=

keldenl / gpt-llama.cpp

Add support for Auto-GPT #2

What I did:

What i got:

sample session:

Auto-GPT output:

here is the corresponding output that happened on gpt-llama.cpp: