keldenl / gpt-llama.cpp

A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI.
MIT License

Cannot POST /V1/embeddings #52

Open Terramoto opened 1 year ago

Terramoto commented 1 year ago

I'm running AutoGPT + gpt-llama.cpp, using the latest version of gpt-llama.cpp and the AutoGPT version suggested in this guide. I can start communicating with the model, but once it reaches the action phase, AutoGPT crashes with an HTML response saying it cannot POST to /v1/embeddings. I'm not sure whether I should raise this with the AutoGPT repo or the gpt-llama.cpp repo.
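For reference, the failing request can be reproduced outside of AutoGPT with the legacy openai Python client (the same 0.27-style client that appears in the traceback further down). This is only a minimal sketch: the base URL is assumed from my server log below, the API key value is a placeholder (set it per gpt-llama.cpp's usage docs), and the model name is my assumption about what AutoGPT's create_embedding_with_ada helper requests.

import openai

# Point the legacy (0.27.x-style) openai client at the local gpt-llama.cpp server.
# Base URL assumed from the server log below; adjust host/port to your setup.
openai.api_base = "http://localhost:443/v1"
openai.api_key = "placeholder"  # set this per gpt-llama.cpp's usage docs

# AutoGPT's create_embedding_with_ada() ends up issuing a call like this one,
# which is what comes back as the HTML 404 response instead of JSON.
response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="hello world",
)
print(response)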

gpt-llama.cpp:

Server is listening on:
  - http://localhost:443
  - http://10.0.0.10:443 (for other devices on the same network)

See Docs
  - http://localhost:443/docs

Test your installation
  - double click the scripts/test-installation.ps1 (powershell) or scripts/test-installation.bat (cmd) file

See https://github.com/keldenl/gpt-llama.cpp#usage for more guidance.
> REQUEST RECEIVED
> PROCESSING NEXT REQUEST FOR /V1/chat/completions
> LLAMA.CPP DETECTED

=====  CHAT COMPLETION REQUEST  =====
> VICUNA MODEL DETECTED. LOADING VICUNA ENGINE...
{ '--n_predict': 1021 }

=====  LLAMA.CPP SPAWNED  =====
..\llama.cpp\main -m ..\llama.cpp\Models\Wizard-Vicuna-13B-Uncensored.ggmlv3.q4_1.bin --threads 12 --temp 0.7 --n_predict 1021 --top_p 0.1 --top_k 40 -c 2048 --seed -1 --repeat_penalty 1.1764705882352942 --reverse-prompt user: --reverse-prompt
user --reverse-prompt system: --reverse-prompt
system --reverse-prompt

 --reverse-prompt ## --reverse-prompt
## --reverse-prompt ### --reverse-prompt

 -i -p Complete the following chat conversation between the Human and the Assistant. System messages should be strictly followed as additional instructions.

### System: You are a helpful assistant.
### Human: How are you?
### Assistant: Hi, how may I help you today?
### System: You are Bob, An AI designed to assist on requests and research autonomously
Your decisions must always be made independently without seeking user assistance. Play to your strengths as an LLM and pursue simple strategies with no legal complications.

GOALS:

1. find from 1 to 10 tech products released last week on Amazon with a spike in ratings
2. check reddit and youtube for positive feedback

Constraints:
1. ~4000 word limit for short term memory. Your short term memory is short, so immediately save important information to files.
2. If you are unsure how you previously did something or want to recall past events, thinking about similar events will help you remember.
3. No user assistance
4. Exclusively use the commands listed in double quotes e.g. "command name"
5. Use subprocesses for commands that will not terminate within a few minutes

Commands:
1. Google Search: "google", args: "input": "<search>"
2. Browse Website: "browse_website", args: "url": "<url>", "question": "<what_you_want_to_find_on_website>"
3. Start GPT Agent: "start_agent", args: "name": "<name>", "task": "<short_task_desc>", "prompt": "<prompt>"
4. Message GPT Agent: "message_agent", args: "key": "<key>", "message": "<message>"
5. List GPT Agents: "list_agents", args:
6. Delete GPT Agent: "delete_agent", args: "key": "<key>"
7. Clone Repository: "clone_repository", args: "repository_url": "<url>", "clone_path": "<directory>"
8. Write to file: "write_to_file", args: "file": "<file>", "text": "<text>"
9. Read file: "read_file", args: "file": "<file>"
10. Append to file: "append_to_file", args: "file": "<file>", "text": "<text>"
11. Delete file: "delete_file", args: "file": "<file>"
### System: The current time and date is Tue May 30 19:00:41 2023
### System: This reminds you of these events from your past:

### Human: Determine which next command to use, and respond using the format specified above:
### Assistant:

=====  REQUEST  =====
### Human: Determine which next command to use, and respond using the format specified above:
=====  PROCESSING PROMPT...  =====
=====  PROCESSING PROMPT...  =====

=====  RESPONSE  =====
 Based on the given goals and constraints, I recommend starting with a search for tech products released last week on Amazon. Here's my response in JSON format:
{
 "command": {
 "name": "google",
 "args": {
 "input": "1 to 10 tech products released last week on Amazon with a spike in ratings"
 }
 },
 "thoughts": {
 "speak": "I will start by searching for the specified tech products.",
 "plan": "- Search for tech products released last week on Amazon with a spike in ratings",
 "reasoning": "This is an important task that needs to be completed quickly. I will use Google's search engine to find relevant information."
 }
}
user:Request DONE
> PROCESS COMPLETE
> REQUEST RECEIVED
> PROCESSING NEXT REQUEST FOR /V1/embeddings
> PROCESS COMPLETE

AutoGPT:

 THOUGHTS:  None
REASONING:  This is an important task that needs to be completed quickly. I will use Google's search engine to find relevant information.
PLAN:
-  Search for tech products released last week on Amazon with a spike in ratings
CRITICISM:  None
NEXT ACTION:  COMMAND = google ARGUMENTS = {'input': '1 to 10 tech products released last week on Amazon with a spike in ratings'}
Enter 'y' to authorise command, 'y -N' to run N continuous commands, 'n' to exit program, or enter feedback for ...
Input:y
-=-=-=-=-=-=-= COMMAND AUTHORISED BY USER -=-=-=-=-=-=-= 
D:\Programs Files\Python\Python311\Lib\site-packages\duckduckgo_search\compat.py:20: UserWarning: ddg is deprecated. Use DDGS().text() generator
  warnings.warn("ddg is deprecated. Use DDGS().text() generator")
D:\Programs Files\Python\Python311\Lib\site-packages\duckduckgo_search\compat.py:24: UserWarning: parameter page is deprecated, use DDGS().text() generator
  warnings.warn("parameter page is deprecated, use DDGS().text() generator")
D:\Programs Files\Python\Python311\Lib\site-packages\duckduckgo_search\compat.py:26: UserWarning: parameter max_results is deprecated, use DDGS().text()
  warnings.warn("parameter max_results is deprecated, use DDGS().text()")
rbody = <!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
</head>
<body>
<pre>Cannot POST /V1/embeddings</pre>
</body>
</html>

Traceback (most recent call last):
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\openai\api_requestor.py", line 675, in _interpret_response_line
    data = json.loads(rbody)
           ^^^^^^^^^^^^^^^^^
  File "D:\Programs Files\Python\Python311\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs Files\Python\Python311\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Programs Files\Python\Python311\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\projecto\Auto-GPT\autogpt\__main__.py", line 5, in <module>
    autogpt.cli.main()
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 1635, in invoke
    rv = super().invoke(ctx)
         ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projecto\Auto-GPT\autogpt\cli.py", line 151, in main
    agent.start_interaction_loop()
  File "D:\projecto\Auto-GPT\autogpt\agent\agent.py", line 184, in start_interaction_loop
    self.memory.add(memory_to_add)
  File "D:\projecto\Auto-GPT\autogpt\memory\local.py", line 78, in add
    embedding = create_embedding_with_ada(text)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projecto\Auto-GPT\autogpt\llm_utils.py", line 155, in create_embedding_with_ada
    return openai.Embedding.create(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\openai\api_resources\embedding.py", line 33, in create
    response = super().create(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\openai\api_resources\abstract\engine_api_resource.py", line 153, in create
    response, _, api_key = requestor.request(
                           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\openai\api_requestor.py", line 226, in request
    resp, got_stream = self._interpret_response(result, stream)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\openai\api_requestor.py", line 621, in _interpret_response
    self._interpret_response_line(
  File "C:\Users\Terramoto\AppData\Roaming\Python\Python311\site-packages\openai\api_requestor.py", line 677, in _interpret_response_line
    raise error.APIError(
openai.error.APIError: HTTP code 404 from API (<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
</head>
<body>
<pre>Cannot POST /V1/embeddings</pre>
</body>
</html>
)
Press any key to continue . . .
kwikiel commented 1 year ago

That's because the embeddings endpoint isn't implemented. See https://github.com/Significant-Gravitas/Auto-GPT/pull/2473
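In the meantime, a quick way to confirm whether the server AutoGPT is pointed at actually serves the embeddings route is to POST to it directly. Here's a minimal sketch using requests, assuming the local URL from the server log above:

import requests

# Hypothetical probe: POST a minimal embeddings request straight to the local server.
# URL assumed from the gpt-llama.cpp log above; adjust host/port if yours differ.
url = "http://localhost:443/v1/embeddings"
payload = {"model": "text-embedding-ada-002", "input": "ping"}

resp = requests.post(url, json=payload, timeout=30)
print(resp.status_code)  # a 404 here matches the "Cannot POST ..." HTML AutoGPT reports
print(resp.text[:200])   # Express-style servers return that HTML error page for unmatched routes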