letta-ai / letta

Letta (formerly MemGPT) is a framework for creating LLM services with memory.
https://letta.com
Apache License 2.0
13.05k stars 1.43k forks

requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://inference.memgpt.ai/chat/completions #1549

Open scenaristeur opened 4 months ago

scenaristeur commented 4 months ago

I have tried many conversations with the MemGPT agent using https://inference.memgpt.ai.

It works well at the beginning, but at some point in the conversation, after about 10 or 20 exchanges between the user and the agent, it fails with a 4xx/5xx error.

If I start another conversation from scratch, it is OK at the beginning but crashes the same way at some point. Is there a limit on queries per minute? (I don't think so, because even if I wait for 2 minutes the conversation stays broken, and I get nothing but this error.)

Here is my config:

cat ~/.memgpt/config 
[defaults]
preset = memgpt_chat
persona = sam_pov
human = basic

[model]
model = memgpt-openai
model_endpoint = https://inference.memgpt.ai
model_endpoint_type = openai
context_window = 8192

[embedding]
embedding_endpoint_type = hugging-face
embedding_endpoint = https://embeddings.memgpt.ai
embedding_model = BAAI/bge-large-en-v1.5
embedding_dim = 1024
embedding_chunk_size = 300

[archival_storage]
type = postgres
path = /home/smag/.memgpt/chroma
uri = postgresql+pg8000://memgpt:memgpt@localhost:5432/memgpt

[recall_storage]
type = postgres
path = /home/smag/.memgpt
uri = postgresql+pg8000://memgpt:memgpt@localhost:5432/memgpt

[metadata_storage]
type = postgres
path = /home/smag/.memgpt
uri = postgresql+pg8000://memgpt:memgpt@localhost:5432/memgpt

[version]
memgpt_version = 0.3.19

[client]
anon_clientid = 00000000-0000-0000-0000-000000000000
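
As an aside for anyone debugging this: the endpoint from the [model] section can be sanity-checked outside MemGPT. A minimal sketch, assuming the standard OpenAI-compatible /chat/completions payload that model_endpoint_type = openai implies (the path matches the traceback below); the payload shape is an assumption, not confirmed project behavior:

import requests

# Hypothetical standalone check of the configured endpoint.
resp = requests.post(
    "https://inference.memgpt.ai/chat/completions",
    json={
        "model": "memgpt-openai",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
print(resp.status_code, resp.text[:200])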

And here is the crash:


INFO:     127.0.0.1:41156 - "POST /api/agents/ecd4b98c-456d-44cf-8902-849dca9747a7/messages HTTP/1.1" 200 OK
INFO:     127.0.0.1:34314 - "POST /api/agents/ecd4b98c-456d-44cf-8902-849dca9747a7/messages HTTP/1.1" 200 OK
[HTTP] launching GET request to https://spoggy-test2.solidcommunity.net/public/brains/Chateau_des_Robots/
archival_memories: [{'id': '5e1e9092-8379-1167-092e-bf5654ac26f3', 'contents': "Le 17 juillet 2024, j'ai eu une conversation avec l'utilisateur au sujet du container de Pod Solid à l'adresse suivante : https://spoggy-test2.solidcommunity.net/public/brains/Chateau_des_Robots/. J'ai récupéré et interprété les données en format Turtle et fourni des liens vers ces données. Cela pourrait être pertinent pour les interactions futures."}]
INFO:     127.0.0.1:33098 - "GET /api/agents/ecd4b98c-456d-44cf-8902-849dca9747a7/archival/all HTTP/1.1" 200 OK
INFO:     127.0.0.1:39098 - "POST /api/agents/ecd4b98c-456d-44cf-8902-849dca9747a7/messages HTTP/1.1" 200 OK
INFO:     127.0.0.1:45254 - "POST /api/agents/ecd4b98c-456d-44cf-8902-849dca9747a7/messages HTTP/1.1" 200 OK
[HTTP] launching GET request to https://spoggy-test2.solidcommunity.net/public/brains/Chateau_des_Robots/710924c4-26d0-45f4-8e26-8a3e9bcee1a3
Task exception was never retrieved
future: <Task finished name='Task-41' coro=<to_thread() done, defined at /usr/lib/python3.11/asyncio/threads.py:12> exception=HTTPError('500 Server Error: Internal Server Error for url: https://inference.memgpt.ai/chat/completions')>
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/smag/dev/MemGPT/memgpt/server/server.py", line 602, in user_message
    usage = self._step(user_id=user_id, agent_id=agent_id, input_message=packaged_user_message, timestamp=timestamp)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/smag/dev/MemGPT/memgpt/server/server.py", line 391, in _step
    new_messages, heartbeat_request, function_failed, token_warning, usage = memgpt_agent.step(
                                                                             ^^^^^^^^^^^^^^^^^^
  File "/home/smag/dev/MemGPT/memgpt/agent.py", line 722, in step
    raise e
  File "/home/smag/dev/MemGPT/memgpt/agent.py", line 646, in step
    response = self._get_ai_reply(
               ^^^^^^^^^^^^^^^^^^^
  File "/home/smag/dev/MemGPT/memgpt/agent.py", line 345, in _get_ai_reply
    raise e
  File "/home/smag/dev/MemGPT/memgpt/agent.py", line 320, in _get_ai_reply
    response = create(
               ^^^^^^^
  File "/home/smag/dev/MemGPT/memgpt/llm_api/llm_api_tools.py", line 106, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/smag/dev/MemGPT/memgpt/llm_api/llm_api_tools.py", line 212, in create
    response = openai_chat_completions_request(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/smag/dev/MemGPT/memgpt/llm_api/openai.py", line 399, in openai_chat_completions_request
    raise http_err
  File "/home/smag/dev/MemGPT/memgpt/llm_api/openai.py", line 389, in openai_chat_completions_request
    response.raise_for_status()  # Raises HTTPError for 4XX/5XX status
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/smag/dev/MemGPT/.venv/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://inference.memgpt.ai/chat/completions
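
The last two frames show that the 500 is surfaced by requests' raise_for_status(), which turns any 4xx/5xx response into an HTTPError. A minimal standalone illustration (using httpbin.org as a stand-in server):

import requests

resp = requests.get("https://httpbin.org/status/500")  # endpoint that always returns 500
try:
    resp.raise_for_status()  # raises HTTPError for any 4XX/5XX status
except requests.exceptions.HTTPError as err:
    print(err)                       # e.g. "500 Server Error: INTERNAL SERVER ERROR for url: ..."
    print(err.response.status_code)  # 500 -- usable to detect overflow-like failures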
scenaristeur commented 4 months ago

To avoid this error, I changed agent.py around line 710 to always call self.summarize_messages_inplace(), and now it works. It seems the messages being sent were too long:

        except Exception as e:
            printd(f"step() failed\nuser_message = {user_message}\nerror = {e}")

            # If we got a context alert, try trimming the messages length, then try again
            # if is_context_overflow_error(e):

            # A separate API call to run a summarizer
            self.summarize_messages_inplace()

            # Try step again
            return self.step(user_message, first_message=first_message, return_dicts=return_dicts)

            # else:
            #     printd(f"step() failed with an unrecognized exception: '{str(e)}'")
            #     raise e
quantumalchemy commented 4 months ago

To avoid this error, I changed agent.py around line 710 to always call self.summarize_messages_inplace(), and now it works. It seems the messages being sent were too long:

        except Exception as e:
            printd(f"step() failed\nuser_message = {user_message}\nerror = {e}")

            # If we got a context alert, try trimming the messages length, then try again
            # if is_context_overflow_error(e):

            # A separate API call to run a summarizer
            self.summarize_messages_inplace()

            # Try step again
            return self.step(user_message, first_message=first_message, return_dicts=return_dicts)

            # else:
            #     printd(f"step() failed with an unrecognized exception: '{str(e)}'")
            #     raise e

Did you write a PR for this? I really appreciate the free inference.memgpt.ai endpoint that has been available since the beginning, as do others with limited GPU/compute power, but the experience you describe has also been going on since the beginning. Please make a PR. Thanks for finding a possible solution; I will be testing it out.

quantumalchemy commented 4 months ago

Wow, this little fix really worked! Thanks!

scenaristeur commented 4 months ago

Hi @quantumalchemy, thanks for your feedback on my fix. ;-)

It seems that the only error that currently triggers a summarization is the one caught by "if is_context_overflow_error(e):", but when a too-long message is sent to the inference server, it returns a 400/500 error that is not caught. With my change, if I get a 400/500 error I run a summarization and retry the step. But if it is any other error, my change does the same thing, so it could loop forever and potentially consume the API without limit if we are not careful. Perhaps @sarahwooders or @cpacker could find a better way to summarize, or handle these 400/500 errors, which are in fact context overflow errors.
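
A minimal sketch of a more guarded variant of the patch, assuming the surrounding agent.py structure; is_context_overflow_error, printd, summarize_messages_inplace, and step come from the codebase, while looks_like_context_overflow, MAX_SUMMARIZE_RETRIES, and the _summarize_retries counter are hypothetical additions to bound the recursion:

import requests

MAX_SUMMARIZE_RETRIES = 3  # hypothetical cap so a persistent 500 cannot recurse forever

def looks_like_context_overflow(e):
    # The hosted endpoint reports an over-long prompt as a plain HTTP error
    # instead of a typed overflow error, so accept both signals.
    if is_context_overflow_error(e):
        return True
    return (
        isinstance(e, requests.exceptions.HTTPError)
        and e.response is not None
        and e.response.status_code >= 400
    )

# ...inside the except block of step():
        except Exception as e:
            printd(f"step() failed\nuser_message = {user_message}\nerror = {e}")
            if looks_like_context_overflow(e) and self._summarize_retries < MAX_SUMMARIZE_RETRIES:
                self._summarize_retries += 1
                self.summarize_messages_inplace()  # separate API call to run a summarizer
                # Try step again with the trimmed history
                return self.step(user_message, first_message=first_message, return_dicts=return_dicts)
            self._summarize_retries = 0  # reset before bubbling up unrecognized errors
            printd(f"step() failed with an unrecognized exception: '{str(e)}'")
            raise e

(The counter would also need to be reset to 0 after a successful step; this only sketches the error path.)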

quantumalchemy commented 4 months ago

Thanks again for figuring this out. I think your patch only works for the MemGPT LLM endpoint, so I created a fork. Testing confirms it works on the latest version plus the patch; without the patch I was getting --> Failed to put inner thoughts in kwargs: Invalid control character at: line 2 column 270 (char 271). But like you said, it goes a little slower because of some extra looping. Knock on wood, no more errors.
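
For reference, the "Invalid control character" message comes from Python's json module, which rejects literal (unescaped) control characters inside JSON strings unless strict=False is passed. A minimal standalone illustration (the inner_thoughts key is just an example name):

import json

escaped = '{"inner_thoughts": "line one\\nline two"}'  # \n is escaped: valid JSON
literal = '{"inner_thoughts": "line one\nline two"}'   # raw newline: invalid JSON

print(json.loads(escaped))  # parses fine
try:
    json.loads(literal)
except json.JSONDecodeError as err:
    print(err)  # e.g. "Invalid control character at: line 1 column 29 (char 28)"

print(json.loads(literal, strict=False))  # strict=False tolerates raw control characters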