jakobdylanc / llmcord.py

A Discord LLM chat bot that supports any OpenAI-compatible API (OpenAI, xAI, Mistral, Groq, OpenRouter, ollama, LM Studio and more)
MIT License

Oobabooga Support #13

Closed: SODAsoo07 closed this issue 8 months ago

SODAsoo07 commented 9 months ago

Hello! Do you have any plans to support Oobabooga as a local-model backend, in addition to LM Studio?

jakobdylanc commented 9 months ago

Good point - it should already support it. Give it a try (on my latest commit) and let me know if there are any issues.
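
For reference, llmcord just talks to whatever OpenAI-compatible endpoint it's configured with, so pointing it at oobabooga's local API should only be a matter of the base URL. A minimal sketch of that kind of client setup (not the bot's exact code; the port and dummy key are assumptions based on the defaults discussed later in this thread):

```python
from openai import AsyncOpenAI

# Hypothetical values: oobabooga's OpenAI-compatible API is assumed to listen
# on port 5000, and the API key is unused for a local backend.
llm_client = AsyncOpenAI(api_key="Not used", base_url="http://localhost:5000/v1")
```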

Steel-skull commented 9 months ago

> Good point - it should already support it. Give it a try (on my latest commit) and let me know if there are any issues.

Here is the error I'm receiving:

2024-01-30 15:28:30.235 INFO: HTTP Request: POST http://192.168.2.2:5000/v1/chat/completions "HTTP/1.1 200 OK"
2024-01-30 15:28:30.236 ERROR: Ignoring exception in on_message
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 209, in _receive_event
    event = self._h11_state.next_event()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/h11/_connection.py", line 469, in next_event
    event = self._extract_next_receive_event()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/h11/_connection.py", line 419, in _extract_next_receive_event
    event = self._reader.read_eof()  # type: ignore[attr-defined]
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/h11/_readers.py", line 204, in read_eof
    raise RemoteProtocolError(
h11._util.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
    yield
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 252, in __aiter__
    async for part in self._httpcore_stream:
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 361, in __aiter__
    async for part in self._stream:
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 337, in __aiter__
    raise exc
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 329, in __aiter__
    async for chunk in self._connection._receive_response_body(**kwargs):
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 198, in _receive_response_body
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 208, in _receive_event
    with map_exceptions({h11.RemoteProtocolError: RemoteProtocolError}):
  File "/opt/conda/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/discord/client.py", line 441, in _run_event
    await coro(*args, **kwargs)
  File "/home/jovyan/work/Oobabot/discord-llm-chatbot/llmcord.py", line 165, in on_message
    async for chunk in await llm_client.chat.completions.create(
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 116, in __aiter__
    async for item in self._iterator:
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 129, in __stream__
    async for sse in iterator:
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 120, in _iter_events
    async for sse in self._decoder.aiter(self.response.aiter_lines()):
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 231, in aiter
    async for line in iterator:
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 967, in aiter_lines
    async for text in self.aiter_text():
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 954, in aiter_text
    async for byte_content in self.aiter_bytes():
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 933, in aiter_bytes
    async for raw_bytes in self.aiter_raw():
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 991, in aiter_raw
    async for raw_stream_bytes in self.stream:
  File "/opt/conda/lib/python3.11/site-packages/httpx/_client.py", line 147, in __aiter__
    async for chunk in self._stream:
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 251, in __aiter__
    with map_httpcore_exceptions():
  File "/opt/conda/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
jakobdylanc commented 9 months ago

What specific model are you using? Does the error happen every time?

Steel-skull commented 9 months ago

> What specific model are you using? Does the error happen every time?

I'm using a custom model I made called Etheria-55b-v0.1. It's loaded using exl2 on ooba as the backend, and so far I cannot get a response from the bot. The same API is used to serve SillyTavern when I have it loaded in a container; I can access it through a local IP.

jakobdylanc commented 9 months ago

Try with a more standard model like llama2 and see what happens. It may be that your custom model is formatting its streamed responses improperly. I'd need access to your custom model so I can reproduce and try to debug on my end.

Steel-skull commented 9 months ago

Ohhh, I'll try that. Here is the Hugging Face link: https://huggingface.co/Steelskull/Etheria-55b-v0.1

The model is designed to use Alpaca or ChatML.

Thanks for the reply!

Steel-skull commented 9 months ago

> Try with a more standard model like llama2 and see what happens. It may be that your custom model is formatting its streamed responses improperly. I'd need access to your custom model so I can reproduce and try to debug on my end.

I tried with both a Llama 2 and a Mistral model, with no luck so far.

jakobdylanc commented 9 months ago

I tried with llama-2-7b-chat.Q4_K_M.gguf from here: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF

It's working fine for me. I'm using oobabooga's local API with the URL set to http://localhost:5000/v1.

Just as a sanity check, make sure all your stuff is up to date (oobabooga, openai python package, etc.).

Besides that, I can't do much until I reproduce the error on my end. The more info you provide, the better, so I can keep trying.
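
(If it helps, a quick way to check which openai package version is installed; this assumes the v1.x client style that the bot uses:)

```python
import openai

# The bot uses the v1-style client (OpenAI / AsyncOpenAI), so any 1.x release should do.
print(openai.__version__)
```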

Steel-skull commented 9 months ago

> I tried with llama-2-7b-chat.Q4_K_M.gguf from here: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
>
> It's working fine for me. I'm using oobabooga's local API with the URL set to http://localhost:5000/v1.
>
> Just as a sanity check, make sure all your stuff is up to date (oobabooga, openai python package, etc.).
>
> Besides that, I can't do much until I reproduce the error on my end. The more info you provide, the better, so I can keep trying.

I appreciate the help, and yeah, I'm thinking it was an issue of some kind on my end; I've updated all associated packages. I've even installed the repo in the Docker container that has ooba installed and then ran the script with http://localhost:5000/v1, and I'm getting the same issue. Hmmm, that means it's gotta be an ooba issue... probably.

jakobdylanc commented 9 months ago

No problem, keep me posted on your progress.

A (maybe helpful) side note: I actually did encounter your exact error ONCE while using mistral-medium from the Mistral API (NOT a local model). I posted about it in Mistral's official Discord server: [screenshot]

Again, the error only happened once during a random streamed response, and then it kept working fine after that. A Mistral dev replied but didn't know what to make of it: [screenshot]

jakobdylanc commented 9 months ago

Another idea - try this simple streamed response example from the oobabooga wiki and see if the error still happens: https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#python-chat-example-with-streaming

Maybe it's something with the openai python package?
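
For convenience, here's a condensed sketch of that kind of non-openai streaming test, assuming the server is reachable at http://localhost:5000; the request fields and the `['message']['content']` access follow oobabooga's example as used later in this thread, so treat them as assumptions:

```python
import json

import requests
import sseclient  # pip install sseclient-py

url = "http://localhost:5000/v1/chat/completions"
headers = {"Content-Type": "application/json"}
data = {
    "mode": "instruct",
    "messages": [{"role": "user", "content": "Say this is a test"}],
    "stream": True,
}

# Stream the chat completion over SSE without the openai python package.
stream_response = requests.post(url, headers=headers, json=data, stream=True)
client = sseclient.SSEClient(stream_response)
for event in client.events():
    if event.data.strip() == "[DONE]":  # some servers terminate the stream this way
        break
    payload = json.loads(event.data)
    print(payload["choices"][0]["message"]["content"] or "", end="", flush=True)
```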

Steel-skull commented 9 months ago

OK, so after adding a crapton of HTTP error handling and retry logic, I finally got it to kick out an error on ooba's side. I'm not the best at API and server-side things, though; can you make anything of this?

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 247, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 191, in listen_for_disconnect
    message = await receive()
  File "/venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 587, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 1537e0d7a8c0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/applications.py", line 116, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/venv/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 746, in __call__
    await route.handle(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
    await response(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 233, in __call__
    async with anyio.create_task_group() as task_group:
  File "/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
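
(For reference, the kind of retry logic mentioned above can be approximated by wrapping the streamed call and retrying when the connection drops mid-body; a rough sketch under those assumptions, not the bot's actual code:)

```python
import asyncio

import httpx
from openai import AsyncOpenAI

llm_client = AsyncOpenAI(api_key="Not used", base_url="http://localhost:5000/v1")

async def stream_completion_with_retries(messages, max_attempts=3):
    # Retry the whole streamed completion if the server closes the chunked body early,
    # which surfaces as httpx.RemoteProtocolError (as in the traceback above).
    for attempt in range(1, max_attempts + 1):
        try:
            stream = await llm_client.chat.completions.create(
                model="local-model", messages=messages, stream=True
            )
            async for chunk in stream:
                print(chunk.choices[0].delta.content or "", end="", flush=True)
            return
        except httpx.RemoteProtocolError:
            if attempt == max_attempts:
                raise
            await asyncio.sleep(1)  # brief backoff before retrying
```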
jakobdylanc commented 9 months ago

Not sure what to make of that. Did you try what I suggested above, i.e. running the simple streamed response example that doesn't use the openai python package?

Steel-skull commented 9 months ago

> Another idea - try this simple streamed response example from the oobabooga wiki and see if the error still happens: https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#python-chat-example-with-streaming
>
> Maybe it's something with the openai python package?

Same style of error; I'm going to try to eliminate ooba and try tabbyAPI.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:761, in HTTPResponse._update_chunk_length(self)
    760 try:
--> 761     self.chunk_left = int(line, 16)
    762 except ValueError:
    763     # Invalid chunked protocol response, abort.

ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

InvalidChunkLength                        Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:444, in HTTPResponse._error_catcher(self)
    443 try:
--> 444     yield
    446 except SocketTimeout:
    447     # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
    448     # there is yet no clean way to get at it from this context.

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:828, in HTTPResponse.read_chunked(self, amt, decode_content)
    827 while True:
--> 828     self._update_chunk_length()
    829     if self.chunk_left == 0:

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:765, in HTTPResponse._update_chunk_length(self)
    764 self.close()
--> 765 raise InvalidChunkLength(self, line)

InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

ProtocolError                             Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/requests/models.py:816, in Response.iter_content.<locals>.generate()
    815 try:
--> 816     yield from self.raw.stream(chunk_size, decode_content=True)
    817 except ProtocolError as e:

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:624, in HTTPResponse.stream(self, amt, decode_content)
    623 if self.chunked and self.supports_chunked_reads():
--> 624     for line in self.read_chunked(amt, decode_content=decode_content):
    625         yield line

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:816, in HTTPResponse.read_chunked(self, amt, decode_content)
    811     raise BodyNotHttplibCompatible(
    812         "Body should be http.client.HTTPResponse like. "
    813         "It should have have an fp attribute which returns raw chunks."
    814     )
--> 816 with self._error_catcher():
    817     # Don't bother reading the body of a HEAD request.
    818     if self._original_response and is_response_to_head(self._original_response):

File /opt/conda/lib/python3.11/contextlib.py:155, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    154 try:
--> 155     self.gen.throw(typ, value, traceback)
    156 except StopIteration as exc:
    157     # Suppress StopIteration *unless* it's the same exception that
    158     # was passed to throw().  This prevents a StopIteration
    159     # raised inside the "with" statement from being suppressed.

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:461, in HTTPResponse._error_catcher(self)
    459 except (HTTPException, SocketError) as e:
    460     # This includes IncompleteRead.
--> 461     raise ProtocolError("Connection broken: %r" % e, e)
    463 # If no exception is thrown, we should avoid cleaning up
    464 # unnecessarily.

ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

During handling of the above exception, another exception occurred:

ChunkedEncodingError                      Traceback (most recent call last)
Cell In[47], line 26
     23 client = sseclient.SSEClient(stream_response)
     25 assistant_message = ''
---> 26 for event in client.events():
     27     payload = json.loads(event.data)
     28     chunk = payload['choices'][0]['message']['content']

File /opt/conda/lib/python3.11/site-packages/sseclient/__init__.py:55, in SSEClient.events(self)
     54 def events(self):
---> 55     for chunk in self._read():
     56         event = Event()
     57         # Split before decoding so splitlines() only uses \r and \n

File /opt/conda/lib/python3.11/site-packages/sseclient/__init__.py:45, in SSEClient._read(self)
     38 """Read the incoming event source stream and yield event chunks.
     39 
     40 Unfortunately it is possible for some servers to decide to break an
     41 event into multiple HTTP chunks in the response. It is thus necessary
     42 to correctly stitch together consecutive response chunks and find the
     43 SSE delimiter (empty new line) to yield full, correct event chunks."""
     44 data = b''
---> 45 for chunk in self._event_source:
     46     for line in chunk.splitlines(True):
     47         data += line

File /opt/conda/lib/python3.11/site-packages/requests/models.py:818, in Response.iter_content.<locals>.generate()
    816     yield from self.raw.stream(chunk_size, decode_content=True)
    817 except ProtocolError as e:
--> 818     raise ChunkedEncodingError(e)
    819 except DecodeError as e:
    820     raise ContentDecodingError(e)

ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
jakobdylanc commented 9 months ago

Still confused why I'm not seeing this error with oobabooga though. What's different about our setups that's causing this?

Steel-skull commented 9 months ago

> Still confused why I'm not seeing this error with oobabooga though. What's different about our setups that's causing this?

I have no idea, but when I used tabbyAPI I was receiving responses. So it's gotta be either Docker (on my end) or ooba. I'll dig in more when I can.

Steel-skull commented 9 months ago

The biggest problem is that I was unable to adjust the samplers, since that would require adding **kwargs to the API call, and the OpenAI API doesn't support the full range of possible samplers.

I attempted to rewrite the API, but I got wayyy out of my depth and couldn't figure out how the API calls were structured (on tabbyAPI's end).
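
For what it's worth, the openai python package lets you forward backend-specific sampler settings without rewriting anything, via the `extra_body` argument; a hedged sketch (the sampler names are taken from the oobabooga-style payload later in this thread and may not all be recognized by tabbyAPI):

```python
from openai import OpenAI

client = OpenAI(api_key="Not used", base_url="http://localhost:5000/v1")

# extra_body fields are merged into the JSON request body as-is, so samplers
# outside the official OpenAI spec (min_p, top_k, ...) can still be passed along.
stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
    extra_body={"min_p": 0.02, "top_k": 0, "repetition_penalty": 1.0},
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```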

Aeriolisse commented 8 months ago

I was getting the same error as below.

h11._util.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

I fixed it by using oobabooga's example; it's currently working as of now. Here's the code below.

Note that this code may be outdated.

```python
import asyncio
from datetime import datetime
import logging
import os

import discord
from dotenv import load_dotenv
#from openai import AsyncOpenAI

import requests
import sseclient  # pip install sseclient-py
import json

load_dotenv()
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s.%(msecs)03d %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

#this doesn't work, local only
LLM_CONFIG = {
    "gpt": {
        "api_key": os.environ["OPENAI_API_KEY"],
        "base_url": "https://api.openai.com/v1",
    },
    "mistral": {
        "api_key": os.environ["MISTRAL_API_KEY"],
        "base_url": "https://api.mistral.ai/v1",
    },
    "local": {
        "api_key": "Not used",
        "base_url": os.environ["LOCAL_SERVER_URL"],
    },
}
LLM_VISION_SUPPORT = "vision" in os.environ["LLM"]
MAX_COMPLETION_TOKENS = 1024

ALLOWED_CHANNEL_IDS = [int(i) for i in os.environ["ALLOWED_CHANNEL_IDS"].split(",") if i]
ALLOWED_ROLE_IDS = [int(i) for i in os.environ["ALLOWED_ROLE_IDS"].split(",") if i]
MAX_IMAGES = int(os.environ["MAX_IMAGES"]) if LLM_VISION_SUPPORT else 0
MAX_IMAGE_WARNING = f"⚠️ Max {MAX_IMAGES} image{'' if MAX_IMAGES == 1 else 's'} per message" if MAX_IMAGES > 0 else "⚠️ Can't see images"
MAX_MESSAGES = int(os.environ["MAX_MESSAGES"])
MAX_MESSAGE_WARNING = f"⚠️ Only using last {MAX_MESSAGES} messages"

EMBED_COLOR = {"incomplete": discord.Color.orange(), "complete": discord.Color.green()}
EMBED_MAX_LENGTH = 4096
EDITS_PER_SECOND = 1.3

#llm_client = AsyncOpenAI(**LLM_CONFIG[os.environ["LLM"].split("-", 1)[0]])
intents = discord.Intents.default()
intents.message_content = True
discord_client = discord.Client(intents=intents)

message_nodes = {}
in_progress_message_ids = []

class MessageNode:
    def __init__(self, message, too_many_images=False, replied_to=None):
        self.message = message
        self.too_many_images = too_many_images
        self.replied_to = replied_to

def get_system_prompt():
    return {
                "role": "system",
                "content": f"{os.environ['CUSTOM_SYSTEM_PROMPT']}\nUser's names are their Discord IDs and should be typed as '<@ID>'.\nToday's date: {datetime.now().strftime('%B %d %Y')}",
            }

url = "http://0.0.0.0:5000/v1/chat/completions"

headers = {
    "Content-Type": "application/json"
}

@discord_client.event
async def on_message(message):
    # Filter out unwanted messages
    if (
        (message.channel.type != discord.ChannelType.private and discord_client.user not in message.mentions)
        or (ALLOWED_CHANNEL_IDS and message.channel.id not in ALLOWED_CHANNEL_IDS)
        or (ALLOWED_ROLE_IDS and (message.channel.type == discord.ChannelType.private or not [role for role in message.author.roles if role.id in ALLOWED_ROLE_IDS]))
        or message.author.bot
    ):
        return

    # If user replied to a message that's still generating, wait until it's done
    while message.reference and message.reference.message_id in in_progress_message_ids:
        await asyncio.sleep(0)

    async with message.channel.typing():
        # Loop through message reply chain and create MessageNodes
        current_message = message
        previous_message_id = None
        while True:
            try:
                current_message_text = current_message.embeds[0].description if current_message.author == discord_client.user else current_message.content
                if current_message_text.startswith(discord_client.user.mention):
                    current_message_text = current_message_text[len(discord_client.user.mention) :].lstrip()
                current_message_content = current_message_text if current_message_text else ''
                current_message_images = [
                    {
                        "type": "image_url",
                        "image_url": {"url": att.url, "detail": "low"},
                    }
                    for att in current_message.attachments
                    if "image" in att.content_type
                ]
                #current_message_content += current_message_images[:MAX_IMAGES]
                if "mistral" in os.environ["LLM"]:
                    # Temporary fix until Mistral API supports message.content as a list
                    current_message_content = current_message_text
                current_message_role = "assistant" if current_message.author == discord_client.user else "user"
                message_nodes[current_message.id] = MessageNode(
                    {
                        "role": current_message_role,
                        "content": current_message_content,
                        "name": str(current_message.author.id)
                    }
                )
                if len(current_message_images) > MAX_IMAGES:
                    message_nodes[current_message.id].too_many_images = True
                if previous_message_id:
                    message_nodes[previous_message_id].replied_to = message_nodes[current_message.id]
                if not current_message.reference:
                    break
                if current_message.reference.message_id in message_nodes:
                    message_nodes[current_message.id].replied_to = message_nodes[current_message.reference.message_id]
                    break
                previous_message_id = current_message.id
                current_message = (
                    current_message.reference.resolved
                    if isinstance(current_message.reference.resolved, discord.Message)
                    else await message.channel.fetch_message(current_message.reference.message_id)
                )
            except (discord.NotFound, discord.HTTPException, IndexError):
                break

        # Build conversation history from reply chain and set user warnings
        reply_chain = []
        user_warnings = set()
        current_node = message_nodes[message.id]
        while current_node is not None and len(reply_chain) < MAX_MESSAGES:
            reply_chain += [current_node.message]
            if current_node.too_many_images:
                user_warnings.add(MAX_IMAGE_WARNING)
            if len(reply_chain) == MAX_MESSAGES and current_node.replied_to:
                user_warnings.add(MAX_MESSAGE_WARNING)
            current_node = current_node.replied_to
        #print("REPLY CHAIN")
        #print(reply_chain[::-1])
        #print(reply_chain[0])
        #print("REPLY CHAIN")
        messages = []
        messages.append(get_system_prompt())
        for msgs in reply_chain:
            messages.append(msgs)
        #print(messages)

        # Generate and send bot reply
        logging.info(f"Message received: {reply_chain[0]}, reply chain length: {len(reply_chain)}")
        response_messages = []
        response_message_contents = []
        previous_content = None
        edit_message_task = None
        #print(os.environ["LLM"])

        #copied from sillytavern request, mixtral settings
        data = {
            "mode": "instruct",
            "messages": messages,
            "stream": True,
            "max_tokens": MAX_COMPLETION_TOKENS,
            "max_new_tokens": 2048,
            "temperature": 0.99,
            "top_p": 1,
            "typical_p": 1,
            "min_p": 0.02,
            "repetition_penalty": 1,
            "frequency_penalty": 0,
            "presence_penalty": 0,
            "top_k": 0,
            "min_length": 0,
            "min_tokens": 0,
            "num_beams": 1,
            "length_penalty": 1,
            "early_stopping": False,
            "add_bos_token": True,
            "truncation_length": 4096,
            "ban_eos_token": False,
            "skip_special_tokens": True,
            "top_a": 0,
            "tfs": 1,
            "epsilon_cutoff": 0,
            "eta_cutoff": 0,
            "mirostat_mode": 0,
            "mirostat_tau": 5,
            "mirostat_eta": 0.1,
            "repetition_penalty_range": 600,
            "encoder_repetition_penalty": 1,
            "no_repeat_ngram_size": 0,
            "penalty_alpha": 0,
            "temperature_last": True,
            "seed": -1,
            "guidance_scale": 1
        }

        stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
        client = sseclient.SSEClient(stream_response)
        for chunk in client.events():
            payload = json.loads(chunk.data)
            current_content = payload['choices'][0]['message']['content'] or ""
            #print(current_content)
            if previous_content:
                if not response_messages or len(response_message_contents[-1] + previous_content) > EMBED_MAX_LENGTH:
                    reply_message = message if not response_messages else response_messages[-1]
                    embed = discord.Embed(description="⏳", color=EMBED_COLOR["incomplete"])
                    for warning in sorted(user_warnings):
                        embed.add_field(name=warning, value="", inline=False)
                    response_messages += [
                        await reply_message.reply(
                            embed=embed,
                            silent=True,
                        )
                    ]
                    in_progress_message_ids.append(response_messages[-1].id)
                    last_message_task_time = datetime.now().timestamp()
                    response_message_contents += [""]
                response_message_contents[-1] += previous_content
                final_message_edit = len(response_message_contents[-1] + current_content) > EMBED_MAX_LENGTH or current_content == ""
                if (
                    final_message_edit
                    or (not edit_message_task or edit_message_task.done())
                    and datetime.now().timestamp() - last_message_task_time >= len(in_progress_message_ids) / EDITS_PER_SECOND
                ):
                    while edit_message_task and not edit_message_task.done():
                        await asyncio.sleep(0)
                    if response_message_contents[-1].strip():
                        embed.description = response_message_contents[-1]
                    embed.color = EMBED_COLOR["complete"] if final_message_edit else EMBED_COLOR["incomplete"]
                    edit_message_task = asyncio.create_task(response_messages[-1].edit(embed=embed))
                    last_message_task_time = datetime.now().timestamp()
            previous_content = current_content

        # Create MessageNode(s) for bot reply message(s) (can be multiple if bot reply was long)
        for response_message in response_messages:
            message_nodes[response_message.id] = MessageNode(
                {
                    "role": "assistant",
                    "content": "".join(response_message_contents),
                    "name": str(discord_client.user.id),
                },
                replied_to=message_nodes[message.id],
            )
            in_progress_message_ids.remove(response_message.id)

async def main():
    await discord_client.start(os.environ["DISCORD_BOT_TOKEN"])

if __name__ == "__main__":
    asyncio.run(main())
```
jakobdylanc commented 8 months ago

Thanks for testing. Can you also try reproducing the error with the following two code examples and tell me the results? This will help narrow things down further.

Update the base_url value if necessary for your setup. (Don't include the "/chat/completions" part)

1. openai streamed responses example (no async)

```python
from openai import OpenAI

client = OpenAI(api_key="Not used", base_url="http://0.0.0.0:5000/v1")

stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```

2. openai streamed responses example (async)

```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="Not used", base_url="http://0.0.0.0:5000/v1")

async def main():
    stream = await client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "Say this is a test"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")

asyncio.run(main())
```
Aeriolisse commented 8 months ago

There were no errors; both ran normally: test.py (no async), test2.py (async).

$ python3 test.py 
"Sure, I'd be happy to treat this as a test. Let me know how I can assist you with this test. I'm here to help!"
$ python3 test2.py 
"Sure, I'd be happy to treat this as a test. Is there a specific task or question you would like me to address?"

My oobabooga version is "snapshot-2024-01-28". I've updated it multiple times; it was originally "snapshot-2023-12-17".

The first time I used discord-llm-chatbot was on 2024-01-18, so that's probably why I got those errors. Using the latest commit solved it, and it works!