Closed — SODAsoo07 closed this issue 8 months ago.
Good point - it should already support it. Give it a try (on my latest commit) and let me know if there's any issues.
Here is the error I'm receiving:
```
2024-01-30 15:28:30.235 INFO: HTTP Request: POST http://192.168.2.2:5000/v1/chat/completions "HTTP/1.1 200 OK"
2024-01-30 15:28:30.236 ERROR: Ignoring exception in on_message
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_exceptions.py", line 10, in map_exceptions
    yield
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 209, in _receive_event
    event = self._h11_state.next_event()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/h11/_connection.py", line 469, in next_event
    event = self._extract_next_receive_event()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/h11/_connection.py", line 419, in _extract_next_receive_event
    event = self._reader.read_eof()  # type: ignore[attr-defined]
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/h11/_readers.py", line 204, in read_eof
    raise RemoteProtocolError(
h11._util.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 67, in map_httpcore_exceptions
    yield
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 252, in __aiter__
    async for part in self._httpcore_stream:
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 361, in __aiter__
    async for part in self._stream:
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 337, in __aiter__
    raise exc
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 329, in __aiter__
    async for chunk in self._connection._receive_response_body(**kwargs):
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 198, in _receive_response_body
    event = await self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_async/http11.py", line 208, in _receive_event
    with map_exceptions({h11.RemoteProtocolError: RemoteProtocolError}):
  File "/opt/conda/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/conda/lib/python3.11/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/discord/client.py", line 441, in _run_event
    await coro(*args, **kwargs)
  File "/home/jovyan/work/Oobabot/discord-llm-chatbot/llmcord.py", line 165, in on_message
    async for chunk in await llm_client.chat.completions.create(
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 116, in __aiter__
    async for item in self._iterator:
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 129, in __stream__
    async for sse in iterator:
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 120, in _iter_events
    async for sse in self._decoder.aiter(self.response.aiter_lines()):
  File "/opt/conda/lib/python3.11/site-packages/openai/_streaming.py", line 231, in aiter
    async for line in iterator:
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 967, in aiter_lines
    async for text in self.aiter_text():
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 954, in aiter_text
    async for byte_content in self.aiter_bytes():
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 933, in aiter_bytes
    async for raw_bytes in self.aiter_raw():
  File "/opt/conda/lib/python3.11/site-packages/httpx/_models.py", line 991, in aiter_raw
    async for raw_stream_bytes in self.stream:
  File "/opt/conda/lib/python3.11/site-packages/httpx/_client.py", line 147, in __aiter__
    async for chunk in self._stream:
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 251, in __aiter__
    with map_httpcore_exceptions():
  File "/opt/conda/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/opt/conda/lib/python3.11/site-packages/httpx/_transports/default.py", line 84, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
```
What specific model are you using? Does the error happen every time?
I'm using a custom model I made called etheria-55b-v0.1. It's loaded with exl2 on ooba as a backend, and so far I cannot get a response from the bot. The same API is used to serve SillyTavern when I have it loaded in a container. I can access it through a local IP.
Try with a more standard model like llama2 and see what happens. It may be that your custom model is formatting its streamed responses improperly. I'd need access to your custom model so I can reproduce and try to debug on my end.
Ohhh, I'll try that. Here is the Hugging Face link: https://huggingface.co/Steelskull/Etheria-55b-v0.1
The model is designed to use Alpaca or ChatML.
Thanks for the reply!
I tried with both a Llama 2 and a Mistral model, with no luck so far.
I tried with llama-2-7b-chat.Q4_K_M.gguf from here: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF
It's working fine for me. I'm using the oobabooga local API with URL set to http://localhost:5000/v1.
Just as a sanity check, make sure all your stuff is up to date (oobabooga, openai python package, etc.).
Besides that I can't do much until I reproduce the error on my end. The more info you provide the better so I can keep trying.
I appreciate the help, and yeah, I'm thinking it was an issue of some kind on my end. I've updated all associated packages. I've even installed the repo on the Docker container that has ooba installed, then ran the script with http://localhost:5000/v1, and I'm getting the same issue. Hmmm, that means it's gotta be an ooba issue... probably.
No problem, keep me posted on your progress.
A (maybe helpful) side note: I actually did encounter your exact error ONCE while using mistral-medium from the Mistral API (NOT a local model). I posted about it in Mistral's official Discord server.
Again, the error only happened once during a random streamed response, and then it kept working fine after that. A Mistral dev replied but didn't know what to make of it.
Another idea - try this simple streamed response example from the oobabooga wiki and see if the error still happens: https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API#python-chat-example-with-streaming
Maybe it's something with the openai python package?
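For reference, that wiki example boils down to plain `requests` plus `sseclient-py`, with no openai package involved. Here's a rough sketch of the same idea; the helper names and the default URL are just for illustration, so adjust them for your setup:

```python
import json


def extract_chunk_text(event_data: str) -> str:
    """Pull the streamed text out of one SSE event payload.

    OpenAI-compatible endpoints stream JSON chunks; depending on the
    backend the text lives under delta.content or message.content.
    """
    choice = json.loads(event_data)["choices"][0]
    delta = choice.get("delta") or choice.get("message") or {}
    return delta.get("content") or ""


def stream_chat(base_url="http://127.0.0.1:5000/v1"):
    # Imported lazily so the parser above works standalone.
    import requests
    import sseclient  # pip install sseclient-py

    response = requests.post(
        f"{base_url}/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Say this is a test"}],
            "stream": True,
        },
        stream=True,
    )
    text = ""
    for event in sseclient.SSEClient(response).events():
        if event.data.strip() == "[DONE]":
            break
        text += extract_chunk_text(event.data)
    return text


# Requires a running server, e.g.:
# print(stream_chat())
```

If this sketch streams cleanly but the openai-package version doesn't, that points at the client library rather than the server.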
OK, so after adding a crapton of HTTP error logic and retry logic, I finally got it to kick out an error on ooba's side. Although I'm not the best at API and server-side things, can you make anything of this?
```
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 247, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 191, in listen_for_disconnect
    message = await receive()
  File "/venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 587, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 1537e0d7a8c0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/applications.py", line 116, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/venv/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 746, in __call__
    await route.handle(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
    await self.app(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 55, in wrapped_app
    raise exc
  File "/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 44, in wrapped_app
    await app(scope, receive, sender)
  File "/venv/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
    await response(scope, receive, send)
  File "/venv/lib/python3.10/site-packages/sse_starlette/sse.py", line 233, in __call__
    async with anyio.create_task_group() as task_group:
  File "/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
    raise BaseExceptionGroup(
exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
```
Not sure what to make of that. Did you try what I suggested above? AKA try running the simple streamed response example that doesn't use the openai python package.
Same style of error. I'm going to try to eliminate ooba and try TabbyAPI.
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:761, in HTTPResponse._update_chunk_length(self)
    760 try:
--> 761     self.chunk_left = int(line, 16)
    762 except ValueError:
    763     # Invalid chunked protocol response, abort.

ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

InvalidChunkLength                        Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:444, in HTTPResponse._error_catcher(self)
    443 try:
--> 444     yield
    446 except SocketTimeout:
    447     # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
    448     # there is yet no clean way to get at it from this context.

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:828, in HTTPResponse.read_chunked(self, amt, decode_content)
    827 while True:
--> 828     self._update_chunk_length()
    829     if self.chunk_left == 0:

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:765, in HTTPResponse._update_chunk_length(self)
    764 self.close()
--> 765 raise InvalidChunkLength(self, line)

InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

ProtocolError                             Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/requests/models.py:816, in Response.iter_content.<locals>.generate()
    815 try:
--> 816     yield from self.raw.stream(chunk_size, decode_content=True)
    817 except ProtocolError as e:

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:624, in HTTPResponse.stream(self, amt, decode_content)
    623 if self.chunked and self.supports_chunked_reads():
--> 624     for line in self.read_chunked(amt, decode_content=decode_content):
    625         yield line

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:816, in HTTPResponse.read_chunked(self, amt, decode_content)
    811     raise BodyNotHttplibCompatible(
    812         "Body should be http.client.HTTPResponse like. "
    813         "It should have have an fp attribute which returns raw chunks."
    814     )
--> 816 with self._error_catcher():
    817     # Don't bother reading the body of a HEAD request.
    818     if self._original_response and is_response_to_head(self._original_response):

File /opt/conda/lib/python3.11/contextlib.py:155, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    154 try:
--> 155     self.gen.throw(typ, value, traceback)
    156 except StopIteration as exc:
    157     # Suppress StopIteration *unless* it's the same exception that
    158     # was passed to throw(). This prevents a StopIteration
    159     # raised inside the "with" statement from being suppressed.

File /opt/conda/lib/python3.11/site-packages/urllib3/response.py:461, in HTTPResponse._error_catcher(self)
    459 except (HTTPException, SocketError) as e:
    460     # This includes IncompleteRead.
--> 461     raise ProtocolError("Connection broken: %r" % e, e)
    463 # If no exception is thrown, we should avoid cleaning up
    464 # unnecessarily.

ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

During handling of the above exception, another exception occurred:

ChunkedEncodingError                      Traceback (most recent call last)
Cell In[47], line 26
     23 client = sseclient.SSEClient(stream_response)
     25 assistant_message = ''
---> 26 for event in client.events():
     27     payload = json.loads(event.data)
     28     chunk = payload['choices'][0]['message']['content']

File /opt/conda/lib/python3.11/site-packages/sseclient/__init__.py:55, in SSEClient.events(self)
     54 def events(self):
---> 55     for chunk in self._read():
     56         event = Event()
     57         # Split before decoding so splitlines() only uses \r and \n

File /opt/conda/lib/python3.11/site-packages/sseclient/__init__.py:45, in SSEClient._read(self)
     38 """Read the incoming event source stream and yield event chunks.
     39
     40 Unfortunately it is possible for some servers to decide to break an
     41 event into multiple HTTP chunks in the response. It is thus necessary
     42 to correctly stitch together consecutive response chunks and find the
     43 SSE delimiter (empty new line) to yield full, correct event chunks."""
     44 data = b''
---> 45 for chunk in self._event_source:
     46     for line in chunk.splitlines(True):
     47         data += line

File /opt/conda/lib/python3.11/site-packages/requests/models.py:818, in Response.iter_content.<locals>.generate()
    816     yield from self.raw.stream(chunk_size, decode_content=True)
    817 except ProtocolError as e:
--> 818     raise ChunkedEncodingError(e)
    819 except DecodeError as e:
    820     raise ContentDecodingError(e)

ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
```
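(For context on what this traceback means: HTTP chunked transfer encoding prefixes each chunk with its size in hex and terminates the body with a zero-length chunk, `0\r\n\r\n`. Per the frames above, urllib3 parses each size line with `int(line, 16)`, so an empty size line, i.e. the server closing the connection before sending the terminator, produces exactly this `ValueError`. A minimal sketch of that parsing step:)

```python
def parse_chunk_length(line: bytes) -> int:
    """Essentially what urllib3's _update_chunk_length does: the size
    line of each HTTP chunk is a hexadecimal byte count."""
    return int(line, 16)


assert parse_chunk_length(b"1a") == 26  # a 26-byte chunk follows
assert parse_chunk_length(b"0") == 0    # terminator: the body is complete
try:
    parse_chunk_length(b"")             # stream cut off before the terminator
except ValueError as err:
    # invalid literal for int() with base 16: b''
    assert "base 16" in str(err)
```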
Still confused why I'm not seeing this error with oobabooga though. What's different about our setups that's causing this?
I have no idea, but when I used TabbyAPI I was receiving responses. So it's gotta be either Docker (on my end) or ooba. I'll dig in more when I can.
The biggest problem is that I was unable to adjust the samplers, since that would require adding **kwargs to the API call, but the OpenAI API doesn't support the full range of possible samplers.
I attempted to rewrite the API but I got wayyy out of my depth and couldn't figure out how the API calls were structured (on TabbyAPI's end).
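(Side note on the samplers: recent versions of the openai python package won't accept arbitrary **kwargs, but they do expose an `extra_body` parameter that merges extra fields into the request JSON, so backend-specific samplers can be passed without rewriting the API. The sampler names below are examples for an oobabooga/TabbyAPI-style backend, not part of the OpenAI spec, and the merge helper is just for illustration:)

```python
def build_request_body(messages, sampler_overrides):
    """Mimic the merge that openai-python's extra_body performs:
    backend-specific fields get folded into the standard request JSON."""
    body = {"model": "local-model", "messages": messages, "stream": True}
    body.update(sampler_overrides)
    return body


# Backend-specific sampler settings (example names, not OpenAI spec).
samplers = {"min_p": 0.02, "top_k": 0, "repetition_penalty": 1.0}
body = build_request_body([{"role": "user", "content": "Hi"}], samplers)
assert body["min_p"] == 0.02 and body["stream"] is True

# With the real client it would look like (untested sketch):
# from openai import OpenAI
# client = OpenAI(api_key="Not used", base_url="http://127.0.0.1:5000/v1")
# stream = client.chat.completions.create(
#     model="local-model",
#     messages=[{"role": "user", "content": "Hi"}],
#     stream=True,
#     extra_body=samplers,
# )
```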
I was getting the same error as below:

```
h11._util.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
```

I fixed it by using oobabooga's example; it's currently working as of now. Here's the code below.
Note that this code may be outdated.
```python
import asyncio
from datetime import datetime
import logging
import os

import discord
from dotenv import load_dotenv
# from openai import AsyncOpenAI
import requests
import sseclient  # pip install sseclient-py
import json

load_dotenv()
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s.%(msecs)03d %(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

# this doesn't work, local only
LLM_CONFIG = {
    "gpt": {
        "api_key": os.environ["OPENAI_API_KEY"],
        "base_url": "https://api.openai.com/v1",
    },
    "mistral": {
        "api_key": os.environ["MISTRAL_API_KEY"],
        "base_url": "https://api.mistral.ai/v1",
    },
    "local": {
        "api_key": "Not used",
        "base_url": os.environ["LOCAL_SERVER_URL"],
    },
}
LLM_VISION_SUPPORT = "vision" in os.environ["LLM"]
MAX_COMPLETION_TOKENS = 1024

ALLOWED_CHANNEL_IDS = [int(i) for i in os.environ["ALLOWED_CHANNEL_IDS"].split(",") if i]
ALLOWED_ROLE_IDS = [int(i) for i in os.environ["ALLOWED_ROLE_IDS"].split(",") if i]
MAX_IMAGES = int(os.environ["MAX_IMAGES"]) if LLM_VISION_SUPPORT else 0
MAX_IMAGE_WARNING = f"⚠️ Max {MAX_IMAGES} image{'' if MAX_IMAGES == 1 else 's'} per message" if MAX_IMAGES > 0 else "⚠️ Can't see images"
MAX_MESSAGES = int(os.environ["MAX_MESSAGES"])
MAX_MESSAGE_WARNING = f"⚠️ Only using last {MAX_MESSAGES} messages"

EMBED_COLOR = {"incomplete": discord.Color.orange(), "complete": discord.Color.green()}
EMBED_MAX_LENGTH = 4096
EDITS_PER_SECOND = 1.3

# llm_client = AsyncOpenAI(**LLM_CONFIG[os.environ["LLM"].split("-", 1)[0]])
intents = discord.Intents.default()
intents.message_content = True
discord_client = discord.Client(intents=intents)

message_nodes = {}
in_progress_message_ids = []


class MessageNode:
    def __init__(self, message, too_many_images=False, replied_to=None):
        self.message = message
        self.too_many_images = too_many_images
        self.replied_to = replied_to


def get_system_prompt():
    return {
        "role": "system",
        "content": f"{os.environ['CUSTOM_SYSTEM_PROMPT']}\nUser's names are their Discord IDs and should be typed as '<@ID>'.\nToday's date: {datetime.now().strftime('%B %d %Y')}",
    }


url = "http://0.0.0.0:5000/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
}


@discord_client.event
async def on_message(message):
    # Filter out unwanted messages
    if (
        (message.channel.type != discord.ChannelType.private and discord_client.user not in message.mentions)
        or (ALLOWED_CHANNEL_IDS and message.channel.id not in ALLOWED_CHANNEL_IDS)
        or (ALLOWED_ROLE_IDS and (message.channel.type == discord.ChannelType.private or not [role for role in message.author.roles if role.id in ALLOWED_ROLE_IDS]))
        or message.author.bot
    ):
        return

    # If user replied to a message that's still generating, wait until it's done
    while message.reference and message.reference.message_id in in_progress_message_ids:
        await asyncio.sleep(0)

    async with message.channel.typing():
        # Loop through message reply chain and create MessageNodes
        current_message = message
        previous_message_id = None
        while True:
            try:
                current_message_text = current_message.embeds[0].description if current_message.author == discord_client.user else current_message.content
                if current_message_text.startswith(discord_client.user.mention):
                    current_message_text = current_message_text[len(discord_client.user.mention) :].lstrip()
                current_message_content = current_message_text if current_message_text else ''
                current_message_images = [
                    {
                        "type": "image_url",
                        "image_url": {"url": att.url, "detail": "low"},
                    }
                    for att in current_message.attachments
                    if "image" in att.content_type
                ]
                # current_message_content += current_message_images[:MAX_IMAGES]
                if "mistral" in os.environ["LLM"]:
                    # Temporary fix until Mistral API supports message.content as a list
                    current_message_content = current_message_text
                current_message_role = "assistant" if current_message.author == discord_client.user else "user"
                message_nodes[current_message.id] = MessageNode(
                    {
                        "role": current_message_role,
                        "content": current_message_content,
                        "name": str(current_message.author.id),
                    }
                )
                if len(current_message_images) > MAX_IMAGES:
                    message_nodes[current_message.id].too_many_images = True
                if previous_message_id:
                    message_nodes[previous_message_id].replied_to = message_nodes[current_message.id]
                if not current_message.reference:
                    break
                if current_message.reference.message_id in message_nodes:
                    message_nodes[current_message.id].replied_to = message_nodes[current_message.reference.message_id]
                    break
                previous_message_id = current_message.id
                current_message = (
                    current_message.reference.resolved
                    if isinstance(current_message.reference.resolved, discord.Message)
                    else await message.channel.fetch_message(current_message.reference.message_id)
                )
            except (discord.NotFound, discord.HTTPException, IndexError):
                break

        # Build conversation history from reply chain and set user warnings
        reply_chain = []
        user_warnings = set()
        current_node = message_nodes[message.id]
        while current_node is not None and len(reply_chain) < MAX_MESSAGES:
            reply_chain += [current_node.message]
            if current_node.too_many_images:
                user_warnings.add(MAX_IMAGE_WARNING)
            if len(reply_chain) == MAX_MESSAGES and current_node.replied_to:
                user_warnings.add(MAX_MESSAGE_WARNING)
            current_node = current_node.replied_to

        # print("REPLY CHAIN")
        # print(reply_chain[::-1])
        # print(reply_chain[0])
        # print("REPLY CHAIN")
        messages = []
        messages.append(get_system_prompt())
        for msgs in reply_chain:
            messages.append(msgs)
        # print(messages)

        # Generate and send bot reply
        logging.info(f"Message received: {reply_chain[0]}, reply chain length: {len(reply_chain)}")
        response_messages = []
        response_message_contents = []
        previous_content = None
        edit_message_task = None
        # print(os.environ["LLM"])

        # copied from sillytavern request, mixtral settings
        data = {
            "mode": "instruct",
            "messages": messages,
            "stream": True,
            "max_tokens": MAX_COMPLETION_TOKENS,
            "max_new_tokens": 2048,
            "temperature": 0.99,
            "top_p": 1,
            "typical_p": 1,
            "min_p": 0.02,
            "repetition_penalty": 1,
            "frequency_penalty": 0,
            "presence_penalty": 0,
            "top_k": 0,
            "min_length": 0,
            "min_tokens": 0,
            "num_beams": 1,
            "length_penalty": 1,
            "early_stopping": False,
            "add_bos_token": True,
            "truncation_length": 4096,
            "ban_eos_token": False,
            "skip_special_tokens": True,
            "top_a": 0,
            "tfs": 1,
            "epsilon_cutoff": 0,
            "eta_cutoff": 0,
            "mirostat_mode": 0,
            "mirostat_tau": 5,
            "mirostat_eta": 0.1,
            "repetition_penalty_range": 600,
            "encoder_repetition_penalty": 1,
            "no_repeat_ngram_size": 0,
            "penalty_alpha": 0,
            "temperature_last": True,
            "seed": -1,
            "guidance_scale": 1,
        }

        stream_response = requests.post(url, headers=headers, json=data, verify=False, stream=True)
        client = sseclient.SSEClient(stream_response)
        for chunk in client.events():
            payload = json.loads(chunk.data)
            current_content = payload['choices'][0]['message']['content'] or ""
            # print(current_content)
            if previous_content:
                if not response_messages or len(response_message_contents[-1] + previous_content) > EMBED_MAX_LENGTH:
                    reply_message = message if not response_messages else response_messages[-1]
                    embed = discord.Embed(description="⏳", color=EMBED_COLOR["incomplete"])
                    for warning in sorted(user_warnings):
                        embed.add_field(name=warning, value="", inline=False)
                    response_messages += [
                        await reply_message.reply(
                            embed=embed,
                            silent=True,
                        )
                    ]
                    in_progress_message_ids.append(response_messages[-1].id)
                    last_message_task_time = datetime.now().timestamp()
                    response_message_contents += [""]
                response_message_contents[-1] += previous_content
                final_message_edit = len(response_message_contents[-1] + current_content) > EMBED_MAX_LENGTH or current_content == ""
                if (
                    final_message_edit
                    or (not edit_message_task or edit_message_task.done())
                    and datetime.now().timestamp() - last_message_task_time >= len(in_progress_message_ids) / EDITS_PER_SECOND
                ):
                    while edit_message_task and not edit_message_task.done():
                        await asyncio.sleep(0)
                    if response_message_contents[-1].strip():
                        embed.description = response_message_contents[-1]
                    embed.color = EMBED_COLOR["complete"] if final_message_edit else EMBED_COLOR["incomplete"]
                    edit_message_task = asyncio.create_task(response_messages[-1].edit(embed=embed))
                    last_message_task_time = datetime.now().timestamp()
            previous_content = current_content

        # Create MessageNode(s) for bot reply message(s) (can be multiple if bot reply was long)
        for response_message in response_messages:
            message_nodes[response_message.id] = MessageNode(
                {
                    "role": "assistant",
                    "content": "".join(response_message_contents),
                    "name": str(discord_client.user.id),
                },
                replied_to=message_nodes[message.id],
            )
            in_progress_message_ids.remove(response_message.id)


async def main():
    await discord_client.start(os.environ["DISCORD_BOT_TOKEN"])


if __name__ == "__main__":
    asyncio.run(main())
```
Thanks for testing. Can you also try reproducing the error with the following two code examples and tell me the results? This will help narrow things down further.
Update the base_url value if necessary for your setup. (Don't include the "/chat/completions" part)
1. openai streamed responses example (no async)

```python
from openai import OpenAI

client = OpenAI(api_key="Not used", base_url="http://0.0.0.0:5000/v1")

stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
2. openai streamed responses example (async)
```python
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="Not used", base_url="http://0.0.0.0:5000/v1")


async def main():
    stream = await client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "Say this is a test"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")


asyncio.run(main())
```
There were no errors; they ran normally. test.py (no async), test2.py (async):

```
$ python3 test.py
"Sure, I'd be happy to treat this as a test. Let me know how I can assist you with this test. I'm here to help!"

$ python3 test2.py
"Sure, I'd be happy to treat this as a test. Is there a specific task or question you would like me to address?"
```
My oobabooga version is "snapshot-2024-01-28". I updated it multiple times; it was originally "snapshot-2023-12-17".
The first time I used discord-llm-chatbot was on 2024-01-18, so that's probably why I got those errors. Using the latest commit solved it, and it works!
Hello! Do you have any plans to support Oobabooga, other than LM Studio, when using a local model?