When sending a POST request to the LLM, especially when using mlx-community/Qwen2.5-32B-Instruct-8bit, the server crashes with the following error:
127.0.0.1 - - [23/Oct/2024 03:15:06] "POST /v1/chat/completions HTTP/1.1" 200 -
2024-10-23 03:15:06,650 - DEBUG - Starting completion:
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 61900)
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 318, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 349, in process_request
    self.finish_request(request, client_address)
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 362, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/Users/si/development/mlx/.venv/lib/python3.12/site-packages/mlx_lm/server.py", line 733, in <lambda>
    lambda *args, **kwargs: handler_class(
                            ^^^^^^^^^^^^^^
  File "/Users/si/development/mlx/.venv/lib/python3.12/site-packages/mlx_lm/server.py", line 200, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/socketserver.py", line 761, in __init__
    self.handle()
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/server.py", line 436, in handle
    self.handle_one_request()
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/http/server.py", line 424, in handle_one_request
    method()
  File "/Users/si/development/mlx/.venv/lib/python3.12/site-packages/mlx_lm/server.py", line 296, in do_POST
    method(prompt, stop_id_sequences)
  File "/Users/si/development/mlx/.venv/lib/python3.12/site-packages/mlx_lm/server.py", line 504, in handle_completion
    detokenizer.finalize()
  File "/Users/si/development/mlx/.venv/lib/python3.12/site-packages/mlx_lm/tokenizer_utils.py", line 219, in finalize
    self.text += self._maybe_trim_space(current_text)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/si/development/mlx/.venv/lib/python3.12/site-packages/mlx_lm/tokenizer_utils.py", line 196, in _maybe_trim_space
    if current_text[0] != " ":
       ~~~~~~~~~~~~^^^
IndexError: string index out of range
----------------------------------------
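
From the last frame of the traceback, it looks like current_text is an empty string when finalize() runs, so current_text[0] raises. Below is a minimal standalone sketch of that failure and one possible guard. The function names and the "strip one leading space" behaviour are my own assumptions for illustration; this is not the actual mlx_lm code, only the same indexing pattern.

```python
def trim_leading_space_unsafe(current_text: str) -> str:
    # Same indexing pattern as the failing line in the traceback:
    # indexing position 0 of an empty string raises IndexError.
    if current_text[0] != " ":
        return current_text
    # Assumed behaviour for illustration: drop one leading space.
    return current_text[1:]


def trim_leading_space_guarded(current_text: str) -> str:
    # Hypothetical guard: an empty string has nothing to trim,
    # so return it unchanged before touching index 0.
    if not current_text or current_text[0] != " ":
        return current_text
    return current_text[1:]
```

With this sketch, trim_leading_space_unsafe("") raises the same "IndexError: string index out of range", while trim_leading_space_guarded("") returns an empty string. Whether an empty check is the right fix upstream, or whether finalize() should never see an empty segment in the first place, I can't say for sure.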