theroyallab / tabbyAPI

An OAI compatible exllamav2 API that's both lightweight and fast
GNU Affero General Public License v3.0
421 stars · 59 forks

[BUG] Server crash after submitting a request to tabbyAPI using VS Code extension "Continue" #151

Closed: mindkrypted closed this issue 3 weeks ago

mindkrypted commented 1 month ago

Describe the bug I was testing a model with the team over the Exllama Discord server and encountered a hard server crash while submitting a regular request. The server had worked well for about an hour before this issue; I didn't change any settings before this occurrence.

The server host reported some information leakage that could be a concern if tabbyAPI were set up in a production environment. (For more details, come by the Exllama Discord server; we have a thread under General called "Exl2 crashing thread" with some relevant info.)

To Reproduce Steps to reproduce the behavior: I have no idea how to reliably trigger the issue; after a restart, the server worked fine with the same sort of queries.

Expected behavior The server should not crash.

Logs

VS Code Web Dev debug logs from the session using the extension "Continue" (the repeated ENOENT and "Parsing failed" entries have been collapsed for readability):

```
log.ts:419 INFO Started local extension host with pid 10160.
log.ts:429 WARN [twxs.cmake]: Cannot register 'cmake.cmakePath'. This property is already registered.
log.ts:419 INFO [perf] Render performance baseline is 24ms
log.ts:429 WARN [cmake-tools]: Couldn't find message for key cmake-tools.configuration.cmake.deleteBuildDirOnCleanCconfigure.description.
webviewElement.ts:482 An iframe which has both allow-scripts and allow-same-origin for its sandbox attribute can escape its sandboxing.
console.ts:137 [Extension Host] Error reading file EntryNotFound (FileSystemError): Error: ENOENT: no such file or directory, stat 'C:\tasks'
    at P.e (c:\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:153:6131)
    at Object.stat (c:\Microsoft VS Code\resources\app\out\vs\workbench\api\node\extensionHostProcess.js:153:3902)
    at async _VsCodeIdeUtils.readFile (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:444128:29)
    at async VsCodeIde.readFile (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:444691:16)
    at async _ImportDefinitionsService._getFileInfo (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:90002:34)
    at async PrecalculatedLruCache.initKey (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:87952:25)
console.ts:137 [Extension Host] rejected promise not handled within 1 second: Error: Parsing failed
console.ts:137 [Extension Host] stack trace: Error: Parsing failed
    at _Parser.parse (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:89096:81)
    at _ImportDefinitionsService._getFileInfo (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:90002:28)
    at async PrecalculatedLruCache.initKey (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:87952:25)
mainThreadExtensionService.ts:78 [Continue.continue]Parsing failed

[... the ENOENT and "Parsing failed" entries above repeat several more times ...]

console.ts:137 [Extension Host] Error: Malformed JSON sent from server: {"error":{"message":"Chat completion aborted. Please check the server console.","trace":"Traceback (most recent call last):\n File \"/media/npetro/nvme_1/tabbyAPI/endpoints/OAI/utils/chat_completion.py\", line 275, in stream_generate_chat_completion\n raise generation\n File \"/media/npetro/nvme_1/tabbyAPI/endpoints/OAI/utils/chat_completion.py\", line 229, in _stream_collector\n async for generation in new_generation:\n File \"/media/npetro/nvme_1/tabbyAPI/backends/exllamav2/model.py\", line 1147, in generate_gen\n async for result in job:\n File \"/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic_async.py\", line 92, in __aiter__\n raise result\n File \"/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic_async.py\", line 28, in _run_iteration\n results = self.generator.iterate()\n File \"/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py\", line 115, in decorate_context\n return func(*args, **kwargs)\n File \"/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic.py\", line 937, in iterate\n self.iterate_gen(results)\n File \"/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic.py\", line 1159, in iterate_gen\n eos, sampled_token = job.receive_sample(\n File \"/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic.py\", line 1921, in receive_sample\n new_text = id_to_piece[next_token.item()]\nIndexError: list index out of range\n"}}
    at parseDataLine (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:102781:11)
    at parseSseLine (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:102789:33)
    at streamSse (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:102804:37)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async OpenAI2._streamChat (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:138099:26)
    at async OpenAI2.streamChat (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:102652:30)
    at async llmStreamChat (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:430628:21)
    at async n.value (c:\Users\krash\.vscode\extensions\continue.continue-0.9.182-win32-x64\out\extension.js:436991:27)
log.ts:439 ERR [Extension Host] Error handling webview message: { "msg": { "messageId": "c6580aa7-f3a2-4db1-8432-678c19a915eb", "messageType": "llm/streamChat", "data": { "messages": [ { "role": "user", "content": [ { "type": "text", "text": "I have a gradio web app that generate an audio file and allows to download it.\nThe audio file is generated when there is a button click, it calls a function, the function then launch an inference job and return a numpy audio array.\nThe audio file is going to be automatically generated and save to the temp, basically, I only need to access temp and copy the file.\nHow can I copy the resulting audio file from Gradio's temp directory to a different folder?" } ] } ], "title": "OpenAI-turbcat72b-ep3-5bpw-AUTODETECT", "completionOptions": {} } } }

[... the same "Malformed JSON sent from server" error and webview message are logged a second time ...]

notificationsAlerts.ts:42 Malformed JSON sent from server: {"error":{"message":"Chat completion aborted. Please check the server console.","trace":"Traceback (most recent call last): ... (same traceback as above, truncated in the original log)
```
Server-side order of events:

```
INFO: Metrics: 664 tokens generated in 41.64 seconds (Queue: 0.0 s, Process: 21 cached tokens and 24 new tokens at 65.84 T/s, Generate: 16.09 T/s, Context: 45 tokens)
INFO: 24.xxx.xxx.xxx:0 - "POST /v1/chat/completions HTTP/1.1" 200
ERROR: Completion generation cancelled by user.
ERROR: Chat completion generation cancelled by user.
INFO: 24.xxx.xxx.xxx:0 - "POST /v1/chat/completions HTTP/1.1" 200
ERROR: Traceback (most recent call last):
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 275, in stream_generate_chat_completion
ERROR:     raise generation
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 229, in _stream_collector
ERROR:     async for generation in new_generation:
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/backends/exllamav2/model.py", line 1147, in generate_gen
ERROR:     async for result in job:
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic_async.py", line 92, in __aiter__
ERROR:     raise result
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic_async.py", line 28, in _run_iteration
ERROR:     results = self.generator.iterate()
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR:     return func(*args, **kwargs)
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic.py", line 937, in iterate
ERROR:     self.iterate_gen(results)
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic.py", line 1159, in iterate_gen
ERROR:     eos, sampled_token = job.receive_sample(
ERROR:   File "/media/npetro/nvme_1/tabbyAPI/venv/lib/python3.10/site-packages/exllamav2/generator/dynamic.py", line 1921, in receive_sample
ERROR:     new_text = id_to_piece[next_token.item()]
ERROR: IndexError: list index out of range
ERROR: Sent to request: Chat completion aborted. Please check the server console.
INFO: 24.xxx.xxx.xx:0 - "POST /v1/chat/completions HTTP/1.1" 200
ERROR: Chat completion generation cancelled by user.
```

System info
- OS: Kubuntu 23.10 x86_64
- CPU: AMD EPYC 7B13 (128) @ 2.250GHz
- GPU: 3x RTX 3090
- tabbyAPI: latest head as of 13 Jul 2024
- exllamav2: 0.1.7+cu121.torch2.3.1

official-elinas commented 1 month ago

I am running this server and just want to make @bdashore3 aware. Thanks.

bdashore3 commented 1 month ago

The issue of tracebacks being leaked over the API is fixed in 6019c936379ffaa702d6da0454f7ae8eb4e20cae

DocShotgun commented 1 month ago

In terms of the server crashing here - what exact model is being used? It looks like it might be one of those tokenizer/vocab size problems.

mindkrypted commented 1 month ago

> In terms of the server crashing here - what exact model is being used? It looks like it might be one of those tokenizer/vocab size problems.

It's a Qwen2-72B-based fine-tune that's in development.

@official-elinas anything to add about the tokenizer/vocab being used for the model?

DocShotgun commented 1 month ago

It looks like it’s trying to convert a token ID to the actual token, but said index is out of range and doesn’t exist. Unlikely to be an actual bug in tabby/exl2. If additional tokens were added to the model’s vocab during training, I’d start troubleshooting there.
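As a minimal illustration of the failure mode described above (this is not tabbyAPI or exllamav2 code, and the vocabulary sizes are made up for the example), the lookup fails when a sampled token ID falls outside the tokenizer's piece list:

```python
# Illustrative sketch only: the model's output layer can have more logit
# slots than the tokenizer defines pieces for. If sampling ever picks one
# of those "dead" slots, the id -> piece lookup raises IndexError, just
# like the `new_text = id_to_piece[next_token.item()]` line in the traceback.
id_to_piece = ["<token_%d>" % i for i in range(151646)]  # tokenizer pieces (made-up size)

sampled_token_id = 152000  # an ID beyond the tokenizer's vocabulary

try:
    new_text = id_to_piece[sampled_token_id]
except IndexError:
    print("IndexError: list index out of range")  # mirrors the crash in the log
```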

official-elinas commented 1 month ago

> @official-elinas anything to add about the tokenizer/vocab being used for the model?

No nothing to add except that these types of crashes seem to happen when @mindkrypted does stuff.

> It looks like it’s trying to convert a token ID to the actual token, but said index is out of range and doesn’t exist. Unlikely to be an actual bug in tabby/exl2. If additional tokens were added to the model’s vocab during training, I’d start troubleshooting there.

No additional tokens were added to the model. It has the same vocab size as Qwen2 72B Instruct, which it was trained on.

DocShotgun commented 1 month ago

Ok in that case, try building the latest exl2 dev branch from source. There was a rounding issue that could occasionally cause some models to sample out of range tokens. This was notably an issue with command-r and command-r plus when used with no truncation sampler (top-p, min-p, top-k, etc.).

I’m not sure if the same problem can happen with Qwen2, but it’s worth a shot.

official-elinas commented 1 month ago

Here's another crash from this morning, also originating from @mindkrypted

[screenshot]

It seems to be a bad request, but it crashed the server.

official-elinas commented 1 month ago

> Ok in that case, try building the latest exl2 dev branch from source. There was a rounding issue that could occasionally cause some models to sample out of range tokens. This was notably an issue with command-r and command-r plus when used with no truncation sampler (top-p, min-p, top-k, etc.). I’m not sure if the same problem can happen with Qwen2, but it’s worth a shot.

I've forced top_p to 0.99 in the meantime as a workaround from turboderp, since I'm not able to build exl2 due to dependency issues.
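For anyone hitting this before a fixed build is available, a client-side version of the same workaround is to send a truncation sampler with every request. A hedged sketch against an OpenAI-compatible endpoint (the URL, port, and model name below are placeholders, not taken from this thread's setup):

```python
# Sketch of a client-side workaround: always include top_p slightly below
# 1.0 so that near-zero-probability "padding" token IDs are truncated away
# before sampling. Endpoint and model name are placeholders.
import json
import urllib.request

payload = {
    "model": "my-qwen2-finetune",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "top_p": 0.99,  # the truncation sampler acting as a guard
}

req = urllib.request.Request(
    "http://localhost:5000/v1/chat/completions",  # placeholder host/port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment when a server is actually running
```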

[screenshot]

DocShotgun commented 1 month ago

If this error only happens when a generation request is made with no truncation sampler, it would support the above hypothesis. I think Qwen2 would also have to have a tokenizer smaller than the model’s output layer for this to happen too. Can’t check that without being at my PC.
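One way to check that hypothesis is to compare the tokenizer's size against the `vocab_size` in the model's config. A small self-contained sketch (the sizes below are illustrative, not verified values for Qwen2):

```python
# Sketch: a positive gap between the output layer's vocab_size and the
# tokenizer's piece count means there are logit slots no token maps to;
# sampling one of them would reproduce the IndexError in this issue.
def vocab_gap(tokenizer_size: int, config_vocab_size: int) -> int:
    """Number of output-layer slots with no corresponding tokenizer piece."""
    return config_vocab_size - tokenizer_size

# Illustrative numbers; read the real ones from tokenizer.json / config.json:
print(vocab_gap(151646, 152064))  # prints 418 -> out-of-range IDs are possible
```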

official-elinas commented 1 month ago

This was an explanation from turboderp

[screenshot]

official-elinas commented 1 month ago

> The traceback over API problem is solved in https://github.com/theroyallab/tabbyAPI/commit/6019c936379ffaa702d6da0454f7ae8eb4e20cae

Thanks for the quick resolution.

bdashore3 commented 1 month ago

Yeah, I'd test this using a build of exllamav2 with this commit: https://github.com/turboderp/exllamav2/commit/b8a5bcf1947b7ea77b4b662d0b5f0bbef879cbfd

bdashore3 commented 1 month ago

I've also added this commit: https://github.com/theroyallab/tabbyAPI/commit/9dae46114288dc1feb6a3f70fce5e5d1c117eeab

which adds some redundancy in the event of a fatal generator error. This may not always work, but it should alleviate future problems. Users should still report tracebacks, but redundancy is always good in a long-running project.
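The general shape of that redundancy can be sketched as follows (hypothetical names, not the actual tabbyAPI implementation): catch a fatal error from the generation stream and turn it into a per-request failure instead of letting it take down the server.

```python
# Hypothetical sketch of per-request error containment around a streaming
# generator: a fatal backend error aborts this one generation, not the server.
import asyncio

async def flaky_generation():
    """Stands in for a backend token stream that dies mid-generation."""
    yield "partial "
    raise IndexError("list index out of range")  # simulated fatal error

async def stream_with_guard(gen):
    """Collect chunks; on a fatal error, report it to the client instead
    of propagating the exception up into the server loop."""
    chunks = []
    try:
        async for piece in gen:
            chunks.append(piece)
    except Exception as exc:  # broad on purpose: anything fatal from the backend
        chunks.append(f"[error: generation aborted ({exc})]")
    return "".join(chunks)

print(asyncio.run(stream_with_guard(flaky_generation())))
# prints: partial [error: generation aborted (list index out of range)]
```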

bdashore3 commented 3 weeks ago

Closing this issue because the fix is in and safeguards are in place.