gpustack / gpustack

Manage GPU clusters for running LLMs
https://gpustack.ai
Apache License 2.0
673 stars 53 forks source link

Multi-model simultaneous inference, GPUStack’s reception of messages gets interrupted #376

Open linyinli opened 1 month ago

linyinli commented 1 month ago

Describe the bug Messages breaks and the inference doesn't complete: 中断

Steps to reproduce

  1. Create a GPUStack on macOS and add a Ubuntu worker with an RTX 4090 GPU;
  2. Deploying below models on the Ubuntu worker; image
Meta-Llama-3.1-8B-Instruct
Llama-3.2-3B-Instruct
Llama-3.2-1B-Instruct
qwen2.5-7b-instruct
qwen2.5-3b-instruct
qwen2.5-0.5b-instruct
  1. Simultaneously send inference requests using the compare view, try multiple times.

Result Messages breaks and the inference doesn't complete: 中断

Model instance logs are normal: log

GPUStack server log:

2024-10-15T11:12:03+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Unterminated string starting at: line 1 column 196 (char 195)
2024-10-16T10:16:48+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Expecting value: line 1 column 169 (char 168)
2024-10-16T10:16:48+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Unterminated string starting at: line 1 column 118 (char 117)
2024-10-16T10:17:03+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Expecting ':' delimiter: line 1 column 197 (char 196)
2024-10-16T10:17:52+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Unterminated string starting at: line 1 column 170 (char 169)
2024-10-16T10:19:48+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Unterminated string starting at: line 1 column 165 (char 164)
2024-10-16T10:19:48+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Expecting ',' delimiter: line 1 column 156 (char 155)
2024-10-16T10:19:50+08:00 - gpustack.api.middlewares - ERROR - Error processing streaming response: Unterminated string starting at: line 1 column 166 (char 165)

Environment

gitlawr commented 1 month ago

Can not reproduce