I've been examining the `api_generate_stream` function in `fastchat/serve/vllm_worker.py` and have noticed a potential issue with how the semaphore is released. In the current implementation, if an exception occurs during the execution of `worker.generate_stream(params)` (see https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/vllm_worker.py#L205), the `create_background_tasks(request_id)` function might not be called. As a result, the semaphore may never be released, and each such failure would keep one of the worker's concurrency slots held.
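To illustrate the failure mode, here is a minimal, self-contained sketch. It assumes that a slot is acquired before streaming starts and is released only after the stream completes successfully. In the real worker, that release is scheduled through `create_background_tasks(request_id)`; here it is modeled as a plain `sem.release()`. The names `failing_stream`, `handle_unsafe`, `handle_safe` and `slot_leaked` are hypothetical and are not part of FastChat's API. `try`/`finally` is one possible way to guarantee the release, not necessarily how the project would fix it.

```python
import asyncio


async def failing_stream():
    """Hypothetical stream that yields one chunk and then fails mid-generation."""
    yield "partial"
    raise RuntimeError("engine error")


async def handle_unsafe(sem: asyncio.Semaphore) -> list:
    # Release only runs if the stream finishes without raising.
    await sem.acquire()
    chunks = []
    async for chunk in failing_stream():  # the RuntimeError propagates here...
        chunks.append(chunk)
    sem.release()  # ...so this release is skipped on failure
    return chunks


async def handle_safe(sem: asyncio.Semaphore) -> list:
    # Release is tied to a finally block, so it runs on success or failure.
    await sem.acquire()
    try:
        chunks = []
        async for chunk in failing_stream():
            chunks.append(chunk)
        return chunks
    finally:
        sem.release()  # runs whether the stream finishes or raises


async def slot_leaked(handler) -> bool:
    """Run handler once against a fresh 1-slot semaphore; report whether the slot is still held."""
    sem = asyncio.Semaphore(1)
    try:
        await handler(sem)
    except RuntimeError:
        pass  # the stream error itself is expected in this demo
    return sem.locked()


if __name__ == "__main__":
    print("unsafe leaks slot:", asyncio.run(slot_leaked(handle_unsafe)))  # True
    print("safe leaks slot:", asyncio.run(slot_leaked(handle_safe)))      # False
```

With a one-slot semaphore, a single failed request in the unsafe version is enough to leave the semaphore locked, so any later request would wait indefinitely. The `finally` version still lets the exception propagate to the caller but always returns the slot.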