Semaphore release issue in api_generate_stream function of vllm_worker

While examining the api_generate_stream function in fastchat/serve/vllm_worker.py, I noticed a potential issue with how the semaphore is released.

In the current implementation, if an exception occurs during the call to worker.generate_stream(params):

https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/vllm_worker.py#L205

then create_background_tasks(request_id) might never be called. In that case the semaphore acquired for the request might not be released, which could leave the worker with one less available slot for each failed request.
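To illustrate the concern, here is a minimal, self-contained sketch. It does not reproduce the FastChat code: generate_stream, handle_unguarded, handle_guarded, and still_held_after_failure are hypothetical stand-ins, and the release is done inline rather than through background tasks. It only shows how an exception raised between acquire and release can leave a semaphore held, and one possible way to guard against it.

```python
import asyncio


def generate_stream(params):
    # Hypothetical stand-in for worker.generate_stream(params); fails on demand.
    if params.get("fail"):
        raise ValueError("bad params")
    return iter(["chunk"])


async def handle_unguarded(sem, params):
    await sem.acquire()
    # If this call raises, the release below is never reached.
    generator = generate_stream(params)
    sem.release()  # stands in for the release scheduled after the response
    return generator


async def handle_guarded(sem, params):
    await sem.acquire()
    try:
        generator = generate_stream(params)
    except Exception:
        # Release on the failure path, then let the error propagate.
        sem.release()
        raise
    sem.release()
    return generator


async def still_held_after_failure(handler):
    # Returns True if the semaphore is still held after a failed request.
    sem = asyncio.Semaphore(1)
    try:
        await handler(sem, {"fail": True})
    except ValueError:
        pass
    return sem.locked()


if __name__ == "__main__":
    print(asyncio.run(still_held_after_failure(handle_unguarded)))
    print(asyncio.run(still_held_after_failure(handle_guarded)))
```

In this sketch the unguarded handler leaves the semaphore locked after a failure, while the guarded one does not. Whether the same fix fits api_generate_stream depends on where exceptions can actually be raised in the real worker.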

lm-sys/FastChat, issue #3389