substratusai / lingo

Lightweight ML model proxy and autoscaler for Kubernetes
https://www.substratus.ai
Apache License 2.0

Queue placement is not cancelled on request cancellation #18

Closed nstogner closed 8 months ago

nstogner commented 8 months ago

A large number of cancelled HTTP requests can cause the queue to block forever.
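One way to address this is to make queue placement context-aware, so a cancelled request gives up its spot instead of blocking the queue. Below is a minimal sketch under that assumption; the Queue type, its fields, and EnqueueAndWait are hypothetical names for illustration, not lingo's actual implementation.

package queue

import (
	"context"
	"sync"
)

// Queue is a hypothetical stand-in for lingo's per-deployment request queue.
type Queue struct {
	mtx     sync.Mutex
	waiting int           // the "current wait count" reported in the logs
	ready   chan struct{} // signalled when a backend slot frees up
}

// EnqueueAndWait blocks until a slot is available or the request is cancelled.
func (q *Queue) EnqueueAndWait(ctx context.Context) error {
	q.mtx.Lock()
	q.waiting++
	q.mtx.Unlock()

	// Always leave the wait count, even on cancellation, so abandoned
	// requests cannot inflate it forever.
	defer func() {
		q.mtx.Lock()
		q.waiting--
		q.mtx.Unlock()
	}()

	select {
	case <-q.ready:
		return nil
	case <-ctx.Done():
		// The client went away; give up the queue placement.
		return ctx.Err()
	}
}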

nstogner commented 8 months ago

After lingo is hit with a large number of concurrent requests and the job issuing them is cancelled, it keeps reporting the lingering requests indefinitely:

kubectl logs -f lingo-5f7f7978b-gvb77 | grep ceil
2023/11/11 16:31:32 Average for deployment: stapi-minilm-l6-v2: 680 (ceil: 7), current wait count: 680
2023/11/11 16:31:35 Average for deployment: kubernetes: 0 (ceil: 0), current wait count: 0
2023/11/11 16:31:35 Average for deployment: lingo: 0 (ceil: 0), current wait count: 0
2023/11/11 16:31:35 Average for deployment: stapi-minilm-l6-v2: 680 (ceil: 7), current wait count: 680
2023/11/11 16:31:38 Average for deployment: kubernetes: 0 (ceil: 0), current wait count: 0
2023/11/11 16:31:38 Average for deployment: lingo: 0 (ceil: 0), current wait count: 0
2023/11/11 16:31:38 Average for deployment: stapi-minilm-l6-v2: 680 (ceil: 7), current wait count: 680
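For reference, the behavior above can be reproduced with a client that fires a large batch of concurrent requests and then cancels them mid-flight. A rough sketch follows; the endpoint URL, port, and request body are assumptions, and only the model name comes from the logs above.

package main

import (
	"bytes"
	"context"
	"net/http"
	"sync"
	"time"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())

	var wg sync.WaitGroup
	for i := 0; i < 680; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			body := bytes.NewBufferString(`{"model": "stapi-minilm-l6-v2", "input": "hello"}`)
			// The address and path are assumptions for illustration.
			req, err := http.NewRequestWithContext(ctx, http.MethodPost,
				"http://localhost:8080/v1/embeddings", body)
			if err != nil {
				return
			}
			resp, err := http.DefaultClient.Do(req)
			if err == nil {
				resp.Body.Close()
			}
		}()
	}

	// Cancel the whole batch shortly after it has been queued; the abandoned
	// requests are what lingo keeps counting in its wait count.
	time.Sleep(2 * time.Second)
	cancel()
	wg.Wait()
}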