substratusai / lingo

Lightweight ML model proxy and autoscaler for Kubernetes
https://www.substratus.ai
Apache License 2.0

Lingo should retry on proxy failure #48

Closed nstogner closed 5 months ago

nstogner commented 6 months ago

There are many reasons a backend model server can fail to serve a request. If Lingo retried failed requests, it could improve the overall reliability of the system.

alpe commented 6 months ago

This can easily create exponential load. Can you share some scenarios that make sense for you?

samos123 commented 6 months ago

The scenarios I encountered so far were:

The main use case would be providers that run Lingo as a managed service, or internally for their own end users, and need to minimize the number of errors returned to those users.

nstogner commented 6 months ago

It is possible we could do this at the ingress layer into the cluster as well.

The biggest source of 503s has been misconfigured termination grace periods on model backends (which can take a long time to process all of their pending requests, longer than the 30s default). This should be mostly solvable by making sure the knobs are set correctly: a sensible max-in-flight and a generous termination grace period.