substratusai / lingo

Lightweight ML model proxy and autoscaler for Kubernetes
https://www.substratus.ai
Apache License 2.0

remove flakiness from scale test #27

Closed. samos123 closed this 8 months ago

samos123 commented 8 months ago

This will reduce flakiness in the system tests. Occasionally the system test does not scale to 3 replicas because the backend processes the 500 requests too quickly for a queue to build up.

So instead, the deployment is patched to wait 10 seconds before starting the main container. This lets the request queue build up large enough to trigger a scale-up to 3 replicas.
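The PR does not show how the patch is applied, so the following is only a minimal sketch of the idea, written with the official Kubernetes Python client. The deployment name, namespace, container name, and startup command are all assumptions for illustration; the real test may apply an equivalent patch with kubectl.

```python
# Hypothetical sketch, not the actual test code: patch the backend Deployment
# so its container sleeps 10 seconds before starting, giving the proxy's
# request queue time to grow past the scale-up threshold.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "backend",  # assumed container name
                        # Wrap the original start command with a delay.
                        "command": ["sh", "-c", "sleep 10 && exec ./serve"],
                    }
                ]
            }
        }
    }
}

apps.patch_namespaced_deployment(
    name="backend",       # assumed deployment name
    namespace="default",  # assumed namespace
    body=patch,
)
```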

nstogner commented 8 months ago

I would prefer that we remove this portion of the system test. This scenario is covered by the integration test, where we have control over the backend's response time, so it is not inherently a race condition.
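To illustrate the approach nstogner is referring to, here is a hedged sketch of a stub backend whose response latency the test fully controls, making queue buildup deterministic instead of a race. This is illustrative Python only, not lingo's actual integration test code; the port and delay value are arbitrary.

```python
# Stub backend with a configurable response delay. A test can raise the
# delay to force a backlog in the proxy without depending on timing luck.
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

RESPONSE_DELAY_SECONDS = 0.5  # the test sets this to control backlog size


class SlowBackend(BaseHTTPRequestHandler):
    def do_POST(self):
        time.sleep(RESPONSE_DELAY_SECONDS)  # deterministic slowness
        body = b'{"choices": []}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def start_stub_backend(port: int = 8081) -> HTTPServer:
    server = HTTPServer(("127.0.0.1", port), SlowBackend)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```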

samos123 commented 8 months ago

I think it's important we keep this to ensure that scaling back to 0 and then scaling up to more than 1 replica works in a real environment, with multi-threading and Python clients creating many OpenAI clients in parallel. It would have caught an earlier issue that the integration test missed.
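For reference, the load pattern described above might look roughly like the sketch below: many OpenAI clients created in parallel threads, all pointed at the lingo proxy, so the backend has to scale from 0 up to multiple replicas. The base URL, model name, worker count, and request count are assumptions for illustration, not the real test values.

```python
# Hedged sketch of a multi-threaded load generator using the OpenAI
# Python client against the proxy endpoint.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI


def send_one(i: int) -> str:
    # Each thread builds its own client, mirroring the parallel-client
    # behaviour the system test exercises.
    client = OpenAI(base_url="http://localhost:8080/v1", api_key="ignored")
    resp = client.completions.create(
        model="backend-model",  # assumed model/deployment name
        prompt=f"request {i}",
        max_tokens=10,
    )
    return resp.choices[0].text


with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(send_one, range(500)))

print(f"completed {len(results)} requests")
```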