substratusai / lingo

Lightweight ML model proxy and autoscaler for Kubernetes
https://www.substratus.ai
Apache License 2.0

Better concurrent request handling for model host address #38

Closed alpe closed 6 months ago

alpe commented 7 months ago

Like in #36, the reconcile loop can be affected by external requests. This refactoring helps by reducing lock contention.

I have also added a benchmark showing that the new RWLock is ~30% faster than before on my box. The differences are all in the nanosecond range, though, so they do not really matter:

new: BenchmarkEndpointGroup-12       7667690           154.6 ns/op
old: BenchmarkEndpointGroup-12       4968279           234.2 ns/op
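The actual lingo code is not shown in this thread, but the shape of the change can be sketched: a `sync.RWMutex` lets many concurrent proxy requests read the endpoint list in parallel, while reconcile updates take the exclusive write lock. The type and method names below (`endpointGroup`, `getHost`, `setAddrs`) are illustrative, not lingo's real API; round-robin rotation uses an atomic counter so the read path never needs the write lock.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// endpointGroup is a hypothetical sketch of an endpoint set guarded by an
// RWMutex: request handlers take the read lock, reconcile takes the write lock.
type endpointGroup struct {
	mu        sync.RWMutex
	next      uint64 // round-robin counter, advanced atomically under RLock
	endpoints []string
}

// getHost picks the next endpoint round-robin. Many goroutines can hold the
// read lock at once, so concurrent requests do not serialize on each other.
func (g *endpointGroup) getHost() (string, bool) {
	g.mu.RLock()
	defer g.mu.RUnlock()
	if len(g.endpoints) == 0 {
		return "", false
	}
	i := atomic.AddUint64(&g.next, 1) % uint64(len(g.endpoints))
	return g.endpoints[i], true
}

// setAddrs replaces the endpoint list, as a reconcile pass would after a
// scale-up or scale-down. Writers block all readers, but only briefly.
func (g *endpointGroup) setAddrs(addrs []string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.endpoints = addrs
	atomic.StoreUint64(&g.next, 0)
}

func main() {
	g := &endpointGroup{}
	g.setAddrs([]string{"10.0.0.1:8080", "10.0.0.2:8080"})
	for i := 0; i < 3; i++ {
		h, _ := g.getHost()
		fmt.Println(h)
	}
}
```

With a plain `sync.Mutex`, every request would serialize on the same lock as reconcile; the ~30% benchmark delta above is consistent with readers no longer excluding one another.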

The key benefit of this PR is handling request timeouts.

alpe commented 7 months ago

Thank you very much for the feedback! I applied some changes, but I think this needs better testing before it can be merged. It is a much more complex beast than I initially thought.

samos123 commented 7 months ago

Agreed on the additional testing. It looks like the larger-scale (300 concurrent requests) system test is catching some kind of issue: https://github.com/substratusai/lingo/actions/runs/7286882573/job/19856443938?pr=38#step:5:1098

I can help with additions to the system tests if you have specific scenarios that should be tested. The original system tests came out of concurrent request handling being broken: we needed to ensure that scale-up and scale-down work as expected and that requests and responses are returned correctly against a realistic backend.

Edit: I've triggered a re-run of the system tests to ensure it wasn't just a flaky test.