Our current Envoy integration relies on `EnvoyExtensionPolicy` and `EnvoyPatchPolicy`; this is very manual and not sustainable. (See: https://github.com/kubernetes-sigs/llm-instance-gateway/pull/18)

We're trying to settle on a single Gateway API implementation that this project will extend to support LLMServerPool as a backend. This will enable us to run e2e tests against these concepts and iterate more quickly. That implementation should be:

- An existing conformant implementation of Gateway API
- Part of the CNCF
- Envoy-based, for simplicity of extension mechanisms
- Open to contributions from us to support this new type of backend

We propose extending this existing gateway implementation to act as the controller for the `LLMServerPool` object (see: https://github.com/kubernetes-sigs/llm-instance-gateway/blob/main/docs/proposals/002-api-proposal/proposal.md#llmserverpool), as well as updating `HTTPRoute` to support an `LLMServerPool` as a backendRef. At a high level we expect this to look like:

- Upon creation of an `LLMServerPool`, the controller creates: an ext-proc deployment/service, and an `original_dst` cluster.
- Upon creation of an `HTTPRoute` with an `LLMServerPool` as a backendRef: a Listener that routes requests to the appropriate `original_dst` cluster (there may be multiple LLMServerPools), with ext_proc configured to operate on requests sent to that cluster.