Closed by robertgshaw2-neuralmagic 2 weeks ago
Hi @robertgshaw2-neuralmagic I think the right place to ask this is the KServe community. In the meantime, here is my understanding. When qpext gets a request on 9088, it combines metrics from 9091 and the app port (the vLLM runtime port in this case) and returns the aggregated metrics.
Now, you could create your own K8s Service pointing at the aggregated port. The service you are referring to above is an internal Knative service and only exposes port 9091. This does not stop you from exposing metrics on some other port and scraping it independently with a ServiceMonitor. You could do the same with port 9088.
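For example, a standalone Service pointing at the aggregated port could look roughly like this. This is a sketch, not a tested manifest; the Service name and the selector label value are placeholders that would need to match your InferenceService's actual pod labels:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: vllm-aggregated-metrics        # hypothetical name
  labels:
    app: vllm-aggregated-metrics
spec:
  selector:
    # Placeholder: KServe labels predictor pods with the InferenceService name;
    # adjust this to match the labels on your pods.
    serving.kserve.io/inferenceservice: my-vllm-isvc
  ports:
    - name: http-agg-metrics
      port: 9088
      targetPort: 9088
```

A ServiceMonitor could then select this Service by its labels and scrape the `http-agg-metrics` port.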
Btw, 9088 is the qpext aggregation port; what is the vLLM runtime port from which qpext will get the metrics (is it the default 8080)? Have you tested whether those ports work within the container, i.e. do you get any metrics back? Another question is whether you are using Istio, since Istio also provides metrics aggregation and that affects the setup.
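To check whether those ports respond from inside the pod, something like the following should return Prometheus-format text if the endpoints are up (the pod name, container name, and runtime port are placeholders for your deployment):

```shell
# Placeholder pod name; find yours with: kubectl get pods
POD=my-vllm-isvc-predictor-xxxxx

# Runtime metrics (assumed default port 8080; adjust to your runtime's port)
kubectl exec "$POD" -c kserve-container -- curl -s localhost:8080/metrics | head

# queue-proxy metrics
kubectl exec "$POD" -- curl -s localhost:9091/metrics | head

# qpext aggregated metrics
kubectl exec "$POD" -- curl -s localhost:9088/metrics | head
```

If the 9088 endpoint returns both sets of metric families while 8080 and 9091 each return only one, aggregation is working inside the container and the remaining problem is purely one of exposure.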
Thanks @skonto - this is very helpful. I am somewhat new to Knative/KServe, so I am trying to learn the best practices around creating additional services versus updating Knative/KServe configs.
The vLLM runtime uses port 8000 for both the metrics endpoint and the user-facing API. I am going to change this.
Right now I have this set up using Istio for client connections from outside the cluster. Since the Prometheus server is running inside my cluster, I was not going through Istio for metrics aggregation.
Would you suggest I use Istio for scraping the prom metrics as well?
This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with `/reopen`. Mark the issue as fresh by adding the comment `/remove-lifecycle stale`.
Ask your question here:
Hello! I am working on an integration between KServe/Knative and vLLM for deploying LLMs. vLLM is a production inference server for LLMs, and I have instrumented it with Prometheus metrics that are specific to LLM serving. For instance, the key items include TTFT (time-to-first-token) and TPOT (time-per-output-token). I want to use these metrics in addition to the generic metrics exposed by the `queue-proxy` container.

KServe has a feature called `qpext`, which enables aggregation of the `queue-proxy` container metrics with the `vllm` container metrics. `qpext` exposes the aggregated metrics on port 9088 and exposes the `queue-proxy` metrics on port 9091. The issue I am running into is that when I create my `InferenceService` (which uses Knative Serving), only port 9091 is exposed (this port is named `http-usermetric`).

As a result, when I create a `ServiceMonitor` to monitor my `InferenceService`, I am unable to query port `9088`, where the vLLM metrics are aggregated with the `queue-proxy` metrics.

I am going to proceed by using a `PodMonitor` for the time being, but I would prefer to use a `ServiceMonitor`, as this seems like best practice after my review of the Prometheus Operator documentation.

So my questions are:
- Is there a way to expose port 9088 in addition to the `http-usermetric` port that is exposed by the Knative services?
- Is using a `PodMonitor` consistent with best practices for monitoring user-defined metrics from applications inside Knative?

Apologies if this is the wrong place to ask this. I was not quite sure whether this made more sense to ask in the KServe or Knative forums.
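For reference, the PodMonitor workaround mentioned above can target the qpext aggregation port directly, since PodMonitor selects pods by label rather than going through a Service port. A rough sketch, with placeholder names and labels that would need to match the actual InferenceService pods:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: vllm-qpext-metrics           # hypothetical name
  labels:
    release: prometheus              # placeholder: match your Prometheus Operator's podMonitorSelector
spec:
  selector:
    matchLabels:
      # Placeholder: adjust to the labels KServe puts on your predictor pods.
      serving.kserve.io/inferenceservice: my-vllm-isvc
  podMetricsEndpoints:
    - targetPort: 9088               # qpext aggregated metrics port
      path: /metrics
      interval: 30s
```

This sidesteps the problem that the Knative-managed Service only declares the `http-usermetric` (9091) port, at the cost of coupling the monitor to pod labels instead of a Service.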