hf-tgi custom runtime fixes to enable metrics and avoid warning logs

rh-aiservices-bu / llm-on-openshift

Resources, demos, recipes,... to work with LLMs on OpenShift with OpenShift AI or Open Data Hub.

Apache License 2.0


hf-tgi custom runtime fixes to enable metrics and avoid warning logs #35

Closed: dagrayvid closed this 5 months ago

dagrayvid commented 5 months ago

This PR enables metrics by switching HF TGI to use port 3000. Currently, a Service and a ServiceMonitor are created for the IBM/TGIS runtime, which exposes metrics on port 3000. Using the same port in this custom runtime lets Prometheus scrape the HF TGI metrics as well.
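One way to confirm the port change works is to fetch the metrics endpoint directly and list what it exposes. The following is a minimal sketch, not part of this PR: it assumes the server publishes Prometheus text-format metrics at `/metrics` on port 3000, and the helper names `metric_names` and `fetch_metric_names` as well as the `localhost` address are hypothetical.

```python
from urllib.request import urlopen


def metric_names(exposition_text: str) -> set[str]:
    """Return the metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like `name{label="v"} 1.0` or `name 1.0`;
        # keep only the part before the labels or the value.
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def fetch_metric_names(base_url: str, timeout: float = 5.0) -> set[str]:
    """Fetch <base_url>/metrics (assumed path) and return the metric names."""
    with urlopen(f"{base_url.rstrip('/')}/metrics", timeout=timeout) as resp:
        return metric_names(resp.read().decode("utf-8"))


# Example call (hypothetical address, e.g. behind a local port-forward):
# print(sorted(fetch_metric_names("http://localhost:3000")))
```

If the call returns an empty set or the connection is refused, the runtime is probably still listening on a different port than the one the Service targets.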

Also added a note about the safetensors format in the README, and added some ENV vars to silence some warning logs.