@daisy-ycguo CI needs Prometheus for testing `hpa-values.yaml`: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/HPA.md#prometheus
Otherwise Helm install fails due to the missing `serviceMonitor` and custom metrics Kubernetes APIs.
As to testing that the HPA rules actually work, there's a manual step needed for installing the required custom metric config. That part can be scripted in a few lines, shown here: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/HPA.md#post-install
TGI and TEI services enable their metrics endpoints only after processing their first request. I.e. verifying that HPA can access the relevant custom metrics requires uploading doc(s) with data-prep so that reranking is used, and doing at least one ChatQnA query. After that, HPA can be verified to be working: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/HPA.md#verify
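For CI, the check that custom metrics are visible could be automated roughly like this. This is only a sketch, not what HPA.md prescribes: it assumes the metrics show up under the standard `/apis/custom.metrics.k8s.io/v1beta1` aggregated API path, that the runner has `kubectl` access to the cluster, and the helper names and the `example_queue_size` metric name are hypothetical placeholders. The real metric names come from the custom metric config installed in the post-install step.

```python
import json
import subprocess

# Assumption: custom metrics are served under the standard aggregated API path.
CUSTOM_METRICS_PATH = "/apis/custom.metrics.k8s.io/v1beta1"


def metric_names(api_resource_list_json: str) -> list[str]:
    """Return the sorted resource names from an APIResourceList JSON document."""
    doc = json.loads(api_resource_list_json)
    return sorted({r["name"] for r in doc.get("resources", [])})


def missing_metrics(api_resource_list_json: str, expected: list[str]) -> list[str]:
    """Return the expected metric names that no listed resource exposes.

    Resource names look like "pods/<metric>" or "services/<metric>",
    so only the part after the last "/" is compared.
    """
    exposed = {n.split("/")[-1] for n in metric_names(api_resource_list_json)}
    return [m for m in expected if m not in exposed]


def fetch_custom_metrics() -> str:
    """Query the cluster through kubectl (requires access to the cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "--raw", CUSTOM_METRICS_PATH],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical metric name; replace with the ones from the installed config.
    missing = missing_metrics(fetch_custom_metrics(), ["example_queue_size"])
    if missing:
        raise SystemExit(f"custom metrics not available yet: {missing}")
    print("all expected custom metrics are available")
```

Since the metrics only appear after the first request, a CI job would need to run this after the data-prep upload and the ChatQnA query, probably with a few retries to allow for the Prometheus scrape interval.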