xgengsjc2021 opened 2 years ago
We don't actively add them, given this is up to the end user; we don't want to enforce things on end users.
However, if you have suggestions, we do welcome PRs where we can add them, but commented out.
@tomkerkhove Where can I see all of the metrics KEDA provides? I could not find any metrics on the KEDA website, so I don't know which metrics I can use to set up alerts.
The overview is available on https://keda.sh/docs/2.7/operate/prometheus/
@tomkerkhove Thanks. When I check http://127.0.0.1:9022/metrics, I only get one metric. (However, the documentation lists 4 metrics.) Please see my screenshot.
Why does KEDA only list one metric here? By the way, I am using the latest version of KEDA (2.7.2).
From the screenshot, you can see the metric name is keda_metrics_adapter_scaler_errors_total (in the documentation, the metric name is keda_metrics_adapter_scaler_error_totals). Querying my metric returns a result, but if I query keda_metrics_adapter_scaler_error_totals, I get nothing.
Besides this, I also tried to query the three other metrics below in Prometheus and did not get any results:
```
keda_metrics_adapter_scaled_object_error_totals
keda_metrics_adapter_scaler_errors
keda_metrics_adapter_scaler_metrics_value
```
My KEDA settings for Prometheus:

```yaml
prometheus:
  metricServer:
    enabled: true
    port: 9022
    portName: metrics
    path: /metrics
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
      interval:
      scrapeTimeout:
      namespace: monitoring
      additionalLabels:
        release: kube-prometheus-stack
      relabelings: []
  operator:
    enabled: true
    port: 8080
    path: /metrics
    podMonitor:
      # Enables PodMonitor creation for the Prometheus Operator
      enabled: true
      interval:
      scrapeTimeout:
      namespace: monitoring
      additionalLabels:
        release: kube-prometheus-stack
```
Is there anything I need to modify? Please help me check. Thanks!
@tomkerkhove Hi, do you have any thoughts on my comment above?
This might be a silly question, but does your cluster have any ScaledObject resources? If it does not, that might explain why the metrics are missing.
(Sorry for the slow response.)
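As a quick check (assuming the default KEDA CRDs are installed), you could list them with:

```
kubectl get scaledobjects --all-namespaces
```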
@tomkerkhove Thanks, but the reply confused me. I do have ScaledObject resources in my environment.
That is odd. Can this be related to https://github.com/kedacore/keda/issues/3554, @JorTurFer?
I don't think so; that issue registers the metric with 0 as the value, but the metric is registered. You are checking the metrics server (not the operator) on port 9022, right?
@JorTurFer @tomkerkhove Thanks for the response. I did check the metrics server (not the operator). By the way, from the output below, in my keda namespace I only see one service, keda-operator-metrics-apiserver:

```
kubectl -n keda get svc
NAME                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                   AGE
keda-operator-metrics-apiserver   ClusterIP   172.20.76.175   <none>        443/TCP,80/TCP,9022/TCP   147d
```
Then I ran port-forward to check the metrics on the metrics-apiserver:

```
kubectl -n keda port-forward svc/keda-operator-metrics-apiserver 9022
Forwarding from 127.0.0.1:9022 -> 9022
Forwarding from [::1]:9022 -> 9022
Handling connection for 9022
Handling connection for 9022
```
Hum, weird... It's possible that after a metrics server restart the counters don't exist, because they are created on first access. Do you have more than 1 metrics server instance? If you restart the metrics server and wait some minutes (while having ScaledObjects), do you still not see any metrics?
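For example, the restart could be done with something like this (assuming the default deployment name from the Helm chart):

```
kubectl -n keda rollout restart deployment keda-operator-metrics-apiserver
```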
@JorTurFer I only have 1 metrics server pod, as you can see below:

```
kubectl -n keda get pods
NAME                                               READY   STATUS    RESTARTS   AGE
keda-operator-848b9f56f7-5szlp                     1/1     Running   0          27d
keda-operator-metrics-apiserver-5cb9fd7947-gv47p   1/1     Running   0          19m
```
Based on your suggestion, I restarted the metrics pod, then checked http://127.0.0.1:9022/metrics and got the same result:

```
# HELP keda_metrics_adapter_scaler_errors_total Total number of errors for all scalers
# TYPE keda_metrics_adapter_scaler_errors_total counter
keda_metrics_adapter_scaler_errors_total 0
```
Really weird... Could you query some metric manually to ensure that at least one trigger is executed? I have just seen in your picture that you are using the CPU trigger; that trigger is processed by the Kubernetes metrics server (not by the KEDA metrics server), and that's why you can't see any other metric. Do you have any trigger which is not CPU/Memory? Could you query it manually?
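For reference, one way to do that manual query is to hit the external metrics API directly; a sketch (the namespace and metric name below are placeholders, the real metric name comes from the HPA that KEDA creates for your ScaledObject):

```
# List the external metrics the KEDA metrics server currently exposes
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"

# Query a single metric (placeholder namespace and metric name)
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/my-namespace/s0-prometheus-http_requests_total"
```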
@JorTurFer At this moment, we only monitor CPU and Mem; I don't have any other triggers for now. Actually, we are using the combination of CPU+Mem together as a trigger in our env.
Here is a question I hope you can answer: once the condition is matched in KEDA, will KEDA scale up the pods to the maximum number at once? I noticed one time that my CPU usage was not too high (it was above the threshold), but it scaled up to the maximum pods immediately, which we don't like.
> At this moment, we only monitor CPU and Mem; I don't have any other triggers for now. Actually, we are using the combination of CPU+Mem together as a trigger in our env.
That's why you can't see any other metric: they are not generated yet, because the KEDA metrics server hasn't received any query; all the requests go to the Kubernetes metrics server. When you use the CPU/Memory scaler, KEDA basically creates a "regular" HPA pointing at the "regular" metrics server (that's why the Kubernetes metrics server is needed).
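For illustration, a sketch of a ScaledObject with a non-CPU/Memory trigger (a Prometheus trigger here; all names and the server address are placeholders) that would make the HPA query the KEDA metrics server and thus generate these metrics:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app-scaler        # placeholder name
  namespace: my-namespace    # placeholder namespace
spec:
  scaleTargetRef:
    name: my-app             # placeholder Deployment
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc:9090
        metricName: http_requests_total
        query: sum(rate(http_requests_total{app="my-app"}[2m]))
        threshold: "100"
```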
> Here is a question I hope you can answer: once the condition is matched in KEDA, will KEDA scale up the pods to the maximum number at once? I noticed one time that my CPU usage was not too high (it was above the threshold), but it scaled up to the maximum pods immediately, which we don't like.
KEDA creates the HPA and exposes the metrics (except CPU and memory), and it is the HPA controller that manages the autoscaling, so basically we don't change anything there. Why do you think that the CPU usage was low? I mean, do you have all the usage monitored? Another important thing is that the threshold is not a boundary; it's the desired value. I mean, the HPA controller will try to stay as close as possible to that value, not scale out/in automatically when the value changes.
Remember also that the HPA controller is really aggressive scaling out and very conservative scaling in; a small peak could trigger a scale-out, and several minutes are needed to scale in. Using KEDA you can customize this default behaviour via the advanced section in the ScaledObject.
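For example (a sketch only; the exact numbers depend on your workload), the scale-out could be softened like this in the ScaledObject:

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 60
          policies:
            - type: Percent
              value: 50          # add at most 50% more pods
              periodSeconds: 60  # per 60-second window
```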
I found there are zero rules in values.yaml. Can you provide some rules for Prometheus monitoring purposes? Can you provide more rules, or is this one metric enough in values.yaml?
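For anyone looking for a starting point, here is a minimal sketch of such a rule (a hypothetical example, not an official KEDA rule; the name, namespace, and threshold are placeholders, and it assumes the kube-prometheus-stack label convention used earlier in the thread):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: keda-alerts          # placeholder name
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: keda
      rules:
        - alert: KedaScalerErrors
          expr: increase(keda_metrics_adapter_scaler_errors_total[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: KEDA scalers are reporting errors
```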