Open klopfdreh opened 1 year ago
I found the issue: it occurs when you scale the instances in Kubernetes up to 2 and both servers export their metrics under the same application name:
management:
  metrics:
    tags:
      application: myservername
I got the dashboard from here: https://grafana.com/grafana/dashboards/9933-streams/. It could be changed so that the application tag is matched against a pattern, so that you can name the applications `myservername-1` and `myservername-2`, or `myservername-randomidentifier`. The dashboard should be adjusted so that it uses `=~` in the metric queries.
Variable Value: `SERVER_APPLICATION_NAME=myservername.*` (the `.*` is important to match all pods)
Env-Variable: `MY_POD_NAME = myservername-3h35f2t3d-rcg8d`
Example:
"expr": "process_uptime_seconds{application=~\"${SERVER_APPLICATION_NAME}\"}",
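To illustrate why the trailing `.*` matters: Prometheus anchors the pattern of a `=~` matcher at both ends, so `myservername` alone would not match a pod name such as `myservername-3h35f2t3d-rcg8d`. The following is a minimal Python sketch that emulates this with `re.fullmatch`. It is only an approximation (Prometheus uses RE2 syntax, which behaves the same for a pattern this simple), and the second pod name and `otherserver-1` are made up for the example.

```python
import re


def matches_label(pattern: str, value: str) -> bool:
    """Emulate a Prometheus `=~` matcher, which must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


# Value of SERVER_APPLICATION_NAME from the comment above.
pattern = "myservername.*"

# The first pod name is from the comment; the other two are hypothetical.
pods = [
    "myservername-3h35f2t3d-rcg8d",
    "myservername-3h35f2t3d-x7k2p",
    "otherserver-1",
]

# Only label values that match the full pattern are selected.
selected = [pod for pod in pods if matches_label(pattern, pod)]
print(selected)

# Without the trailing `.*` the anchored pattern matches none of the pods.
print(matches_label("myservername", "myservername-3h35f2t3d-rcg8d"))
```

With the `.*` suffix both `myservername-…` pods are selected; without it, neither is.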
application.yml
management:
  metrics:
    tags:
      application: ${MY_POD_NAME}
SCDF deployment env-variables:
- name: MY_POD_NAME
  valueFrom:
    fieldRef:
      apiVersion: v1
      fieldPath: metadata.name
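The chain here is: Kubernetes sets `MY_POD_NAME` from the pod's `metadata.name`, and the `${MY_POD_NAME}` placeholder in `application.yml` is then resolved from the environment, so every replica reports its own `application` tag. A rough Python sketch of that substitution follows. It uses `string.Template` purely as a stand-in for Spring's placeholder resolution, and the `resolve` helper and the `env` dictionary are illustrative, not part of SCDF.

```python
from string import Template

# Environment as the container would see it. Kubernetes fills MY_POD_NAME
# from metadata.name; the value is the example pod name from the comment.
env = {"MY_POD_NAME": "myservername-3h35f2t3d-rcg8d"}


def resolve(value: str, environment: dict) -> str:
    """Substitute ${NAME} placeholders from the given environment (illustrative helper)."""
    return Template(value).substitute(environment)


# The tag value as written in application.yml.
application_tag = resolve("${MY_POD_NAME}", env)
print(application_tag)
```

Each replica ends up with a distinct tag value, which is why the dashboard query then needs the `=~` pattern match instead of an exact match.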
Hope this helps for a Kubernetes setup with more than 1 replica. 👍
Alternatively, you could create a selection variable in the dashboard to choose between the servers.
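One way such a selection could look is a Grafana query variable that lists all `application` label values and is then referenced in the panel expression. The sketch below builds that fragment in Python and prints it as JSON. The variable name `application` is an assumption, and the field names (`templating.list`, `type`, `query`, `multi`, `includeAll`) follow the Grafana dashboard JSON model as I understand it, so please check them against your Grafana version before using this.

```python
import json

# Hypothetical dashboard variable: lists every value of the `application`
# label found on process_uptime_seconds and allows selecting several of them.
variable = {
    "name": "application",
    "type": "query",
    "query": "label_values(process_uptime_seconds, application)",
    "multi": True,
    "includeAll": True,
}

# Panel expression that references the variable with a regex matcher,
# so that a multi-value selection can match more than one server.
panel_expr = 'process_uptime_seconds{application=~"$application"}'

fragment = {"templating": {"list": [variable]}}
print(json.dumps(fragment, indent=2))
print(panel_expr)
```

Compared with the fixed `myservername.*` pattern, this lets the viewer pick one pod or all pods from a drop-down in the dashboard.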
We could implement @klopfdreh's suggested fix (or something similar) in:
I am not sure what is involved in the 2nd item.
Description: Currently there is an issue with the Prometheus metrics of the SCDF server. For example, `http_server_requests_seconds_max` of any path shows the value 0.0 even if I navigate through the UI.
Release versions: 2.10.3
Custom apps:
Steps to reproduce: Set up `spring-cloud-dataflow-server` with `prometheus-rsocket-proxy` and check the `/metrics/connected` endpoint.
Screenshots: ![image](https://github.com/spring-cloud/spring-cloud-dataflow/assets/980773/3a6cf4ee-a9da-4ec6-a687-e4a774efe109)
Note: We created our own artifact based on https://github.com/spring-cloud/spring-cloud-dataflow/tree/v2.10.3/spring-cloud-dataflow-server. That is the reason why there is a 1.0.63 mentioned.
Additional context: The metrics are provided, but the count somehow is not working. This is a Spring Boot standard metric, so I guess there is something broken in 2.7.x