spring-cloud / spring-cloud-dataflow

A microservices-based Streaming and Batch data processing in Cloud Foundry and Kubernetes
https://dataflow.spring.io
Apache License 2.0

Grafana Prometheus streams dashboard shows incorrect values when using multiple server instances #5357

Open klopfdreh opened 1 year ago

klopfdreh commented 1 year ago

Description: Currently there is an issue with the Prometheus metrics of the SCDF server. For example, http_server_requests_seconds_max shows the value 0.0 for every path, even while I navigate through the UI.

Release versions: 2.10.3

Custom apps:

Steps to reproduce: Set up spring-cloud-dataflow-server with prometheus-rsocket-proxy and check the /metrics/connected endpoint.

Screenshots: image

Note: We created our own artifact based on https://github.com/spring-cloud/spring-cloud-dataflow/tree/v2.10.3/spring-cloud-dataflow-server. That is why version 1.0.63 is mentioned.

Additional context: The metrics are provided, but the count somehow is not working.

This is a Spring Boot standard metric, so I guess there is something broken in 2.7.x

klopfdreh commented 1 year ago

I found the issue: it occurs when you scale up to 2 instances in Kubernetes and both servers export their metrics under the same application name:

management:
  metrics:
    tags:
      application: myservername
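
To illustrate why a shared application tag is a problem, here is a toy Python model. It is not how Prometheus or prometheus-rsocket-proxy are implemented; it only relies on the fact that a time series is identified by its metric name plus its label set. The sample values and the names myservername-1 and myservername-2 are made up for the sketch, and whether the real pipeline overwrites, drops, or rejects duplicates is not established in this thread.

```python
def series_key(metric, labels):
    # A series is identified only by its metric name and its labels.
    return (metric, tuple(sorted(labels.items())))


def collect(samples):
    """Toy store: a later sample with the same key replaces the earlier one."""
    store = {}
    for metric, labels, value in samples:
        store[series_key(metric, labels)] = value
    return store


METRIC = "http_server_requests_seconds_max"

# Two replicas that share the same application tag (values are made up).
shared = collect([
    (METRIC, {"application": "myservername"}, 0.42),
    (METRIC, {"application": "myservername"}, 0.0),
])

# The same two replicas, each tagged with its own (hypothetical) name.
distinct = collect([
    (METRIC, {"application": "myservername-1"}, 0.42),
    (METRIC, {"application": "myservername-2"}, 0.0),
])

print(len(shared), len(distinct))
```

With the shared tag the two replicas collapse into a single series, so a dashboard cannot tell them apart; with distinct tags each replica keeps its own series.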

klopfdreh commented 1 year ago

I got the dashboard from here: https://grafana.com/grafana/dashboards/9933-streams/. It could be changed so that the application label is checked against a pattern it starts with, so that you can name the applications myservername-1 and myservername-2, or myservername-randomidentifier.

klopfdreh commented 1 year ago

The dashboard should be adjusted so that it uses =~ in the metric queries.

Variable value: SERVER_APPLICATION_NAME=myservername.* (the .* is important to match all pods)

Env variable: MY_POD_NAME=myservername-3h35f2t3d-rcg8d

Example:

"expr": "process_uptime_seconds{application=~\"${SERVER_APPLICATION_NAME}\"}",
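
As a quick check of why the .* matters: Prometheus regex matchers (=~) are fully anchored, so the pattern has to match the whole label value. The Python sketch below mimics that with re.fullmatch. It is only an approximation (Prometheus uses RE2 syntax, which behaves the same for a pattern this simple), and the second pod name is a hypothetical example.

```python
import re


def matches(pattern, value):
    # re.fullmatch mimics the full anchoring of Prometheus' =~ matcher.
    return re.fullmatch(pattern, value) is not None


pods = [
    "myservername-3h35f2t3d-rcg8d",   # pod name from the comment above
    "myservername-7c9d8b6f5-x2k4q",   # hypothetical second replica
]

with_wildcard = [matches("myservername.*", pod) for pod in pods]
without_wildcard = [matches("myservername", pod) for pod in pods]

print(with_wildcard, without_wildcard)
```

Without the trailing .* the matcher would only select a series whose application label is exactly myservername, which no pod has once the tag is set to the pod name.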

application.yml:

management:
  metrics:
    tags:
      application: ${MY_POD_NAME}

SCDF deployment env-variables:

            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name

klopfdreh commented 1 year ago

Hope this helps with a Kubernetes setup that has more than 1 replica. 👍

klopfdreh commented 1 year ago

Other than that, you could add a selection to the dashboard to choose between the servers.

onobc commented 1 year ago

We could implement @klopfdreh's suggested fix (or something similar) in:

  1. Dashboard(s) we provide in SCDF repo
  2. Dashboard(s) in Grafana labs (https://grafana.com/grafana/dashboards/9933-streams/)

I am not sure what is involved in the 2nd item.