lensapp / lens

Lens - The way the world runs Kubernetes
https://k8slens.dev/
MIT License
22.3k stars 1.45k forks

rancher prometheus monitoring integration #919

Open borgez opened 3 years ago

borgez commented 3 years ago

What would you like to be added: rancher monitoring integration

Why is this needed: Lens does not auto-discover Rancher metrics

Environment you are running Lens application on:

piersdd commented 3 years ago

+1

gigo1980 commented 3 years ago

+1

dorellang commented 3 years ago

Any workaround to get this working?

df-cgdm commented 3 years ago

+1

meanevo commented 3 years ago

+1

Changing the Prometheus service address to cattle-prometheus/access-prometheus:80 may provide you some information (incomplete).

nrccua-bryancusatis commented 3 years ago

+1

steebchen commented 3 years ago

I can confirm that using cattle-prometheus/access-prometheus:80 provides some information:

(screenshot)

The main cluster master/worker CPU/memory dashboard does not work:

(screenshot)

and neither does the nodes overview:

(screenshot)

However, individual selections of single resources (pods, deployments, replicasets, daemonsets, statefulsets) work:

(screenshot)

The individual selection of nodes just shows requests & limits, but not the actual usage.


This is already super amazing though. I love this tool! I think a "rancher" preset would probably make sense in the future :)

steebchen commented 3 years ago

Actually, it depends on which default you choose: there seem to be differences between "Helm" and "Prometheus Operator". Some things don't work with one, and other things don't work with the other. I'd be happy to provide exact results of what works with which, if that would be useful.

jiminHuang commented 3 years ago

+1

maxisam commented 3 years ago

@steebchen I can confirm it works on v2 as well.

For v2 it is cattle-monitoring-system/rancher-monitoring-prometheus:9090.

It is kind of weird that it doesn't show all the information for v2, because v2 seems to me to be pretty much a pre-installed Prometheus. I wonder if we need to add Lens-specific metrics.

https://github.com/lensapp/lens/blob/master/extensions/metrics-cluster-feature/resources/02-configmap.yml.hb

Nokel81 commented 3 years ago

Related to #1865

houshym commented 3 years ago

@maxisam I changed it but it does not work (Rancher v2.5.7, cattle-monitoring-system/rancher-monitoring-prometheus:9090). Did you choose Helm or Prometheus Operator?

maxisam commented 3 years ago

@houshym With the latest Lens and 2.5.7 it seems like I don't need to do anything. Just pick auto and it works (most of it).

houshym commented 3 years ago

@maxisam Lens version 4.2.0, Rancher 2.5.7, a fresh Kubernetes install, and it does not work.

maxisam commented 3 years ago

@houshym Mine is on 1.19.8 and I can see the metrics:

(screenshots)

nitrogear commented 3 years ago

Can one of the developers explain which Prometheus metrics are needed to display the CPU/memory graphs in the Cluster view?

Nokel81 commented 2 years ago

@nitrogear I would recommend looking at the files within https://github.com/lensapp/lens/tree/master/src/main/prometheus for the queries we use.

McFistyBuns commented 2 years ago

Rancher: v2.6.0, rancher-monitoring chart: v100.0.0+up16.6.0

Now, I am completely green when it comes to Rancher, Kubernetes, and Prometheus, but I'm persistent and decided to do some digging. It appears that the default rancher-monitoring chart does not set the node: label in the node-exporter configuration. Looking over the queries that Lens uses, it expects a few node-exporter metrics to carry that label: node_memory_MemTotal_bytes, node_cpu_seconds_total, and probably others I missed.

Looking through the rancher-monitoring chart values.yaml, I noticed a commented-out nodeExporter: relabelings: section that had a __meta_kubernetes_pod_node_name renaming. So I thought, what the heck, let's set that and see what happens. After a bit of back and forth, I got it to work.

nodeExporter:
  relabelings:
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      separator: ;
      regex: ^(.*)$
      targetLabel: node
      replacement: $1
      action: replace

Then, choosing prometheus-operator and setting the PROMETHEUS SERVICE ADDRESS to cattle-monitoring-system/rancher-monitoring-prometheus:9090 seems to make everything work.
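For anyone curious what that relabel rule actually does, here is a minimal TypeScript sketch of the Prometheus `replace` relabel semantics (join the source labels, match the regex, write the replacement into the target label). The helper name is hypothetical; this is an illustration, not Prometheus code:

```typescript
type Labels = Record<string, string>;

// Simplified model of a Prometheus relabel_config with action: replace.
function applyReplaceRelabel(
  labels: Labels,
  sourceLabels: string[],
  separator: string,
  regex: RegExp,
  targetLabel: string,
  replacement: string,
): Labels {
  // Join the values of the source labels with the separator.
  const joined = sourceLabels.map(l => labels[l] ?? "").join(separator);
  const match = joined.match(regex);
  if (!match) return labels; // no match: labels are left unchanged
  // "$1" in the replacement refers to the first capture group.
  const value = replacement.replace("$1", match[1] ?? "");
  return { ...labels, [targetLabel]: value };
}

// The rule from the values.yaml above, applied to a discovered target:
const discovered: Labels = { __meta_kubernetes_pod_node_name: "worker-1" };
const relabeled = applyReplaceRelabel(
  discovered,
  ["__meta_kubernetes_pod_node_name"],
  ";",
  /^(.*)$/,
  "node",
  "$1",
);
console.log(relabeled.node); // "worker-1"
```

So the effect is simply to copy the node name that service discovery already knows into a `node` label on every scraped series, which is the label Lens's node queries expect.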

I would love someone to verify this, though, because I went about changing a lot of different chart values before I found this one. It's possible I left one of those old changes in and that's what made this work.

(screenshots)

maxisam commented 2 years ago

@McFistyBuns Thanks for sharing! that is cool!

maxisam commented 2 years ago

@McFistyBuns I finally had a chance to test it. Unfortunately, it doesn't work for me. I'm still missing a couple of things, like CPU and disk.

kimmornetum commented 2 years ago

I think I solved this. Let's look, for example, at this: https://github.com/lensapp/lens/blob/master/src/main/prometheus/operator.ts#L53. The rateAccuracy variable there is set to 1m, and the default scrape interval is 1 minute. So the query returns an empty result, because I believe the rate function needs at least 2 data points in the window to work.

Armed with this knowledge I set the nodeExporter interval to 30 seconds. The resulting section looks like this:

nodeExporter:
  enabled: true
  jobLabel: jobLabel
  serviceMonitor:
    interval: 30s
    metricRelabelings: null
    proxyUrl: ''
    relabelings:
      - action: replace
        regex: ^(.*)$
        replacement: $1
        separator: ;
        sourceLabels:
          - __meta_kubernetes_pod_node_name
        targetLabel: node
    scrapeTimeout: ''
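The reasoning above can be sketched in TypeScript. This is a simplified, hypothetical model of how a range-vector `rate()` behaves, not actual Prometheus code, but it shows why a 60s scrape interval starves a 1m window:

```typescript
type Sample = { timestampSec: number; value: number };

// Prometheus-style rate(): needs at least two samples inside the window
// (nowSec - windowSec, nowSec] to compute a per-second increase.
function rate(samples: Sample[], nowSec: number, windowSec: number): number | null {
  const inWindow = samples.filter(
    s => s.timestampSec > nowSec - windowSec && s.timestampSec <= nowSec,
  );
  if (inWindow.length < 2) return null; // empty query result -> blank graph
  const first = inWindow[0];
  const last = inWindow[inWindow.length - 1];
  return (last.value - first.value) / (last.timestampSec - first.timestampSec);
}

// Counter that grows by 2 per second, scraped every 60s:
const scraped60: Sample[] = [0, 60, 120].map(t => ({ timestampSec: t, value: t * 2 }));
console.log(rate(scraped60, 120, 60)); // null -- only one sample fits a 1m window

// Same counter scraped every 30s: two samples fit, rate can be computed.
const scraped30: Sample[] = [0, 30, 60, 90, 120].map(t => ({ timestampSec: t, value: t * 2 }));
console.log(rate(scraped30, 120, 60)); // 2
```

This matches the fix: halving the scrape interval to 30s guarantees at least two samples per 1m window, so a `rate(...[1m])` query stops returning empty results.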

McFistyBuns commented 2 years ago

You beat me to it. I just got back in to the office after the holidays and was going to mention I forgot I had changed that as well.

maxisam commented 2 years ago

Thanks @McFistyBuns @kimmornetum, I can confirm it works with rancher-monitoring:9.4.203 on Rancher 2.5.8.

jnicpon commented 1 year ago

Does anyone have a working config for Lens 5.5.4 and Rancher Server 2.6.6 with rancher-monitoring 100.1.2+up19.0.3? I can't seem to get memory under nodes to show any data, though it shows up under pods. Also, I can't get the Cluster Dash to populate 'in use' nodes.

gunzy83 commented 1 year ago

This seems to be a problem for the basic Helm install of Prometheus, which defaults to a 60s scrape interval. Reducing it to a 30s interval works, for the reasons @kimmornetum mentioned above. I was confused about why it was not working, though, because https://github.com/lensapp/lens/blob/master/src/main/prometheus/helm.ts has rateAccuracy set to 5m. My TypeScript knowledge is a little lacking, but it seems that the readonly field is not overriding the value in the parent class for getQuery, as the author of the code may have intended.
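There is a real TypeScript pitfall that can produce this kind of behavior, sketched below with hypothetical class names (this is not Lens's actual code, just one plausible mechanism): subclass field initializers only run after the parent constructor finishes, so any value the parent captures during construction still sees the parent's default.

```typescript
class Base {
  readonly rateAccuracy: string = "1m";
  readonly capturedAtConstruction: string;

  constructor() {
    // Subclass field initializers have NOT run yet at this point,
    // so this reads the Base value even when constructing a subclass.
    this.capturedAtConstruction = this.rateAccuracy;
  }

  getQuery(): string {
    // Reading the field at call time, after construction, sees the
    // subclass's value.
    return `rate(node_cpu_seconds_total[${this.rateAccuracy}])`;
  }
}

class Helm extends Base {
  readonly rateAccuracy: string = "5m";
}

const h = new Helm();
console.log(h.capturedAtConstruction); // "1m" -- captured too early
console.log(h.getQuery());             // "rate(node_cpu_seconds_total[5m])"
```

In other words: if a query string (or any derived value) is built during construction rather than at call time, the subclass's overridden readonly value never makes it into the result, which would explain a 5m override silently behaving like 1m.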