solacecommunity / solace-prometheus-exporter

Prometheus Solace Exporter

Discussion: How to implement exclusion filters for (queue) metrics #59

Closed andreas-habel closed 7 months ago

andreas-habel commented 9 months ago

Hi Solace Community,

we're using the Prometheus exporter very successfully, but as we get more and more assets on our brokers we see timeouts and gaps in our metrics. Error messages like Can't scrape QueueDetailsSemp1 ..... context deadline exceeded show up in the logs. One reason is surely that one of our brokers has more than 4500 queues, which is obviously too many and causes timeouts. Additionally, we're afraid that gathering those metrics every minute may harm the broker.

In theory it would be fine if we could filter the gathered queue metrics, because roughly 50% of them are not as time critical - we could export those metrics less often. However, the exporter does not support blacklisting queue names with a ! or similar. There is just no option to do something like this: http://your-exporter:9628/solace?m.QueueStats=myVpn|!ARBON And using whitelisting instead would require us to add or change the query constantly.

So basically a feature request would be to be able to exclude metrics gathering for queues that start with a specific name/string.

Of course this is not even a problem of the exporter, since it just forwards the filter to the SEMP API:

<rpc><show><queue><name>" + itemFilter + "</name><vpn-name>" + vpnFilter + "</vpn-name><stats/><count/><num-elements>100</num-elements></queue></show></rpc>
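
Since the SEMP v1 name filter only accepts a positive wildcard, an exclusion like !ARBON would have to be applied by the exporter itself after the response comes back. A minimal sketch of such client-side filtering (hypothetical illustration, not exporter code):

package main

import (
    "fmt"
    "strings"
)

func main() {
    // Queue names as they would come back from the paged SEMP v1 response.
    queueNames := []string{"ARBON/orders/in", "billing/q/1", "ARBON/audit"}

    // Exclusion as requested above: skip every queue starting with a given prefix.
    const excludePrefix = "ARBON"

    for _, name := range queueNames {
        if strings.HasPrefix(name, excludePrefix) {
            continue // do not emit metrics for excluded queues
        }
        fmt.Println("collect metrics for", name)
    }
}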

Because I currently see no good solution for how this could be implemented, I wanted to create this post for a potential discussion.

How did Solace solve that with Datadog and Insights? Do you have similar challenges or issues?

andreas-habel commented 9 months ago

One idea I came up with does not solve the "too many queue metrics" issue, but it could at least prevent overload on the broker by introducing rate limiting on outgoing SEMP requests. One library that could potentially help here is https://pkg.go.dev/golang.org/x/time/rate#Limiter.Wait This would of course slow down overall scraping performance and requires increased timeouts on the Prometheus side.
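
A minimal sketch of that idea, assuming all outgoing SEMP calls were funneled through a shared limiter (the URL and loop are placeholders, not exporter code):

package main

import (
    "context"
    "fmt"
    "net/http"
    "time"

    "golang.org/x/time/rate"
)

func main() {
    // At most 10 SEMP requests per second, matching the guideline quoted below.
    limiter := rate.NewLimiter(rate.Limit(10), 1)
    client := &http.Client{Timeout: 10 * time.Second}

    for page := 1; page <= 3; page++ {
        // Wait blocks until the limiter hands out the next token.
        if err := limiter.Wait(context.Background()); err != nil {
            fmt.Println("limiter:", err)
            return
        }
        resp, err := client.Post("http://broker:8080/SEMP", "application/xml", nil) // placeholder broker URL
        if err != nil {
            fmt.Println("request failed:", err)
            continue
        }
        resp.Body.Close()
        fmt.Println("page", page, "status", resp.StatusCode)
    }
}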

Digging deeper into potential request limits, I found two different numbers which are a bit confusing:

1.) SEMP v1 Polling Frequency Guidelines

The event broker supports a maximum SEMP v1 polling rate of ten requests per second. https://docs.solace.com/Admin/SEMP/Using-Legacy-SEMP.htm#SEMP_v1_Polling_Frequency_Guidelines

2.) SEMP v1 Request Over HTTP Service

The event broker supports a maximum of 500 concurrent SEMP v1 requests over HTTP sessions. https://docs.solace.com/Admin/SEMP/Using-Legacy-SEMP.htm#SEMP_v1_Request_Over_HTTP_Service

Does that mean there could be up to 500 simultaneous requests open, but not more than 10 newly opened per second?

GreenRover commented 9 months ago

The SEMP v1 API only returns 100 items at once, and paging is required to get all results. But Solace implements paging with a result pointer, not with page numbers, therefore parallelizing the page requests is not as easy as expected.
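
A toy illustration of why pointer-based paging is inherently sequential (the SEMP calls are stubbed out): the request for page N+1 is only known once the response for page N has been parsed.

package main

import "fmt"

// fetchPage stands in for the real SEMP v1 POST; it returns the queue names of
// one page plus the broker-provided follow-up request ("" when there is no next page).
func fetchPage(request string) (queues []string, nextRequest string) {
    switch request {
    case "initial-show-queue-request":
        return []string{"q/a", "q/b"}, "pointer-to-page-2"
    case "pointer-to-page-2":
        return []string{"q/c"}, "" // last page: no continuation pointer
    default:
        return nil, ""
    }
}

func main() {
    request := "initial-show-queue-request"
    for page := 1; request != ""; page++ {
        queues, next := fetchPage(request)
        fmt.Printf("page %d: %v\n", page, queues)
        request = next // can only be built from the previous response
    }
}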

But you can use Prometheus to parallelize. Just scrape

PS: I asked Solace to check whether they can maybe provide a better filter possibility for SEMP v1 or SEMP v2 at this point.

GreenRover commented 9 months ago

Would the SEMP v2 filter maybe fit your needs? https://docs.solace.com/Admin/SEMP/SEMP-Features.htm#Filtering

andreas-habel commented 9 months ago

Would the SEMP v2 filter maybe fit your needs? https://docs.solace.com/Admin/SEMP/SEMP-Features.htm#Filtering

Yes, sure. I think the SEMP/v2/monitor/msgVpns/{vpn}/queues endpoint with its where and select options would be way more flexible than the legacy SEMP v1 API. Is there any reason why the exporter still uses SEMP v1, besides that just no one changed it? ;) Is more overhead on the broker expected?
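
For reference, such a SEMP v2 monitor request would look roughly like this (host, VPN and parameter values are placeholders; the select fields are just examples):

package main

import (
    "fmt"
    "net/url"
)

func main() {
    base := "https://broker:943/SEMP/v2/monitor/msgVpns/myVpn/queues"

    params := url.Values{}
    // "where" excludes queues by name pattern, "select" trims the returned fields.
    params.Set("where", "queueName!=exclude/those/queues/*")
    params.Set("select", "queueName,bindRequestRate,bindSuccessCount")
    params.Set("count", "100") // page size; further pages are fetched via the paging cursor

    fmt.Println(base + "?" + params.Encode())
}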

GreenRover commented 9 months ago

Yes, SEMP v2 was slower the last time I reviewed it. But I know Solace has improved it in the meantime.

Next question: would you like to provide an endpoint using v2, or do you want to wait for me to do it (maybe I will find the time this year)?

andreas-habel commented 9 months ago

I would not complain if I could just grab a new exporter version with the new endpoint, but of course we're willing to contribute as well. Just to get on the same track - your suggestion is to provide new datasources like VpnStatsV2 and QueueStatsV2 which allow for more advanced filtering. But they should still return the same metrics, so that dashboards and alerts do not have to be changed. That also means that using QueueStats and QueueStatsV2 datasources in the same endpoint definition should be prevented.

[endpoint.new-sempv2]
VpnStatsV2=default|*
QueueStatsV2=default|queueName!=exclude/those/queues/*

This option would be the most flexible, I guess, and would allow a slow transition for users. In theory this would even allow defining which metrics should be exported, by extending the filtering options with the select parameter.

[endpoint.new-sempv2-with-select]
QueueStatsV2=default|queueName!=exclude/those/queues/*|bindRequestRate,bindSuccessCount

Another option would be a global "useSempV2" flag that automatically uses SEMP v2 for every datasource where it is available:

func (e *Exporter) Collect(ch chan<- prometheus.Metric) {
    ...
    case "QueueStats":
        if e.config.useSempV2 {
            up, err = e.semp.GetQueueStatsSemp2(ch, dataSource.VpnFilter, dataSource.ItemFilter)
        } else {
            up, err = e.semp.GetQueueStatsSemp1(ch, dataSource.VpnFilter, dataSource.ItemFilter)
        }

GreenRover commented 9 months ago

I like the new v2 datasources more, because I fear that SEMP v2 and SEMP v1 powered endpoints will not provide exactly the same metrics.

The idea with parameterizable select options is very good. But here we should be able to enter the metric name, and the exporter should map it back to the SEMP v2 field name.
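
A sketch of that mapping idea: the endpoint config keeps the exporter's metric names, and the exporter translates them into the select field list (the right-hand sides below are placeholders, not real SEMP v2 field names):

package main

import (
    "fmt"
    "strings"
)

// Placeholder table; the real SEMP v2 field names would go on the right.
var metricToSempV2Field = map[string]string{
    "solace_queue_msg_shutdown_discarded":        "<semp-v2-field-for-shutdown-discards>",
    "solace_queue_msg_max_redelivered_discarded": "<semp-v2-field-for-max-redelivered-discards>",
}

func main() {
    // Metric names as they would appear in the endpoint configuration.
    configured := []string{
        "solace_queue_msg_shutdown_discarded",
        "solace_queue_msg_max_redelivered_discarded",
    }

    var fields []string
    for _, metric := range configured {
        if field, ok := metricToSempV2Field[metric]; ok {
            fields = append(fields, field)
        }
    }
    fmt.Println("select=" + strings.Join(fields, ","))
}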

To coordinate: whoever of us has time to work on this issue first should just post here, to avoid duplicated work.

GreenRover commented 9 months ago

I started work here: https://github.com/solacecommunity/solace-prometheus-exporter/pull/60

GreenRover commented 9 months ago

I implemented queue stats with the v2 API. Please test whether it fits the requirements / improves the situation: https://hub.docker.com/r/solacecommunity/solace-prometheus-exporter/tags?page=1&name=testing

JanGrosse commented 9 months ago

Hi @GreenRover,

thanks for implementing this so fast 💯

I tried to test it today, but sadly failed. Pretty sure I am doing something wrong on my end. I updated the image, added the environment variable for SOLACE_DEFAULT_VPN and updated my ConfigMap. It now contains the following new endpoint:

[endpoint.solace-esb-sempv2]
QueueStatsV2=pksolocp01|queueName%21%3Desb%2Fq%2Fkfl%2Fsib%2Fdata%2Fin*|solace_queue_msg_shutdown_discarded,solace_queue_msg_max_redelivered_discarded

(I also changed and tried with / without URL encoding)

But this sends the pod into a CrashLoopBackOff with the following error:

ts=2023-12-12T11:59:52.240Z caller=solace_prometheus_exporter.go:77 level=error msg="exactly one \"|\" expected at endpoint \"solace-esb-sempv2\". Found key \"QueueStatsV2\" value \"pksolocp01|queueName%21%3Desb%2Fq%2Fkfl%2Fsib%2Fdata%2Fin*|solace_queue_msg_shutdown_discarded,solace_queue_msg_max_redelivered_discarded\". Expecected: VPN wildcard | item wildcard"

GreenRover commented 9 months ago

I fixed this issue now. Please test again.

I performed a performance test as well:

  1. I created 9000 queues named "not/interesting/{i}"
  2. As well as 90 queues named "important/queue/{i}"

Started the exporter like this:

kind: ConfigMap
apiVersion: v1
metadata:
  name: solace-exporter-config
data:
  solace_prometheus_exporter.ini: |
    [solace]
    listenAddr = 0.0.0.0:1920
    enableTLS = false
    scrapeUri = https://mr-connection-j6em7yuyi7k.messaging.solace.cloud:943
    username = monitoring
    password = monitoring
    defaultVpn = AaaBbbCcc
    timeout = 60s
    sslVerify = false

    [endpoint.solace-v1-test]
    QueueStats = AaaBbbCcc|*

    [endpoint.solace-v2-test]
    QueueStatsV2 = AaaBbbCcc|queueName!=not/interesting*|solace_queue_msg_shutdown_discarded,solace_queue_msg_max_redelivered_discarded
---

kind: Deployment
apiVersion: apps/v1
metadata:
  name: solace-exporter
  labels:
    app: solace-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: solace-exporter
  template:
    metadata:
      labels:
        app: solace-exporter
      annotations:
        collectord.io/logs-output: devnull
    spec:
      volumes:
        - name: solace-exporter-config-volume
          configMap:
            name: solace-exporter-config
            defaultMode: 420
      containers:
        - resources:
            limits:
              cpu: 300m
              memory: 800M
            requests:
              cpu: 150m
              memory: 500M
          readinessProbe:
            httpGet:
              path: /metrics
              port: 1920
              scheme: HTTP
            initialDelaySeconds: 5
            timeoutSeconds: 7
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 2
          terminationMessagePath: /dev/termination-log
          name: solace-exporter
          command:
            - /solace_prometheus_exporter
            - '--config-file=/etc/solace/solace_prometheus_exporter.ini'
          livenessProbe:
            httpGet:
              path: /metrics
              port: 1920
              scheme: HTTP
            initialDelaySeconds: 5
            timeoutSeconds: 7
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 6
          env:
            - name: SOLACE_LISTEN_ADDR
              value: '0.0.0.0:1920'
            - name: SOLACE_LISTEN_TLS
              value: 'false'
            - name: TZ
              value: UTC
          ports:
            - containerPort: 1920
              protocol: TCP
          imagePullPolicy: Always
          volumeMounts:
            - name: solace-exporter-config-volume
              readOnly: true
              mountPath: /etc/solace
          image: 'docker.bin.sbb.ch/solacecommunity/solace-prometheus-exporter:latest'
      restartPolicy: Always

---

kind: Service
apiVersion: v1
metadata:
  name: solace-exporter
spec:
  ipFamilies:
    - IPv4
  ports:
    - name: 1920-tcp
      protocol: TCP
      port: 1920
      targetPort: 1920
  internalTrafficPolicy: Cluster
  type: ClusterIP
  ipFamilyPolicy: SingleStack
  sessionAffinity: None
  selector:
    app: solace-exporter

---

kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: solace-exporter
spec:
  to:
    kind: Service
    name: solace-exporter
  tls:
    termination: edge
  host: solace-exporter.apps.trs01t.sbb-aws-test.net
  port:
    targetPort: 1920-tcp

Result:

Query "solace-v2-test" took 13.84 sec Query "solace-v1-test" took 9.99 sec

What you need to know about Solace SEMP v2: the "where" clause does not work as you might expect. Compared to SQL it behaves more like a "having", meaning it only filters what is returned to the client. So the exporter still needs to walk through all the pages, in my case 9090/100 == 91 SEMP v2 and/or SEMP v1 queries, to get the result set.

GreenRover commented 9 months ago

For queue stats I've got another idea, but it would make the exporter stateful, which is what I tried to avoid because I use it centralized and horizontally scaled.

But here is the idea (I am not sure whether I like it):

Have a thread that fetches the list of all queues, for example every 5 minutes, and filters this list by regex for the queues of interest.

On a metrics call, ask SEMP for the stats of those particular queues. This can easily be parallelized.
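
A very rough sketch of that stateful variant (all SEMP access is stubbed; this only illustrates the idea, not exporter code):

package main

import (
    "fmt"
    "regexp"
    "sync"
    "time"
)

var (
    mu               sync.RWMutex
    queuesOfInterest []string
)

// listAllQueues stands in for the expensive SEMP call listing every queue on the broker.
func listAllQueues() []string {
    return []string{"important/queue/1", "not/interesting/1", "important/queue/2"}
}

// refreshLoop periodically rebuilds the filtered queue list in the background.
func refreshLoop(interest *regexp.Regexp, every time.Duration) {
    for {
        var keep []string
        for _, q := range listAllQueues() {
            if interest.MatchString(q) {
                keep = append(keep, q)
            }
        }
        mu.Lock()
        queuesOfInterest = keep
        mu.Unlock()
        time.Sleep(every)
    }
}

func main() {
    go refreshLoop(regexp.MustCompile(`^important/`), 5*time.Minute)
    time.Sleep(100 * time.Millisecond) // demo only: give the first refresh a moment

    // A metrics call would now ask SEMP only for these queues, possibly in parallel.
    mu.RLock()
    fmt.Println("scrape stats for:", queuesOfInterest)
    mu.RUnlock()
}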

But this feature is so specialized that it should only be used under these conditions:

JanGrosse commented 9 months ago

Thank you for the updated version. So, the endpoint is now working correctly, but we are still running into a timeout before the request finishes. I increased the exporter timeout to 60s and also increased the timeout on the route in OpenShift to 60s, but it seems to take even longer than that. Not sure how far I should crank this up; it seems like the broker is already under heavy load if it is not able to finish the request in 60s. Compared to your example with 9000 queues we have fewer on our broker, but of course a lot of messaging going on.

We will try to reduce the number of queues and try to identify whether something on the brokers is running wild, which might also slow down the requests.

GreenRover commented 9 months ago

In my load test I used a 10k Solace Cloud broker (4 cores) with no load on it. As soon as queues are in use this will slow down dramatically.

Can you provide:

Another possible solution would be:

Having a config file like:

[solace]
listenAddr = 0.0.0.0:1920
enableTLS = false
scrapeUri = https://mr-connection-j6em7yuyi7k.messaging.solace.cloud:943
username = monitoring
password = monitoring
defaultVpn = AaaBbbCcc
timeout = 60s
sslVerify = false
# NEW option: the sleep time between the prefetches
prefetchInterval = 60s

[endpoint.solace-v2-test]
QueueStats = AaaBbbCcc|queueName!=not/interesting*

As soon as this option is configured, Prometheus should be configured to use "honor_timestamps". But be aware that Prometheus may warn if it sees the same values twice. This is not a problem, but it can result in noisy Prometheus logs.
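
A matching Prometheus scrape config could then look roughly like this (job name, target, path and intervals are examples only, not a recommended setting):

scrape_configs:
  - job_name: solace-v2-test
    honor_timestamps: true
    scrape_interval: 60s
    scrape_timeout: 55s
    metrics_path: /solace-v2-test
    static_configs:
      - targets: ['solace-exporter:1920']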

JanGrosse commented 9 months ago

We are using a 100k Broker (8 cores max) with a lot of load on it :-)

How many queues exist on the broker?

Around 4500

How many queues are you interested in?

Around 1500

what is the target scrape interval?

In the past it was 60s, but since we do not even finish requests in that time now, it might go up to 300s.

Did you try to call the exporter endpoint with postman / curl to check how long it takes?

Just tried that - it took a whopping 105 seconds to finish.

I assume that 1500 is still too big an amount. And if I understood you correctly, even the where filter from SEMP v2 does not help much here, as it still has to go through all 4500 queues first and always filters on the groups of 100 it fetches at a time.

GreenRover commented 9 months ago

OK, that is an average of 2.1 sec per SEMP call, which is a very long time. If possible, please try to add more CPU cores to your broker.

Do you use a dedicated exporter per broker? If yes, I would try to provide async scraping. There are optimizations possible, like:

GreenRover commented 9 months ago

Please take newest "testing" build with this config:

[solace]
timeout=5s
# 0s means disabled. When an interval is set, all correctly configured endpoints will be fetched asynchronously.
# This may help you to deal with slower brokers or an extreme amount of results.
prefetchInterval=30s

# Because Solace advises having less than 10 SEMP requests per second, this is a scarce resource that needs to be taken care of.
parallelSempConnections=1

[endpoint.solace-esb-sempv2]
QueueStatsV2=pksolocp01|queueName%21%3Desb%2Fq%2Fkfl%2Fsib%2Fdata%2Fin*|solace_queue_msg_shutdown_discarded,solace_queue_msg_max_redelivered_discarded

[endpoint.solace-esb-sempv1]
QueueStatsV1=pksolocp01|*

And run with the argument: --log.level=debug

I would expect to see a log like:

ts=2023-12-15T16:53:40.616Z caller=solace_prometheus_exporter.go:107 level=info msg="Register handler from config" handler=/solace-esb-async dataSource="QueueStats=pksolocp01|*|"
ts=2023-12-15T16:53:40.616Z caller=asyncFetcher.go:28 level=debug msg="Fetching for handler" handler=/solace-esb-async
ts=2023-12-15T16:53:40.856Z caller=getQueueStatsSemp1.go:120 level=debug msg="Scrape QueueStatsSemp1" page=1 duration=239.7761ms
msg="Result of QueueStatsSemp2" results=0 page=13

This is just a first attempt to run everything async. Please provide me with all durations for a single SEMP v1 and SEMP v2 scrape run, as well as the results per page for the SEMP v2 response, to see where the problem might be.

Incremental scraping is still a hard thing. During my tests I have observed:

What we possibly can try:

JanGrosse commented 9 months ago

Hi, sorry for the delay in my answers, and thank you for your continuous effort on this topic. I managed to run the new version today - here is the log output I received while letting it run for ~3 minutes:

ts=2023-12-20T08:47:21.150Z caller=solace_prometheus_exporter.go:82 level=info msg="Starting solace_prometheus_exporter" version=1.004005e+06
ts=2023-12-20T08:47:21.150Z caller=solace_prometheus_exporter.go:83 level=info msg="Build context" context="(go=go1.20.12, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2023-12-20T08:47:21.150Z caller=solace_prometheus_exporter.go:95 level=info msg=Scraping listenAddr=http://0.0.0.0:9628 scrapeURI=https://pksolocp01.esb-messaging.svc.cluster.local:1943 username=esb_monitoring_user sslVerify=false timeout=1m0s
ts=2023-12-20T08:47:21.150Z caller=solace_prometheus_exporter.go:110 level=info msg="Register handler from config" handler=/solace-esb-sempv2 dataSource="QueueStatsV2=pksolocp01|queueName%21%3Desb%2Fq%2Fkfl%2Fsib%2Fdata%2Fin*|solace_queue_msg_shutdown_discarded,solace_queue_msg_max_redelivered_discarded"
ts=2023-12-20T08:47:21.150Z caller=solace_prometheus_exporter.go:110 level=info msg="Register handler from config" handler=/solace-esb-sempv1 dataSource="QueueStatsV1=pksolocp01|*|"
ts=2023-12-20T08:47:21.150Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T08:47:21.740Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=589.520234ms
ts=2023-12-20T08:47:21.741Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T08:47:21.741Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T08:47:51.769Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T08:47:51.769Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T08:47:52.060Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=291.046155ms
ts=2023-12-20T08:47:52.060Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T08:48:21.770Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T08:48:22.061Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T08:48:22.261Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=200.476985ms
ts=2023-12-20T08:48:22.261Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T08:48:51.783Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T08:48:52.262Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T08:48:52.531Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=269.230506ms
ts=2023-12-20T08:48:52.532Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T08:49:21.808Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T08:49:22.532Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T08:49:22.869Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=336.990853ms
ts=2023-12-20T08:49:22.869Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T08:49:51.823Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1

Yesterday evening I thought about another (maybe) option. I know that Solace also offers SEMP over the message bus (executing show commands), described here - maybe this would decrease our execution time if we do it not through HTTP but rather through the message bus directly.

Already wishing you happy holidays and a good start into the year 2024 :-)

GreenRover commented 9 months ago

SEMP over messaging is just a different transport protocol. The problem for you is the slow SEMP query, so I see no improvement potential here.

What I think is strange about your log: I see that Result of QueueStatsSemp2 takes 0.2 to 0.6 sec, which is not slow at all. But what confuses me is:

Did you try to add more CPU cores to the broker, or ask Solace how the performance might be improved? Maybe open a Solace ticket and put me on CC. (Andreas has my mail.)

What do you use as:

[solace]
timeout=180s

JanGrosse commented 9 months ago

Oh, I was directly using your config from the comment above, with the 5s timeout. I increased it to 180s and this is the log output (it seems similar):


ts=2023-12-20T09:16:40.176Z caller=solace_prometheus_exporter.go:82 level=info msg="Starting solace_prometheus_exporter" version=1.004005e+06
ts=2023-12-20T09:16:40.176Z caller=solace_prometheus_exporter.go:83 level=info msg="Build context" context="(go=go1.20.12, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2023-12-20T09:16:40.176Z caller=solace_prometheus_exporter.go:95 level=info msg=Scraping listenAddr=http://0.0.0.0:9628 scrapeURI=https://pksolocp01.esb-messaging.svc.cluster.local:1943 username=esb_monitoring_user sslVerify=false timeout=1m0s
ts=2023-12-20T09:16:40.176Z caller=solace_prometheus_exporter.go:110 level=info msg="Register handler from config" handler=/solace-esb-sempv2 dataSource="QueueStatsV2=pksolocp01|queueName%21%3Desb%2Fq%2Fkfl%2Fsib%2Fdata%2Fin*|solace_queue_msg_shutdown_discarded,solace_queue_msg_max_redelivered_discarded"
ts=2023-12-20T09:16:40.176Z caller=solace_prometheus_exporter.go:110 level=info msg="Register handler from config" handler=/solace-esb-sempv1 dataSource="QueueStatsV1=pksolocp01|*|"
ts=2023-12-20T09:16:40.177Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T09:16:40.491Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=314.225659ms
ts=2023-12-20T09:16:40.492Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T09:16:40.492Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T09:17:10.497Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T09:17:10.497Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T09:17:10.943Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=445.909028ms
ts=2023-12-20T09:17:10.943Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T09:17:40.498Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T09:17:40.943Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T09:17:41.008Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=64.479674ms
ts=2023-12-20T09:17:41.008Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2
ts=2023-12-20T09:18:10.526Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T09:18:11.009Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-20T09:18:11.292Z caller=http.go:64 level=debug msg="Scraped QueueStatsSemp2" page=1 duration=283.099792ms
ts=2023-12-20T09:18:11.292Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=2

I already created a ticket (just now, actually). I think I have your mail as well from the communication with the Swiss Airport colleague. If so, I will add you on CC to the ticket.

GreenRover commented 9 months ago

Thanks for this attempt, but the log is not changing.

What I would expect to see is either:

ts=2023-12-20T09:16:40.492Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T09:18:11.292Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=1 duration=XXXXms
ts=2023-12-20T09:17:41.008Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=1

OR

ts=2023-12-20T09:16:40.492Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-20T09:58:49.567Z caller=getQueueStatsSemp1.go:63 level=error msg="Can't scrape QueueStatsSemp1" err="Post \"https://mr-connection-j6em7yuyi7k.messaging.solace.cloud:943/SEMP\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" broker=https://mr-connection-j6em7yuyi7k.messaging.solace.cloud:943

...........

I found the issue: Please change in your configuration:

[endpoint.solace-v1-test]
QueueStatsV1=pksolocp01|*|

too

[endpoint.solace-v1-test]
QueueStats = pksolocp01|*

I improved the error logging to make similar issues easier to detect. Can you please double check whether there are any queues on the broker matching the filter queueName!=esb/q/kfl/sib/data/in*

PS: better not to use URL encoding in the configuration, to improve readability.

GreenRover commented 9 months ago

@JanGrosse I have not yet received any ticket information from Solace. I guess you will have to forward the response/communication with Solace manually.

JanGrosse commented 9 months ago

@GreenRover, sorry, I was distracted with something else yesterday and did not yet request to add you. But I have updated the ticket now, so once support is working on it again, you should see the messages.

I updated the image this morning, removed the URL encoding and added the correct SEMP v1 endpoint. Now it worked. I removed some information from the pasted logs (mainly the names of the queues it was listing), but the important parts should still be visible (request time, results per page etc.). If you are missing some information, I can check again whether I removed it from the logs.

Regarding the CPU: the broker itself has an upper limit of 8 CPUs but usually never reaches it (it stays between 3-4 CPUs used) - so I am not sure whether adding more CPU will change anything.

Here is the CPU usage over the last week (barely scratching 4 CPUs): [CPU usage screenshot]

ts=2023-12-21T08:24:57.454Z caller=solace_prometheus_exporter.go:82 level=info msg="Starting solace_prometheus_exporter" version=1.004005e+06
ts=2023-12-21T08:24:57.454Z caller=solace_prometheus_exporter.go:83 level=info msg="Build context" context="(go=go1.20.12, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2023-12-21T08:24:57.454Z caller=solace_prometheus_exporter.go:95 level=info msg=Scraping listenAddr=http://0.0.0.0:9628 scrapeURI=https://pksolocp01.esb-messaging.svc.cluster.local:1943 username=esb_monitoring_user sslVerify=false timeout=1m0s
ts=2023-12-21T08:24:57.454Z caller=solace_prometheus_exporter.go:110 level=info msg="Register handler from config" handler=/solace-esb-sempv2 dataSource="QueueStatsV2=pksolocp01|queueName!=esb/q/kfl/sib/data/in*|solace_queue_msg_shutdown_discarded,solace_queue_msg_max_redelivered_discarded"
ts=2023-12-21T08:24:57.454Z caller=solace_prometheus_exporter.go:110 level=info msg="Register handler from config" handler=/solace-esb-sempv1 dataSource="QueueStats=pksolocp01|*|"
ts=2023-12-21T08:24:57.454Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-21T08:25:10.610Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=1 duration=13.155573288s
ts=2023-12-21T08:25:10.611Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=1
ts=2023-12-21T08:25:13.538Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=2 duration=2.926287593s
ts=2023-12-21T08:25:13.539Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=2
ts=2023-12-21T08:25:17.770Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=3 duration=4.230165612s
ts=2023-12-21T08:25:17.772Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=3
ts=2023-12-21T08:25:23.386Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=4 duration=5.613309152s
ts=2023-12-21T08:25:23.386Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=4
ts=2023-12-21T08:25:27.291Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=5 duration=3.903947672s
ts=2023-12-21T08:25:27.292Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=5
ts=2023-12-21T08:25:32.371Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=6 duration=5.078736622s
ts=2023-12-21T08:25:32.372Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=6
ts=2023-12-21T08:25:35.988Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=7 duration=3.614812789s
ts=2023-12-21T08:25:35.989Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=7
ts=2023-12-21T08:25:43.186Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=8 duration=7.195381034s
ts=2023-12-21T08:25:43.186Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=8
ts=2023-12-21T08:25:57.484Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=9 duration=14.296781903s
ts=2023-12-21T08:25:57.484Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=9
ts=2023-12-21T08:26:11.944Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=10 duration=14.458515737s
ts=2023-12-21T08:26:11.944Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=10
ts=2023-12-21T08:26:20.258Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=11 duration=8.313195384s
ts=2023-12-21T08:26:20.260Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=11
ts=2023-12-21T08:26:22.739Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=12 duration=2.478325103s
ts=2023-12-21T08:26:22.739Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=27 page=12
ts=2023-12-21T08:26:23.337Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=13
ts=2023-12-21T08:26:24.133Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=14
ts=2023-12-21T08:26:24.582Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=15
ts=2023-12-21T08:26:25.443Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=16
ts=2023-12-21T08:26:25.950Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=17
ts=2023-12-21T08:26:26.586Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=18
ts=2023-12-21T08:26:27.327Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=19
ts=2023-12-21T08:26:27.691Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=20
ts=2023-12-21T08:26:28.689Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=21
ts=2023-12-21T08:26:29.759Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=22 duration=1.069553918s
ts=2023-12-21T08:26:29.759Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=22
ts=2023-12-21T08:26:30.339Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=23
ts=2023-12-21T08:26:30.820Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=24
ts=2023-12-21T08:26:31.209Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=25
ts=2023-12-21T08:26:31.568Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=26
ts=2023-12-21T08:26:31.787Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=27
ts=2023-12-21T08:26:32.252Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=28
ts=2023-12-21T08:26:33.026Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=29
ts=2023-12-21T08:26:33.887Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=30
ts=2023-12-21T08:26:34.129Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=31
ts=2023-12-21T08:26:34.978Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=32
ts=2023-12-21T08:26:35.741Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=33
ts=2023-12-21T08:26:36.242Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=34
ts=2023-12-21T08:26:37.109Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=35
ts=2023-12-21T08:26:37.664Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=36
ts=2023-12-21T08:26:38.203Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=37
ts=2023-12-21T08:26:39.277Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=38 duration=1.074527322s
ts=2023-12-21T08:26:39.277Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=38
ts=2023-12-21T08:26:40.209Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=39
ts=2023-12-21T08:26:41.282Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=40 duration=1.072891181s
ts=2023-12-21T08:26:41.282Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=40
ts=2023-12-21T08:26:41.560Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=0 page=41
ts=2023-12-21T08:26:45.914Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=42 duration=4.353657283s
ts=2023-12-21T08:26:45.914Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=43 page=42
ts=2023-12-21T08:26:52.376Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=43 duration=6.461532082s
ts=2023-12-21T08:26:52.378Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=43
ts=2023-12-21T08:26:57.738Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=44 duration=5.359451773s
ts=2023-12-21T08:26:57.738Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=44
ts=2023-12-21T08:27:05.945Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=45 duration=8.206335269s
ts=2023-12-21T08:27:05.946Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=65 page=45
ts=2023-12-21T08:27:05.946Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv1
ts=2023-12-21T08:27:06.494Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=1 duration=547.615811ms request=<rpc><show><queue><name>*</name><vpn-name>pksolocp01</vpn-name><stats/><count/><num-elements>100</num-elements></queue></show></rpc>
ts=2023-12-21T08:27:06.502Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=16 page=1
ts=2023-12-21T08:27:06.982Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=2 duration=479.804875ms 
ts=2023-12-21T08:27:06.991Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=9 page=2
ts=2023-12-21T08:27:07.070Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=3 duration=78.648138ms 
ts=2023-12-21T08:27:07.077Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=9 page=3
ts=2023-12-21T08:27:08.025Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=4 duration=948.161506ms 
ts=2023-12-21T08:27:08.033Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=8 page=4
ts=2023-12-21T08:27:08.420Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=5 duration=386.306146ms 
ts=2023-12-21T08:27:08.433Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=48 page=5
ts=2023-12-21T08:27:09.339Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=6 duration=904.938143ms 
ts=2023-12-21T08:27:09.353Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=38 page=6
ts=2023-12-21T08:27:09.535Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=7 duration=181.483888ms 
ts=2023-12-21T08:27:09.545Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=27 page=7
ts=2023-12-21T08:27:10.247Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=8 duration=701.189967ms 
ts=2023-12-21T08:27:10.260Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=8
ts=2023-12-21T08:27:10.912Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=9 duration=651.658112ms 
ts=2023-12-21T08:27:10.923Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=9
ts=2023-12-21T08:27:12.548Z caller=http.go:34 level=warn msg="Scraped QueueStatsSemp1 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=10 duration=1.624739769s
ts=2023-12-21T08:27:12.549Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=10 duration=1.624739769s 
ts=2023-12-21T08:27:12.560Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=28 page=10
ts=2023-12-21T08:27:12.764Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=11 duration=203.3959ms 
ts=2023-12-21T08:27:12.776Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=38 page=11
ts=2023-12-21T08:27:12.905Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=12 duration=128.759103ms 
ts=2023-12-21T08:27:12.918Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=40 page=12
ts=2023-12-21T08:27:13.843Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=13 duration=924.323408ms 
ts=2023-12-21T08:27:13.855Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=33 page=13
ts=2023-12-21T08:27:13.960Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=14 duration=104.189794ms 
ts=2023-12-21T08:27:13.972Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=14
ts=2023-12-21T08:27:14.048Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=15 duration=76.346148ms 
ts=2023-12-21T08:27:14.058Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=23 page=15
ts=2023-12-21T08:27:14.713Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=16 duration=654.723705ms 
ts=2023-12-21T08:27:14.724Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=23 page=16
ts=2023-12-21T08:27:15.093Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=17 duration=368.391182ms 
ts=2023-12-21T08:27:15.103Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=27 page=17
ts=2023-12-21T08:27:15.399Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=18 duration=295.59184ms 
ts=2023-12-21T08:27:15.411Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=42 page=18
ts=2023-12-21T08:27:16.217Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=19 duration=804.797958ms 
ts=2023-12-21T08:27:16.229Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=19
ts=2023-12-21T08:27:16.375Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=20 duration=145.448528ms 
ts=2023-12-21T08:27:16.388Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=34 page=20
ts=2023-12-21T08:27:16.559Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=21 duration=170.509703ms 
ts=2023-12-21T08:27:16.570Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=29 page=21
ts=2023-12-21T08:27:17.356Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=22 duration=785.590411ms 
ts=2023-12-21T08:27:17.369Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=24 page=22
ts=2023-12-21T08:27:17.527Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=23 duration=157.472544ms 
ts=2023-12-21T08:27:17.538Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=31 page=23
ts=2023-12-21T08:27:17.638Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=24 duration=99.981792ms 
ts=2023-12-21T08:27:17.653Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=24
ts=2023-12-21T08:27:18.244Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=25 duration=590.235435ms 
ts=2023-12-21T08:27:18.255Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=25
ts=2023-12-21T08:27:18.427Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=26 duration=171.676147ms 
ts=2023-12-21T08:27:18.438Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=29 page=26
ts=2023-12-21T08:27:18.478Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=27 duration=39.480928ms 
ts=2023-12-21T08:27:18.491Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=27
ts=2023-12-21T08:27:18.617Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=28 duration=124.960498ms 
ts=2023-12-21T08:27:18.629Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=31 page=28
ts=2023-12-21T08:27:18.782Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=29 duration=153.194717ms 
ts=2023-12-21T08:27:18.794Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=25 page=29
ts=2023-12-21T08:27:19.345Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=30 duration=550.101777ms 
ts=2023-12-21T08:27:19.355Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=30 page=30
ts=2023-12-21T08:27:19.421Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=31 duration=65.357014ms 
ts=2023-12-21T08:27:19.429Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=19 page=31
ts=2023-12-21T08:27:19.604Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=32 duration=173.872539ms 
ts=2023-12-21T08:27:19.614Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=30 page=32
ts=2023-12-21T08:27:20.185Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=33 duration=570.323612ms 
ts=2023-12-21T08:27:20.194Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=23 page=33
ts=2023-12-21T08:27:20.558Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=34 duration=363.489861ms 
ts=2023-12-21T08:27:20.568Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=27 page=34
ts=2023-12-21T08:27:21.284Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=35 duration=716.016859ms 
ts=2023-12-21T08:27:21.296Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=33 page=35
ts=2023-12-21T08:27:21.606Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=36 duration=309.192073ms 
ts=2023-12-21T08:27:21.626Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=35 page=36
ts=2023-12-21T08:27:21.823Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=37 duration=195.723232ms 
ts=2023-12-21T08:27:21.837Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=43 page=37
ts=2023-12-21T08:27:22.073Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=38 duration=235.321521ms 
ts=2023-12-21T08:27:22.083Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=28 page=38
ts=2023-12-21T08:27:22.426Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=39 duration=342.139873ms 
ts=2023-12-21T08:27:22.435Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=23 page=39
ts=2023-12-21T08:27:22.750Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=40 duration=314.953558ms 
ts=2023-12-21T08:27:22.768Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=80 page=40
ts=2023-12-21T08:27:23.054Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=41 duration=284.823403ms 
ts=2023-12-21T08:27:23.077Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=41
ts=2023-12-21T08:27:23.791Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=42 duration=668.390465ms 
ts=2023-12-21T08:27:23.812Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=42
ts=2023-12-21T08:27:24.114Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=43 duration=300.796595ms 
ts=2023-12-21T08:27:24.138Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=96 page=43
ts=2023-12-21T08:27:24.677Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=44 duration=536.409001ms 
ts=2023-12-21T08:27:24.698Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=97 page=44
ts=2023-12-21T08:27:24.927Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=45 duration=226.741852ms 
ts=2023-12-21T08:27:25.035Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=45
ts=2023-12-21T08:27:25.156Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=46 duration=118.435422ms 
ts=2023-12-21T08:27:25.177Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=46
ts=2023-12-21T08:27:25.428Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=47 duration=250.185229ms 
ts=2023-12-21T08:27:25.449Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=47
ts=2023-12-21T08:27:25.652Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=48 duration=201.545062ms 
ts=2023-12-21T08:27:25.672Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=48
ts=2023-12-21T08:27:25.808Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=49 duration=134.067283ms 
ts=2023-12-21T08:27:25.832Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=49
ts=2023-12-21T08:27:26.197Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=50 duration=275.21968ms 
ts=2023-12-21T08:27:26.217Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=50
ts=2023-12-21T08:27:26.563Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=51 duration=344.40441ms 
ts=2023-12-21T08:27:26.585Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=51
ts=2023-12-21T08:27:27.054Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=52 duration=468.419133ms 
ts=2023-12-21T08:27:27.075Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=52
ts=2023-12-21T08:27:27.723Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=53 duration=646.672479ms 
ts=2023-12-21T08:27:27.743Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=53
ts=2023-12-21T08:27:28.141Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=54 duration=397.075894ms 
ts=2023-12-21T08:27:28.162Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=97 page=54
ts=2023-12-21T08:27:28.690Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=55 duration=525.196915ms 
ts=2023-12-21T08:27:28.735Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=55
ts=2023-12-21T08:27:28.916Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=56 duration=178.318116ms 
ts=2023-12-21T08:27:28.937Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=56
ts=2023-12-21T08:27:29.306Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=57 duration=367.323915ms 
ts=2023-12-21T08:27:29.326Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=57
ts=2023-12-21T08:27:30.184Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=58 duration=856.088596ms 
ts=2023-12-21T08:27:30.206Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=58
ts=2023-12-21T08:27:30.953Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=59 duration=745.136462ms 
ts=2023-12-21T08:27:30.974Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=100 page=59
ts=2023-12-21T08:27:31.535Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=60 duration=559.458099ms 
ts=2023-12-21T08:27:31.556Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=98 page=60
ts=2023-12-21T08:27:32.338Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=61 duration=780.383842ms 
ts=2023-12-21T08:27:32.360Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=61
ts=2023-12-21T08:27:32.820Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=62 duration=297.84318ms 
ts=2023-12-21T08:27:32.844Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=99 page=62
ts=2023-12-21T08:27:33.190Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=63 duration=345.100334ms 
ts=2023-12-21T08:27:33.212Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=98 page=63
ts=2023-12-21T08:27:34.048Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=64 duration=834.918582ms 
ts=2023-12-21T08:27:34.070Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=98 page=64
ts=2023-12-21T08:27:34.773Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=65 duration=700.886348ms 
ts=2023-12-21T08:27:34.799Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=97 page=65
ts=2023-12-21T08:27:35.416Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=66 duration=614.552133ms 
ts=2023-12-21T08:27:35.432Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=70 page=66
ts=2023-12-21T08:27:36.174Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=67 duration=741.41779ms 
ts=2023-12-21T08:27:36.189Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=67 page=67
ts=2023-12-21T08:27:36.662Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=68 duration=471.661356ms 
ts=2023-12-21T08:27:36.678Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=66 page=68
ts=2023-12-21T08:27:37.672Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=69 duration=992.764774ms 
ts=2023-12-21T08:27:37.688Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=65 page=69
ts=2023-12-21T08:27:38.121Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=70 duration=431.532628ms 
ts=2023-12-21T08:27:38.137Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=65 page=70
ts=2023-12-21T08:27:38.548Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=71 duration=409.867695ms 
ts=2023-12-21T08:27:38.728Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=66 page=71
ts=2023-12-21T08:27:39.178Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=72 duration=448.657662ms 
ts=2023-12-21T08:27:39.194Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=66 page=72
ts=2023-12-21T08:27:39.415Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=73 duration=219.959528ms 
ts=2023-12-21T08:27:39.428Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=50 page=73
ts=2023-12-21T08:27:39.623Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=74 duration=194.517297ms 
ts=2023-12-21T08:27:39.629Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=1 page=74
ts=2023-12-21T08:27:40.276Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=75 duration=647.181018ms 
ts=2023-12-21T08:27:40.282Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=1 page=75
ts=2023-12-21T08:27:40.772Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=76 duration=489.939074ms 
ts=2023-12-21T08:27:40.779Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=2 page=76
ts=2023-12-21T08:27:40.866Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=77 duration=86.376198ms 
ts=2023-12-21T08:27:40.871Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=1 page=77
ts=2023-12-21T08:27:41.912Z caller=http.go:34 level=warn msg="Scraped QueueStatsSemp1 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=78 duration=1.040854744s
ts=2023-12-21T08:27:41.912Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=78 duration=1.040854744s 
ts=2023-12-21T08:27:41.918Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=2 page=78
ts=2023-12-21T08:27:42.349Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=79 duration=431.099759ms 
ts=2023-12-21T08:27:42.357Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=18 page=79
ts=2023-12-21T08:27:42.692Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=80 duration=334.653781ms 
ts=2023-12-21T08:27:42.710Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=87 page=80
ts=2023-12-21T08:27:43.023Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=81 duration=312.097407ms 
ts=2023-12-21T08:27:43.039Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=82 page=81
ts=2023-12-21T08:27:43.985Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=82 duration=944.318674ms 
ts=2023-12-21T08:27:44.000Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=73 page=82
ts=2023-12-21T08:27:44.894Z caller=http.go:36 level=debug msg="Scraped QueueStatsSemp1" page=83 duration=892.704635ms 
ts=2023-12-21T08:27:44.902Z caller=getQueueStatsSemp1.go:79 level=debug msg="Result of QueueStatsSemp1" results=26 page=83
ts=2023-12-21T08:27:44.902Z caller=asyncFetcher.go:36 level=debug msg="Fetching for handler" handler=/solace-esb-sempv2
ts=2023-12-21T08:27:57.800Z caller=http.go:62 level=warn msg="Scraped QueueStatsSemp2 but this took very long. Please add more cpu to your broker. Otherwise you are about to harm your broker." page=1 duration=12.897938738s
ts=2023-12-21T08:27:57.801Z caller=getQueueStatsSemp2.go:98 level=debug msg="Result of QueueStatsSemp2" results=100 page=1

GreenRover commented 9 months ago

Ok very strange results.

If I were to scrape only the SEMP v2 pages that have no results, it would take 109.2 sec, which is slower than a full scan with SEMP v1.

If you would do this:

[solace]
timeout=5s
# 0s means disabled. When an interval is set, all correctly configured endpoints will be fetched asynchronously.
# This may help you to deal with slower brokers or an extreme amount of results.
prefetchInterval=15s

# Because Solace advises having less than 10 SEMP requests per second, this is a scarce resource that needs to be taken care of.
parallelSempConnections=1

[endpoint.solace-v1-test]
QueueStatsV1=pksolocp01|*|

And scrape with Prometheus every 60 sec. Then we would no longer have timeouts in Prometheus, plus a small performance improvement, because providing this many metrics as a Prometheus endpoint is expensive.

Would this be a solution for you?

JanGrosse commented 9 months ago

Just for my understanding - the only difference between how the exporter does it now and how it would do it in the upcoming version is that it prefetches the results asynchronously, and on scrape requests from Prometheus it basically provides the "cached" data?

You also mentioned

a small performance improvement, because providing this many metrics as a Prometheus endpoint is expensive

could you elaborate on what you mean by that? Providing this many metrics as an endpoint is expensive for the exporter itself?

GreenRover commented 9 months ago

Yes that is correct.

Just try to open the metrics page, which only serves metrics from the cache - even that might take up to 2 seconds. But based on the logs you provided, the target of staying below 1 minute of scraping time is achievable.

I can try to skip uninteresting pages for SEMP v1, but if I check the statistics, SEMP v2 is so slow that it is completely out of the race.

Sorry, but without doing all the Prometheus magic on my own I cannot help further.

JanGrosse commented 9 months ago

Understood, thank you very much for your assistance. I agree that the asynchronous fetching with the SEMP v1 requests has shown that a scrape below 1 minute is possible.

From our point of view we would be happy if this feature gets "officially" released in an upcoming version of the solace-prometheus-exporter. I will discuss the results internally with my colleagues as well (once they are all back from vacation) and move further discussions about improved scrape times into the Solace ticket.

Thank you very much (again) and happy holidays.

GreenRover commented 8 months ago

Let's say it this way: Solace managed to respond without giving any answers. I will try to dig deeper here using a new Solace ticket.

If the current solution helps you, I will do:

I guess if the current SEMP v1 implementation brings no improvement, you should consider renaming your queues so that you can query all of them using a reasonable number of SEMP v1 queries.

GreenRover commented 8 months ago

With #65 I added a new behavior that may help here. The background scraping will update the cached metrics every 100 metrics, to provide updated data as early as possible when metrics are fetched.

GreenRover commented 8 months ago

Can you please give the latest build a try and see if it behaves correctly within a long term test?

GreenRover commented 7 months ago

@andreas-habel I am closing this issue due to inactivity on your side.