ptarmiganlabs / butler

Butler brings superpowers to Qlik Sense Enterprise on Windows! Advanced reload failure alerts, task scheduler, key-value store, file system access and much more.
https://butler.ptarmiganlabs.com
MIT License

Custom services monitoring #850

Closed bintocher closed 9 months ago

bintocher commented 11 months ago

What version of Butler SOS are you using?

9.6.1

What version of Node.js are you using? Not applicable if you use the standalone version of Butler SOS.

No response

What command did you use to start Butler SOS?

docker-compose

What operating system are you using?

ubuntu 20

What CPU architecture are you using?

x86_64

What Qlik Sense versions are you using?

2023 August SR3

Describe the Bug

It is necessary to specify the services to monitor for each server.

I have a 5-node Qlik Sense cluster:

1 node for ETL (central node, repository, scheduler, printing, proxy, engine)

1 node for dev (repository, scheduler, printing, engine)

2 nodes for prod (repository, engine, printing)

1 node for proxy (repository, proxy)

In Grafana I get this error every minute from the proxy node:

{"source":"qseow-repository", "log_row":"1196", "ts_iso":"20231023T093326.789+0300", "ts_local":"2023-10-23 09:33:26, 789", "level":"ERROR", "host":"proxy-qs.domain.com", "subsystem":"System.Repository.Repository.Communication.Clients.EngineClient", "windows_user":"domain\user_srv", "message":"API call to Engine service was not successful: Failed to get app 'StaticByteSize'.", "exception_message":"System.ArgumentNullException: Value cannot be null.↵↓Parameter name: stream↵↓ at System.IO.StreamReader..ctor(Stream stream, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize, Boolean leaveOpen)↵↓ at System.IO.StreamReader..ctor(Stream stream)↵↓ at Repository.Communication.Clients.EngineClient.GetAppStaticByteSize(App app)", "user_directory":"", "user_id":"", "command":"", "result_code":"", "origin":"", "context":"System.ArgumentNullException: Value cannot be null.\r\nParameter name: stream\r\n at System.IO.StreamReader..ctor(Stream stream, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize, Boolean leaveOpen)\r\n at System.IO.StreamReader..ctor(Stream stream)\r\n at Repository.Communication.Clients.EngineClient.GetAppStaticByteSize(App app)\r\n", "user_full":""}

Expected Behavior

No response

To Reproduce

No response

mountaindude commented 10 months ago

This one slipped through the cracks, sorry for the slow response.

Btw, this ticket relates to the Windows service monitoring feature of Butler, not Butler SOS.
I will move the ticket to the Butler repository instead.

There seem to be two things covered in this ticket:

1. "it is necessary to specify services for monitoring for each server"

This is by design. Butler's service-monitoring feature is generic and can be used to monitor any Windows service, not only Qlik Sense services. In the config file you specify which servers/services to monitor.

For example, a configuration to monitor the QS services of a two-node Sense cluster could look like this:

  serviceMonitor:
    enable: true
    frequency: every 30 seconds         # https://bunkat.github.io/later/parsers.html
    monitor:
      - host: qs-server1
        services:
          - name: QlikSenseEngineService
            friendlyName: QS Engine
          - name: QlikSensePrintingService
            friendlyName: QS printing
          - name: QlikSenseProxyService
            friendlyName: QS proxy
          - name: QlikSenseRepositoryService
            friendlyName: QS repository
          - name: QlikSenseSchedulerService
            friendlyName: QS scheduler
          - name: QlikSenseServiceDispatcher
            friendlyName: QS service dispatcher
      - host: qs-server2
        services:
          - name: QlikSenseEngineService
            friendlyName: QS Engine
          - name: QlikSensePrintingService
            friendlyName: QS printing
          - name: QlikSenseProxyService
            friendlyName: QS proxy
          - name: QlikSenseRepositoryService
            friendlyName: QS repository
          - name: QlikSenseSchedulerService
            friendlyName: QS scheduler
          - name: QlikSenseServiceDispatcher
            friendlyName: QS service dispatcher
    alertDestination:
      influxDb:                     # Send service alerts to InfluxDB
        enable: true
      newRelic: 
        enable: false
      email:
        enable: true
      mqtt: 
        enable: true
      teams:
        enable: true
      slack:
        enable: true
      webhook:
        enable: false

Do note that the account running Butler must be a member of the local Administrators group on every server where services should be monitored! If the Butler account does not have the needed permissions, you will get error messages.
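Because each host has its own `services` list, a cluster where nodes run different services can be covered by listing only what actually runs on each node. Below is a rough sketch for the 5-node cluster described in the original report. It uses the same keys and service names as the example above. All host names except `proxy-qs.domain.com` (taken from the log message) are made up for illustration, and the `alertDestination` section is left out since it would be the same as above. `QlikSenseServiceDispatcher` is not included because it was not listed in the report; it can be added per host in the same way if it should be monitored.

```yaml
  serviceMonitor:
    enable: true
    frequency: every 30 seconds
    monitor:
      # ETL / central node: repository, scheduler, printing, proxy, engine
      - host: etl-qs.domain.com          # hypothetical host name
        services:
          - name: QlikSenseRepositoryService
            friendlyName: QS repository
          - name: QlikSenseSchedulerService
            friendlyName: QS scheduler
          - name: QlikSensePrintingService
            friendlyName: QS printing
          - name: QlikSenseProxyService
            friendlyName: QS proxy
          - name: QlikSenseEngineService
            friendlyName: QS engine
      # Dev node: repository, scheduler, printing, engine
      - host: dev-qs.domain.com          # hypothetical host name
        services:
          - name: QlikSenseRepositoryService
            friendlyName: QS repository
          - name: QlikSenseSchedulerService
            friendlyName: QS scheduler
          - name: QlikSensePrintingService
            friendlyName: QS printing
          - name: QlikSenseEngineService
            friendlyName: QS engine
      # Prod nodes: repository, engine, printing
      - host: prod1-qs.domain.com        # hypothetical host name
        services:
          - name: QlikSenseRepositoryService
            friendlyName: QS repository
          - name: QlikSenseEngineService
            friendlyName: QS engine
          - name: QlikSensePrintingService
            friendlyName: QS printing
      - host: prod2-qs.domain.com        # hypothetical host name
        services:
          - name: QlikSenseRepositoryService
            friendlyName: QS repository
          - name: QlikSenseEngineService
            friendlyName: QS engine
          - name: QlikSensePrintingService
            friendlyName: QS printing
      # Proxy node: repository, proxy
      - host: proxy-qs.domain.com        # host name taken from the log message
        services:
          - name: QlikSenseRepositoryService
            friendlyName: QS repository
          - name: QlikSenseProxyService
            friendlyName: QS proxy
```

With a layout like this, Butler would only check the services that are expected to exist on each node, so a missing engine service on the proxy node would not be reported as a problem.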

2. The log message you included

Just so I understand correctly, that error message shows up in the Grafana dashboard's section for error messages? And you have deployed the XML log appender files on the Sense servers? If so, it's an error message coming from Sense.

The effect of the XML files is that select warning and error messages (from engine, repository and proxy - if you deployed those XML files) will be forwarded to Butler SOS. The messages are then stored in InfluxDB (once again assuming you are using InfluxDB), from where they are visualised in the Grafana dashboard.

So, if all the above assumptions are correct, I'd say something is not working correctly in your QS setup. That exact error message will also be present in the log files on disk on the proxy-qs.domain.com server. After all, that's where Butler SOS gets the log message from in the first place.

The idea behind Butler SOS's warning and error forwarding feature is exactly this: make QS log errors and warnings more visible and easier to detect, so they can then be acted on.

mountaindude commented 9 months ago

The latest Butler version, 9.3.1, adds better support for Windows services. Among other things, there is now better logging when Butler fails to connect to remote Windows servers where services should be monitored.