Closed bintocher closed 9 months ago
This one slipped through the cracks, sorry for the slow response.
Btw, this ticket relates to the Windows service monitoring feature of Butler, not Butler SOS.
I will move the ticket to the Butler repository instead.
There seem to be two things covered in this ticket:
Regarding having to specify the services to monitor for each server: this is by design. Butler's service-monitoring feature is generic and can be used to monitor any Windows service, not only Qlik Sense services. In the config file you specify which servers and which services on each server to monitor.
For example, a configuration to monitor the QS services of a two-node Sense cluster could look like this:
serviceMonitor:
  enable: true
  frequency: every 30 seconds         # https://bunkat.github.io/later/parsers.html
  monitor:
    - host: qs-server1
      services:
        - name: QlikSenseEngineService
          friendlyName: QS Engine
        - name: QlikSensePrintingService
          friendlyName: QS printing
        - name: QlikSenseProxyService
          friendlyName: QS proxy
        - name: QlikSenseRepositoryService
          friendlyName: QS repository
        - name: QlikSenseSchedulerService
          friendlyName: QS scheduler
        - name: QlikSenseServiceDispatcher
          friendlyName: QS service dispatcher
    - host: qs-server2
      services:
        - name: QlikSenseEngineService
          friendlyName: QS Engine
        - name: QlikSensePrintingService
          friendlyName: QS printing
        - name: QlikSenseProxyService
          friendlyName: QS proxy
        - name: QlikSenseRepositoryService
          friendlyName: QS repository
        - name: QlikSenseSchedulerService
          friendlyName: QS scheduler
        - name: QlikSenseServiceDispatcher
          friendlyName: QS service dispatcher
  alertDestination:
    influxDb:                         # Send service alerts to InfluxDB
      enable: true
    newRelic:
      enable: false
    email:
      enable: true
    mqtt:
      enable: true
    teams:
      enable: true
    slack:
      enable: true
    webhook:
      enable: false
Do note that the account running Butler must be a member of the local administrators group on every server where services should be monitored! If the Butler account does not have the needed permissions you will get error messages.
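Since the services are listed per host, the monitored set can follow each node's role. Below is a rough sketch of the `monitor` section for the five-node cluster described in the ticket. Treat it as an illustration only: apart from proxy-qs.domain.com, all host names are made up, the service list per node is inferred from the roles given in the ticket, and I'm assuming the service dispatcher runs on every node. The `alertDestination` section would stay as in the example above.

```yaml
serviceMonitor:
  enable: true
  frequency: every 30 seconds
  monitor:
    # ETL / central node (hypothetical host name)
    - host: etl-qs.domain.com
      services:
        - name: QlikSenseRepositoryService
          friendlyName: QS repository
        - name: QlikSenseSchedulerService
          friendlyName: QS scheduler
        - name: QlikSensePrintingService
          friendlyName: QS printing
        - name: QlikSenseProxyService
          friendlyName: QS proxy
        - name: QlikSenseEngineService
          friendlyName: QS Engine
        - name: QlikSenseServiceDispatcher   # assumed to run on every node
          friendlyName: QS service dispatcher
    # Dev node (hypothetical host name)
    - host: dev-qs.domain.com
      services:
        - name: QlikSenseRepositoryService
          friendlyName: QS repository
        - name: QlikSenseSchedulerService
          friendlyName: QS scheduler
        - name: QlikSensePrintingService
          friendlyName: QS printing
        - name: QlikSenseEngineService
          friendlyName: QS Engine
        - name: QlikSenseServiceDispatcher
          friendlyName: QS service dispatcher
    # Prod node 1 (hypothetical host name)
    - host: prod1-qs.domain.com
      services:
        - name: QlikSenseRepositoryService
          friendlyName: QS repository
        - name: QlikSenseEngineService
          friendlyName: QS Engine
        - name: QlikSensePrintingService
          friendlyName: QS printing
        - name: QlikSenseServiceDispatcher
          friendlyName: QS service dispatcher
    # Prod node 2 (hypothetical host name)
    - host: prod2-qs.domain.com
      services:
        - name: QlikSenseRepositoryService
          friendlyName: QS repository
        - name: QlikSenseEngineService
          friendlyName: QS Engine
        - name: QlikSensePrintingService
          friendlyName: QS printing
        - name: QlikSenseServiceDispatcher
          friendlyName: QS service dispatcher
    # Proxy node (host name taken from the log message in the ticket)
    - host: proxy-qs.domain.com
      services:
        - name: QlikSenseRepositoryService
          friendlyName: QS repository
        - name: QlikSenseProxyService
          friendlyName: QS proxy
        - name: QlikSenseServiceDispatcher
          friendlyName: QS service dispatcher
```

Listing only the services that actually exist on a node avoids alerts about services that were never installed there.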
Regarding the error that shows up every minute from the proxy node: just so I understand correctly, that error message shows up in the Grafana dashboard's section for error messages? And you have deployed the XML log appender files on the Sense servers? If so, it's an error message coming from Sense.
The effect of the XML files is that select warning and error messages (from engine, repository and proxy, if you deployed those XML files) will be forwarded to Butler SOS. The messages are then stored in InfluxDB (once again assuming you are using InfluxDB), from where they are visualised in the Grafana dashboard.
So, if all the above assumptions are correct, I'd say something is not working correctly in your QS setup.
That exact error message will also be present in the log files on disk on the proxy-qs.domain.com server. After all, that's where Butler SOS gets the log message from in the first place.
The idea behind Butler SOS's warning and error log forwarding feature is exactly this: make QS log errors and warnings more visible and easier to detect, and then act on.
Latest version, 9.3.1, adds better support for Windows services. Among other things there is now better logging when failing to connect to remote Windows servers where services should be monitored.
What version of Butler SOS are you using?
9.6.1
What version of Node.js are you using? Not applicable if you use the standalone version of Butler SOS.
No response
What command did you use to start Butler SOS?
docker-compose
What operating system are you using?
ubuntu 20
What CPU architecture are you using?
x86_64
What Qlik Sense versions are you using?
2023 August SR3
Describe the Bug
It is necessary to specify the services to monitor separately for each server.
I have a 5-node Qlik Sense cluster:
1 node for ETL (central node: repository, scheduler, printing, proxy, engine)
1 node for dev (repository, scheduler, printing, engine)
2 nodes for prod (repository, engine, printing)
1 node for proxy (repository, proxy)
In Grafana I get this error every minute from the proxy node:
{"source":"qseow-repository", "log_row":"1196", "ts_iso":"20231023T093326.789+0300", "ts_local":"2023-10-23 09:33:26, 789", "level":"ERROR", "host":"proxy-qs.domain.com", "subsystem":"System.Repository.Repository.Communication.Clients.EngineClient", "windows_user":"domain\user_srv", "message":"API call to Engine service was not successful: Failed to get app 'StaticByteSize'.", "exception_message":"System.ArgumentNullException: Value cannot be null.↵↓Parameter name: stream↵↓ at System.IO.StreamReader..ctor(Stream stream, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize, Boolean leaveOpen)↵↓ at System.IO.StreamReader..ctor(Stream stream)↵↓ at Repository.Communication.Clients.EngineClient.GetAppStaticByteSize(App app)", "user_directory":"", "user_id":"", "command":"", "result_code":"", "origin":"", "context":"System.ArgumentNullException: Value cannot be null.\r\nParameter name: stream\r\n at System.IO.StreamReader..ctor(Stream stream, Encoding encoding, Boolean detectEncodingFromByteOrderMarks, Int32 bufferSize, Boolean leaveOpen)\r\n at System.IO.StreamReader..ctor(Stream stream)\r\n at Repository.Communication.Clients.EngineClient.GetAppStaticByteSize(App app)\r\n", "user_full":""}
Expected Behavior
No response
To Reproduce
No response