ptarmiganlabs / butler-sos

Butler SenseOps Stats ("Butler-SOS") is a microservice publishing operational Qlik Sense metrics to InfluxDB, Prometheus and New Relic. Add Grafana for great looking dashboards and you get real-time monitoring of what happens inside a Qlik Sense environment.
https://butler-sos.ptarmiganlabs.com
MIT License
27 stars 13 forks source link

Log event UDP message from QS server causes error when written to InfluxDB #805

Closed mountaindude closed 5 months ago

mountaindude commented 5 months ago

What version of Butler SOS are you using?

10.0.0 alfa

What version of Node.js are you using? Not applicable if you use the standalone version of Butler SOS.

20

What command did you use to start Butler SOS?

-

What operating system are you using?

Linux

What CPU architecture are you using?

x86_64

What Qlik Sense versions are you using?

2024-May IR

Describe the Bug

Error

The Log4Net log appenders deployed in Sense server picks up and forwards a log message to Butler SOS that causes errors when written to InfluxDB.

This has not been observed before, so it could be something that is related to a recent upgrade to 2024-May IR. But that is not confirmed. Doesn't really matter, the parsing of incoming UDP messages could use a bit more robustness so they also handle new kinds of messages created by QS Server.

After Butler SOS has parsed the incoming log UDP message from QS server we get this:

[
   {
      "fields": {
         "message": "Unanticipated exception",
         "raw_event": "{\"source\":\"qseow-engine\",\"log_row\":\"200\",\"ts_iso\":\"20240519T115549.898+0200\",\"ts_local\":\"2024-05-19 11:55:49,898\",\"level\":\"WARN\",\"host\":\"pro2-win1\",\"subsystem\":\"System.Repository.Qlik.Sense.Common.Communication.Server.HttpConnectionQueueHandler\",\"windows_user\":\"LAB\\\\qlikservice\",\"message\":\"Unanticipated exception\",\"proxy_session_id\":\"\",\"user_directory\":\"\",\"user_id\":\"\",\"engine_ts\":\"\",\"process_id\":\"\",\"engine_exe_version\":\"\",\"server_started\":\"\",\"entry_type\":\"\",\"session_id\":\"\",\"app_id\":\"System.Net.HttpListenerException (0x80004005): An operation was attempted on a nonexistent network connection\\r\\n   at System.Net.HttpResponseStream.Write(Byte[] buffer, Int32 offset, Int32 size)\\r\\n   at Qlik.Sense.Common.Communication.Server.StreamRecorder.Write(Byte[] buffer, Int32 offset, Int32 count)\\r\\n   at System.IO.Compression.DeflateStream.DoMaintenance(Byte[] array, Int32 offset, Int32 count)\\r\\n   at System.IO.Compression.DeflateStream.InternalWrite(Byte[] array, Int32 offset, Int32 count, Boolean isAsync)\\r\\n   at System.IO.Compression.DeflateStream.Write(Byte[] array, Int32 offset, Int32 count)\\r\\n   at Qlik.Sense.Common.Communication.Server.HttpConnectionHandler.WriteResponse(IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording, String response)\\r\\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.<>c__DisplayClass43_0`1.<HandleException>b__0(String response, HttpStatusCode statusCode, String contentType)\\r\\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleUnknownException(Exception exception, Func`1 cultureInfoProvider, Boolean restMethodCalled, Exception& ex, ResponseHandler responseHandler)\\r\\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleException[T](Exception& ex, Boolean restMethodCalled, IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording, ExceptionHandler`1 exceptionHandler)\\r\\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleException(Exception ex, Boolean restMethodCalled, IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording)\\r\\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleRequest(IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording)\\r\\n   at Qlik.Sense.Common.Communication.Server.HttpConnectionQueueHandler.InnerHandleConnection(IHttpListenerRequestContext request, IRequestHandler requestHandler, IHttpListenerRequestContext duplicatedRequest)\\r\\n\",\"user_full\":\"\",\"tags\":{}}",
         "session_id": ""
      },
      "measurement": "log_event",
      "tags": {
         "app_id": "System.Net.HttpListenerException (0x80004005): An operation was attempted on a nonexistent network connection\r\n   at System.Net.HttpResponseStream.Write(Byte[] buffer, Int32 offset, Int32 size)\r\n   at Qlik.Sense.Common.Communication.Server.StreamRecorder.Write(Byte[] buffer, Int32 offset, Int32 count)\r\n   at System.IO.Compression.DeflateStream.DoMaintenance(Byte[] array, Int32 offset, Int32 count)\r\n   at System.IO.Compression.DeflateStream.InternalWrite(Byte[] array, Int32 offset, Int32 count, Boolean isAsync)\r\n   at System.IO.Compression.DeflateStream.Write(Byte[] array, Int32 offset, Int32 count)\r\n   at Qlik.Sense.Common.Communication.Server.HttpConnectionHandler.WriteResponse(IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording, String response)\r\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.<>c__DisplayClass43_0`1.<HandleException>b__0(String response, HttpStatusCode statusCode, String contentType)\r\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleUnknownException(Exception exception, Func`1 cultureInfoProvider, Boolean restMethodCalled, Exception& ex, ResponseHandler responseHandler)\r\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleException[T](Exception& ex, Boolean restMethodCalled, IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording, ExceptionHandler`1 exceptionHandler)\r\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleException(Exception ex, Boolean restMethodCalled, IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording)\r\n   at Qlik.Sense.Common.Communication.REST.Server.RESTEngine.HandleRequest(IHttpListenerRequestWrapper httpRequest, IHttpListenerResponseWrapper httpResponse, IHttpRecording httpRecording)\r\n   at Qlik.Sense.Common.Communication.Server.HttpConnectionQueueHandler.InnerHandleConnection(IHttpListenerRequestContext request, IRequestHandler requestHandler, IHttpListenerRequestContext duplicatedRequest)\r\n",
         "host": "pro2-win1",
         "level": "WARN",
         "log_row": "200",
         "source": "qseow-engine",
         "subsystem": "System.Repository.Qlik.Sense.Common.Communication.Server.HttpConnectionQueueHandler",
         "windows_user": "LAB\\qlikservice"
      }
   }
]

Results in Butler SOS error when trying to write the datapoint to InfluxDB:

error: LOG EVENT INFLUXDB 1: Error saving log event to InfluxDB! Error: A 400 Bad Request error occurred: {"error":"unable to parse '

Possible causes

Expected Behavior

No response

To Reproduce

No response

mountaindude commented 5 months ago

Looks like the sample XML Log4net appender file for Engine service is slightly incorrect.

Current file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- Generic appender for detecting warnings and errors -->
    <appender name="LogEvent" type="log4net.Appender.UdpAppender">
        <param name="threshold" value="warn" />
        <param name="remoteAddress" value="FQDN or IP of server where Butler SOS is running" />
        <param name="remotePort" value="9996" />
        <param name="encoding" value="utf-8" />
        <layout type="log4net.Layout.PatternLayout">
            <converter>
                <param name="name" value="rownum" />
                <param name="type" value="Qlik.Sense.Logging.log4net.Layout.Pattern.CounterPatternConverter" /> 
            </converter> 
            <converter>
                <param name="name" value="hostname" />
                <param name="type" value="Qlik.Sense.Logging.log4net.Layout.Pattern.HostNamePatternConverter" />
            </converter>
            <converter>
                  <param name="name" value="longIso8601date" /> 
                  <param name="type" value="Qlik.Sense.Logging.log4net.Layout.Pattern.Iso8601TimeOffsetPatternConverter" /> 
            </converter>
            <converter> 
                  <param name="name" value="user" /> 
                  <param name="type" value="Qlik.Sense.Logging.log4net.Layout.Pattern.ServiceUserNameCachedPatternConverter" /> 
            </converter> 
            <converter> 
                  <param name="name" value="encodedmessage" /> 
                  <param name="type" value="Qlik.Sense.Logging.log4net.Layout.Pattern.EncodedMessagePatternConverter" /> 
            </converter> 
            <param name="conversionpattern" value="/qseow-engine/;%rownum{9999};%longIso8601date;%date;%level;%hostname;%logger;%user;%encodedmessage;%property{ProxySessionId};%property{ActiveUserDirectory};%property{ActiveUserId};%property{EngineTimestamp};%property{ProcessId};%property{Exe Version};%property{Server Started};%property{Entry Type};%property{SessionId};%property{AppId}" />
        </layout>
    </appender>

    <!-- Send UDP message to Butler SOS on warnings and errors -->
    <logger name="Performance">
        <appender-ref ref="LogEvent" />
    </logger>
    <logger name="System">
        <appender-ref ref="LogEvent" />
    </logger>
    <logger name="Session">
        <appender-ref ref="LogEvent" />
    </logger>
</configuration>

Those last lines (for example <logger name="System"> will capture all log events emitted from the System service. In this context we're however only interested in System.Engine messages.

Changing this will reduce load on Butler SOS somewhat (fewer - but more relevant - messages will be sent from QS server).

Also add code in Butler SOS verifying that fields that are supposed to be UUIDs are in fact that, fields that are supposed to be numeric only are in fact that etc, when creating the InfluxDB datapoints.