stackabletech / issues

This repository is only for issues that concern multiple repositories or don't fit into any specific repository
2 stars 0 forks source link

Fix invalid log events #491

Closed siegfriedweber closed 5 months ago

siegfriedweber commented 10 months ago

The XML log files produced by log4j 1.x, log4j 2.x and logback are invalid in rare cases. The XML snippets cannot be parsed by Vector and the fields like logger and level cannot be set in the resulting log event.

This is usually the root cause if the logging tests fail with the following error:

running command: [sh -c kubectl exec --namespace=$NAMESPACE test-runner-0 -- python /tmp/test_log_aggregation.py -n $NAMESPACE]
Traceback (most recent call last):
  File "/tmp/test_log_aggregation.py", line 47, in <module>
    check_sent_events()
  File "/tmp/test_log_aggregation.py", line 38, in check_sent_events
    sentEvents['sentEventsTotal'] == 0, \
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Invalid log events were sent.
command terminated with exit code 1

Examples of invalid XML

Log events pushed into each other

It happens relatively often that two log events are pushed into each other.

<log4j:event logger="LearnerHandler-/10.244.1.14:57252">
  <log4j:message>org.apache.zookeeper.server.quorum.QuorumCnxManager"
             timestamp="1671098962896" level="
</<log4j:event logger="LearnerHandler-/10.244.1.14:57252">
  <log4j:message>org.apache.zookeeper.server.quorum.QuorumCnxManager"
             timestamp="1671098962896" level="
</" thread="RecvWorker:1">
  <log4j:message>Interrupting SendWorker thread from RecvWorker. sid: 1. myId: 3</log4j:message>

</log4j:event>

Unexpected line endings

Log4j 1.x and 2.x use the line delimiter \n within a log event and \r\n between the log events:

<log4j:event ...>\n
  <log4j:message>...</log4j:message>\n
</log4j:event>\r\n
<log4j:event ...>\n
  <log4j:message>...</log4j:message>\n
</log4j:event>\r\n

The line delimiter \r\n is used by Vector to extract the single log events. This is actually more reliable in Vector than trying to find the closing tag </log4j:event>.

In really rare cases, the line delimiter seems to be \n or \r between the log4j events. However, this is not confirmed yet.

Action

There are the following options to close this issue:

Tasks

soenkeliebau commented 10 months ago

If I understand you correctly this is potentially an upstream issue?

If Vector produces an extra field to mark this, this would stabilize our tests while retaining the ability to see how often it occured, as we can then filter for it in OpenSearch, correct?

If correct that would sound to me like an okay compromise..

siegfriedweber commented 10 months ago

If I understand you correctly this is potentially an upstream issue?

Yes, it seems. Surprisingly all the logging frameworks are affected and I have not found according bug reports. So I would be careful about blaming upstream for this. There could also be incompatible JARs in our classpath.

If Vector produces an extra field to mark this, this would stabilize our tests while retaining the ability to see how often it occured, as we can then filter for it in OpenSearch, correct?

Correct.