nean-and-i opened this issue 9 months ago
Please note this issue appears to be present in the `syslog` source type as well, where it is even worse: no workaround can be applied there. Below is a workaround that combines the `socket` source type with a `replace` function applied just before `parse_syslog`. This ensures the whole raw, unprocessed syslog message received via the socket reaches a transform/remap section, so it can be fixed just before `parse_syslog` is applied to it.
This is very similar to, and reminds me a lot of, a bug in a Junos version from Juniper: Juniper produced an invalid RFC 5424 log stream, and falling back to raw messages in rsyslog did the trick to repair/work around that bug. In this case, falling back to the `socket` source type is exactly the same, as the raw syslog message becomes available (and fixable) in the `message` field.
```toml
[transforms.log_fields]
inputs = [ "log_syslog" ]
type = "remap"
source = '''
.message = replace!(.message, r'(\[(?P<nonrfc5424>\w+\.\w+)\])', "$nonrfc5424")
. = parse_syslog!(.message)
'''
# remap drop safety
drop_on_abort = true
drop_on_error = true
reroute_dropped = true
```
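For illustration, here is a minimal Python re-creation of what that `replace` does (the regex is copied from the VRL above; Python's `re` named-group syntax happens to be the same):

```python
import re

# Same pattern as the VRL replace above: a bracketed word.word token
# such as [19347390.704884], captured without its brackets.
pattern = re.compile(r'\[(?P<nonrfc5424>\w+\.\w+)\]')

msg = ("<4>Nov 29 13:21:35 TEST kernel: [19347390.704884] "
       "ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20450930/power-338)")

# Replace the bracketed token with its bare contents, so the syslog
# parser no longer mistakes it for RFC 5424 structured data.
fixed = pattern.sub(lambda m: m.group("nonrfc5424"), msg)
print(fixed)
```

After this rewrite the message contains `... kernel: 19347390.704884 ACPI ...`, and `parse_syslog!` sees no bracketed token at all.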
This issue has a very interesting butterfly effect when using the elasticsearch/opensearch sink, as the following error can be puzzling: `java.lang.IllegalArgumentException: Limit of total fields [1000] has been exceeded`. (NOTE: if a search for that java elasticsearch/opensearch error led you here but you are not using Vector: please debug the keys being sent to elasticsearch/opensearch and ensure no dynamic or constantly changing keys, a.k.a. field names, are being used!)
```
java.lang.IllegalArgumentException: Limit of total fields [1000] has been exceeded
at org.opensearch.index.mapper.MappingLookup.checkFieldLimit(MappingLookup.java:193) ~[opensearch-2.11.0.jar:2.11.0]
at org.opensearch.index.mapper.MappingLookup.checkLimits(MappingLookup.java:185) ~[opensearch-2.11.0.jar:2.11.0]
at org.opensearch.index.mapper.DocumentMapper.validate(DocumentMapper.java:329) ~[opensearch-2.11.0.jar:2.11.0]
at org.opensearch.index.mapper.MapperService.internalMerge(MapperService.java:488) ~[opensearch-2.11.0.jar:2.11.0]
at org.opensearch.index.mapper.MapperService.internalMerge(MapperService.java:444) ~[opensearch-2.11.0.jar:2.11.0]
at org.opensearch.index.mapper.MapperService.merge(MapperService.java:416) ~[opensearch-2.11.0.jar:2.11.0]
at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:633) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:466) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.action.bulk.TransportShardBulkAction.performOnPrimary(TransportShardBulkAction.java:530) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:411) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.action.bulk.TransportShardBulkAction.dispatchedShardOperationOnPrimary(TransportShardBulkAction.java:123) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.action.support.replication.TransportWriteAction$1.doRun(TransportWriteAction.java:223) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.0.jar:2.11.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.0.jar:2.11.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
```
This happens because a lot of dynamic fields with empty RFC 5424 objects can be created and sent to elasticsearch/opensearch, especially when the bracketed `[ ]` part of the syslog message contains timestamp information or any other changing value, e.g. `[19347390.704884]`.
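The mapping explosion can be sketched in a few lines of Python (an illustration of the effect, not Vector or OpenSearch code): every distinct bracketed token becomes a new top-level key, and each new key becomes a new field in the index mapping.

```python
# Each misparsed message contributes a top-level key named after its
# (ever-changing) bracketed token, with an empty object as the value.
messages = ["19347390.704884", "19347391.112233", "19347392.556677"]

mapping_fields = set()
for token in messages:
    doc = {token: {}}           # what the misparsed "structured data" becomes
    mapping_fields.update(doc)  # the index mapping grows by one field per doc

# With kernel timestamps the token changes on every message, so this set
# keeps growing until the total-fields limit (default 1000) is exceeded.
print(len(mapping_fields))  # → 3
```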
That message is not getting parsed as 5424. It is being treated as a 3164 message, which is much less rigidly defined.
What is the source of that message?
It does make sense to update that parser to be a bit more strict in determining what constitutes structured data. Something like `[19347390.704884]` is quite clearly not going to be SD...
This log comes from a Linux appliance; it looks like kernel ACPI messages, so it's actually nothing out of the ordinary.
```
<4>Nov 29 13:21:35 TEST kernel: [19347390.704884] ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20450930/power-338)
```
Indeed, `parse_syslog` as well as the `syslog` source (which seems to follow the same logic) should be aware of more specifiers, and especially should not create empty or dynamic structure objects. AFAIK, both RFC 5424 and RFC 3164 define additional specifiers that shall be used (such as the `@` in RFC 5424 SD-IDs). Alternatively, a configuration option to specify a type/RFC manually would be handy?
We also have the same problem with `parse_syslog`. In addition to the Linux kernel messages already described, here are two example logs from Proxmox VE:

```
<30>Aug 21 11:16:28 pve01 corosync[1548]: [KNET ] rx: host: 2 link: 1 is up
```

and

```
<29>Aug 25 17:28:16 pve02 corosync[1539]: [TOTEM ] Retransmit List: dc4d07f
```

where `parse_syslog()` will create the empty fields `KNET: {}` and `TOTEM: {}`, respectively.
Fixing it should be pretty easy, at least when considering only RFC 3164/RFC 5424, since only RFC 5424 has structured data, and you can detect whether that standard is used by the VERSION field: `<PRI>1 <TIMESTAMP>` means the log is RFC 5424, and `<PRI><TIMESTAMP>` means it is RFC 3164. Note that SD-IDs don't need an `@` sign when they are IANA-registered, for example `timeQuality`.
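The detection rule above can be sketched like this (a best-effort heuristic as proposed, assuming input is either RFC 3164 or RFC 5424; this is not Vector's actual implementation):

```python
import re

# RFC 5424 puts a VERSION ("1") right after <PRI>; RFC 3164 goes
# straight from <PRI> into the timestamp. So a digit followed by a
# space right after the priority hints that structured data may follow.
RFC5424_HEADER = re.compile(r'^<\d{1,3}>\d+ ')

def looks_like_rfc5424(line: str) -> bool:
    return bool(RFC5424_HEADER.match(line))

print(looks_like_rfc5424('<34>1 2003-10-11T22:14:15.003Z host app - ID47 -'))  # True
print(looks_like_rfc5424('<4>Nov 29 13:21:35 TEST kernel: [19347390.704884] ACPI'))  # False
```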
### A note for the community

### Problem
It appears that `parse_syslog` uses `[` and `]` as the key, which is insufficient; it should additionally respect `@`, and maybe other additional magic, or at least be switchable. Otherwise, regular syslog messages containing `[` and `]` characters can get misinterpreted as RFC 5424 syslog, which creates a lot of dynamic empty fields. Per RFC 5424 (https://datatracker.ietf.org/doc/html/rfc5424#section-6.3.2), please use the SD-ID for best-effort detection.
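A best-effort SD-ID check along the lines of RFC 5424 section 6.3.2 could look like this (a sketch, not Vector code; the IANA set here is the handful of SD-IDs registered by the RFC itself):

```python
# SD-IDs registered by RFC 5424 itself; custom SD-IDs must instead
# carry an @<private-enterprise-number> suffix.
IANA_SD_IDS = {"timeQuality", "origin", "meta"}

def plausible_sd_id(sd_id: str) -> bool:
    """Best effort: accept IANA-registered names or name@PEN forms."""
    if sd_id in IANA_SD_IDS:
        return True
    name, at, pen = sd_id.partition("@")
    return at == "@" and name != "" and pen.isdigit()

print(plausible_sd_id("exampleSDID@32473"))  # True  (RFC 5424's own example)
print(plausible_sd_id("timeQuality"))        # True  (IANA-registered)
print(plausible_sd_id("19347390.704884"))    # False (kernel timestamp)
```

Under this check, `[19347390.704884]` would be rejected as structured data and left in the free-form message instead.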
### Example

In this example the syslog part `[19347390.704884]` is not a valid RFC 5424 identifier and must not be interpreted as RFC 5424!

INPUT:

VRL:

OUTPUT:
### Configuration

### Version

0.34.0

### Debug Output

### Example Data

```
<4>Nov 29 13:21:35 TEST kernel: [19347390.704884] ACPI Exception: AE_AML_BUFFER_LIMIT, Evaluating _PMM (20450930/power-338)
```

### Additional Context

_No response_

### References

_No response_