Closed gschier closed 2 years ago
I am a bit surprised that this test passes if this is the case.
The test is fine; I think this is a documentation issue, probably outdated documentation. In my opinion, it is good/desirable that the syslog structured data fields are properly namespaced in the resulting object. We use this feature as well and it is very convenient; otherwise SDs with clashing properties would be all over the place.
I am a bit surprised that this test passes if this is the case.
I think that test is showing the behavior described by this issue, that the fields are namespaced under the "name" of the structured data section. I agree with @hhromic that the namespace is desirable to avoid conflicts. I think we should just update the docs.
I went to refresh my memory about this subject in our deployed pipeline and we are using the parse_syslog() VRL in a remap, not the syslog source. Apologies! In our setup with VRL, Vector does not parse the syslog SD fields into sub-objects; it emits them as simple root-level fields whose keys are namespaced by the SD-ID. For example:
parsed = parse_syslog!(s'<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo="bar"] test message')
# { "appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d", "exampleSDID@32473.foo": "bar", "facility": "kern", "hostname": "Gregorys-MacBook-Pro.local", "message": "test message", "procid": 79978, "severity": "alert", "timestamp": t'2022-04-25T23:21:45.715740Z', "version": 1 }
The parsed SD field is {"exampleSDID@32473.foo": "bar"}, which is NOT a sub-object, just a plain field with the string-type key exampleSDID@32473.foo and value bar.
$ parsed.exampleSDID@32473.foo
null
$ parsed."exampleSDID@32473.foo"
"bar"
BUT! The syslog source is indeed parsing the SD fields as sub-objects, so there is an inconsistency there:
{"appname":"2d4d9490-794a-4e60-814c-5597bd5b7b7d","exampleSDID@32473":{"foo":"bar"},"facility":"kern","host":"Gregorys-MacBook-Pro.local","hostname":"Gregorys-MacBook-Pro.local","message":"test message","procid":79978,"severity":"alert","source_ip":"127.0.0.1","source_type":"syslog","timestamp":"2022-04-25T23:21:45.715740Z","version":1}
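The difference between the two shapes can be illustrated outside of Vector. The following Python sketch is only an analogy, not Vector's or VRL's implementation; `get_path` is a hypothetical helper that splits a path on `.`, roughly the way the unquoted path in the VRL example above is treated as separate segments.

```python
from typing import Any


def get_path(event: dict, path: str) -> Any:
    """Walk `event` one dot-separated segment at a time (hypothetical helper).

    Returns None as soon as a segment is missing.
    """
    current: Any = event
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


# Shape reported for parse_syslog() in this thread: one flat key containing a dot.
flat_event = {"exampleSDID@32473.foo": "bar"}
# Shape reported for the syslog source in this thread: a nested object.
nested_event = {"exampleSDID@32473": {"foo": "bar"}}

# Segment-wise lookup misses the flat key, literal key lookup finds it.
flat_by_path = get_path(flat_event, "exampleSDID@32473.foo")
flat_by_key = flat_event.get("exampleSDID@32473.foo")
# Segment-wise lookup works on the nested shape.
nested_by_path = get_path(nested_event, "exampleSDID@32473.foo")
```

The same path string therefore resolves differently depending on which component produced the event, which is the inconsistency being discussed.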
Looks like the documentation is indeed aligned with the parse_syslog() VRL function behaviour, but the syslog source is expanding the .-separated namespaces in the keys into sub-objects?
Yes, can confirm that the syslog source is "unnesting" keys with periods in them.
I just sent this packet with foo.baz as the attribute of the exampleSDID@32473 SD:
<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message
And got this from the syslog source:
{
  "appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d",
  "exampleSDID@32473": {
    "foo": {
      "baz": "bar"
    }
  },
  "facility": "kern",
  "host": "Gregorys-MacBook-Pro.local",
  "hostname": "Gregorys-MacBook-Pro.local",
  "message": "test message",
  "procid": 79978,
  "severity": "alert",
  "source_ip": "127.0.0.1",
  "source_type": "syslog",
  "timestamp": "2022-04-25T23:21:45.715740Z",
  "version": 1
}
Definitely not a good behaviour :)
The parse_syslog() VRL function does not exhibit this behaviour:
parse_syslog!(s'<1>1 2022-04-25T23:21:45.715740Z Gregorys-MacBook-Pro.local 2d4d9490-794a-4e60-814c-5597bd5b7b7d 79978 - [exampleSDID@32473 foo.baz="bar"] test message')
# { "appname": "2d4d9490-794a-4e60-814c-5597bd5b7b7d", "exampleSDID@32473.foo.baz": "bar", "facility": "kern", "hostname": "Gregorys-MacBook-Pro.local", "message": "test message", "procid": 79978, "severity": "alert", "timestamp": t'2022-04-25T23:21:45.715740Z', "version": 1 }
And yes, now I'm also surprised like @StephenWakely that the referenced test is passing :) Maybe there is some automatic unnesting going on?
Aha, interesting, thanks for that investigation @hhromic. In my opinion, parse_syslog should match the syslog source behavior and actually nest the fields (granted that this would be a breaking change).
The test @StephenWakely is referencing is in the syslog source. The .insert() calls seen there will interpret the .s as creating nested objects (the function takes a "path").
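To make the "path" semantics concrete, here is a minimal Python sketch of a path-based insert. It is an illustration under assumptions, not Vector's actual .insert() code: `insert_path` is a hypothetical helper, and it assumes every intermediate value along the path is either absent or already a dictionary.

```python
def insert_path(event: dict, path: str, value) -> None:
    """Insert `value` at a dot-separated `path`, creating nested dicts as needed.

    Hypothetical helper; assumes intermediate values are absent or dicts.
    """
    *parents, leaf = path.split(".")
    current = event
    for segment in parents:
        # Create the intermediate object the first time a segment is seen.
        current = current.setdefault(segment, {})
    current[leaf] = value


event: dict = {}
# The SD-ID and attribute name from the packet above, joined with a dot.
insert_path(event, "exampleSDID@32473.foo.baz", "bar")
```

Because the key is treated as a path, every period introduces a level of nesting, including periods that came from the attribute name (foo.baz) rather than from the SD-ID separator.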
Opened https://github.com/vectordotdev/vector/issues/12431 to address the mismatch.
In my opinion, parse_syslog should match the syslog source behavior and actually nest the fields (granted that this would be a breaking change).
If aligning the behaviour is the goal, it will be a breaking change one way or another :( Which approach is desirable is a good question. Perhaps "nested" is indeed more convenient/powerful, especially when iteration support lands and these keys can be easily iterated over/manipulated for.. reasons!
In the worst case, in VRL you can always obtain the flattened version with flatten() from the nested object (if needed).
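As a rough illustration of what flattening a nested event means, the Python sketch below joins nested keys with periods. It is an analogy for the idea only; the `flatten` function here is a hypothetical helper and makes no claim about the exact semantics of VRL's flatten().

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Collapse nested dicts into a single level with dot-joined keys (hypothetical helper)."""
    flat: dict = {}
    for key, value in obj.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            # Recurse into non-empty sub-objects, carrying the key prefix along.
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


# Nested shape as reported for the syslog source earlier in the thread (trimmed).
nested = {"exampleSDID@32473": {"foo": {"baz": "bar"}}, "message": "test message"}
flattened = flatten(nested)
```

Going from nested to flat is lossless in this direction, whereas going from flat to nested cannot tell whether a period was part of the original attribute name.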
Problem
The structured data in the Syslog source does not end up as root properties but is instead included in a sub-object keyed on the Syslog SD-ID. The docs clearly state that structured data properties will appear on the root object in the output section:
As well as in the Example:
Configuration
Version
vector 0.17.3 (x86_64-apple-darwin d72c6e7 2021-10-21)
Debug Output
No response
Example Data
Syslog Message:
Resulting event:
Additional Context
No response
References
This other issue also found some discrepancies with the Syslog Example https://github.com/vectordotdev/vector/issues/9281