guyboertje opened 6 years ago
I am not sure that is a very common example. In that case I would use dissect to parse out the full kv list and then apply the kv filter to it.
I think a more common use case is when you have a log file with a number of different log line formats in it and you want to try these against a list of dissect patterns in sequence and break when a match is found, similar to how grok works.
I picked some sample PAM logs from https://ossec-docs.readthedocs.io/en/latest/log_samples/auth/pam.html
Jul 7 10:51:24 srbarriga su(pam_unix)[14592]: session opened for user test2 by (uid=10101)
Jul 7 10:53:07 srbarriga su(pam_unix)[14592]: session closed for user test
Jul 7 10:55:56 srbarriga sshd(pam_unix)[16660]: authentication failure; logname= uid=0 euid=0 tty=NODEVssh ruser= rhost=192.168.20.111 user=root
These lines all have slight variations in format, so parsing them with dissect needs more than one pattern. If multiple patterns were allowed and matching stopped after the first success, something like this could work:
dissect {
  # proposed option: try the patterns in order and stop at the first match
  break_on_match => true
  mapping => {
    "message" => [
      "%{ts->} %{+ts} %{+ts} %{host} %{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user} by (uid=%{uid})%{}",
      "%{ts->} %{+ts} %{+ts} %{host} %{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user}",
      "%{ts->} %{+ts} %{+ts} %{host} %{command}(pam_unix)[%{pid}]: %{action} %{+action}; %{params}"
    ]
  }
}
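The first-match behaviour proposed above can be sketched in Python. This is not the logstash-filter-dissect implementation: dissect() is a hypothetical, simplified re-implementation of just the syntax these patterns use (%{key}, %{+key}, %{key->} and %{}), and dissect_first_match() is a hypothetical helper standing in for the proposed break_on_match => true.

```python
import re

# Matches %{...} field references in a dissect-style pattern.
FIELD = re.compile(r"%\{([^}]*)\}")


def dissect(pattern, text):
    """Simplified dissect: return a dict of captured fields, or None.

    Handles %{key}, %{+key} (append with a space), %{key->} (skip repeated
    delimiters) and %{} (skip). Matching is anchored at the start of text.
    """
    parts = FIELD.split(pattern)
    literals, keys = parts[0::2], parts[1::2]
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])
    fields = {}
    for key, delim in zip(keys, literals[1:]):
        if delim:
            end = text.find(delim, pos)
            if end == -1:
                return None  # delimiter missing: this pattern does not match
        else:
            end = len(text)  # no trailing delimiter: take the rest
        value = text[pos:end]
        pos = end + len(delim)
        if key.endswith("->"):
            key = key[:-2]
            while delim and text.startswith(delim, pos):
                pos += len(delim)
        if not key:
            continue  # %{} skip field
        if key.startswith("+"):
            key = key[1:]
            value = fields[key] + " " + value if key in fields else value
        fields[key] = value
    return fields


# Hypothetical stand-in for the proposed break_on_match => true.
def dissect_first_match(patterns, text):
    """Try each pattern in order and stop at the first one that matches."""
    for index, pattern in enumerate(patterns):
        fields = dissect(pattern, text)
        if fields is not None:
            return index, fields
    return None, None


PATTERNS = [
    "%{ts->} %{+ts} %{+ts} %{host} %{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user} by (uid=%{uid})%{}",
    "%{ts->} %{+ts} %{+ts} %{host} %{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user}",
    "%{ts->} %{+ts} %{+ts} %{host} %{command}(pam_unix)[%{pid}]: %{action} %{+action}; %{params}",
]

if __name__ == "__main__":
    line = "Jul 7 10:53:07 srbarriga su(pam_unix)[14592]: session closed for user test"
    print(dissect_first_match(PATTERNS, line))
```

Order matters here: the second pattern would also accept the first sample line, but it would capture "test2 by (uid=10101)" as the user, which is why the more specific pattern is listed first.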
Maybe this scenario could be handled by the cascading as well?
I think the best way to implement it, as @guyboertje proposed, is to add a new sequence option to the dissect filter that accepts multiple dissect/mapping definitions in an array instead of a hash.
This would mirror the behavior of defining multiple dissect filters in the configuration and would be backward compatible.
dissect {
  sequence => [
    {
      break_on_match => false
      field => "message"
      tokenizer => [
        "%{ts->} %{+ts} %{+ts} %{host} %{rest}"
      ]
    },
    {
      break_on_match => true
      field => "rest"
      tokenizer => [
        "%{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user} by (uid=%{uid})%{}",
        "%{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user}",
        "%{command}(pam_unix)[%{pid}]: %{action} %{+action}; %{params}"
      ]
    }
  ]
}
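A Python sketch of how such a sequence could behave, where each stage dissects the field named by field and later stages can read fields written by earlier ones. Everything here is hypothetical: dissect() is a simplified stand-in for the plugin, run_sequence() is an invented helper, and treating break_on_match => false as "apply every matching pattern" is an assumption borrowed from grok's behaviour.

```python
import re

# Matches %{...} field references in a dissect-style pattern.
FIELD = re.compile(r"%\{([^}]*)\}")


def dissect(pattern, text):
    """Simplified, anchored dissect: return captured fields or None."""
    parts = FIELD.split(pattern)
    literals, keys = parts[0::2], parts[1::2]
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])
    fields = {}
    for key, delim in zip(keys, literals[1:]):
        if delim:
            end = text.find(delim, pos)
            if end == -1:
                return None  # delimiter missing: no match
        else:
            end = len(text)  # no trailing delimiter: take the rest
        value = text[pos:end]
        pos = end + len(delim)
        if key.endswith("->"):
            key = key[:-2]
            while delim and text.startswith(delim, pos):
                pos += len(delim)  # skip repeated delimiters (padding)
        if not key:
            continue  # %{} skip field
        if key.startswith("+"):
            key = key[1:]
            value = fields[key] + " " + value if key in fields else value
        fields[key] = value
    return fields


# Hypothetical Python form of the proposed sequence option.
SEQUENCE = [
    {"break_on_match": False, "field": "message",
     "tokenizer": ["%{ts->} %{+ts} %{+ts} %{host} %{rest}"]},
    {"break_on_match": True, "field": "rest",
     "tokenizer": [
         "%{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user} by (uid=%{uid})%{}",
         "%{command}(pam_unix)[%{pid}]: %{action} %{+action} for user %{user}",
         "%{command}(pam_unix)[%{pid}]: %{action} %{+action}; %{params}",
     ]},
]


def run_sequence(event, sequence):
    """Run each stage in order; later stages can read earlier results."""
    for stage in sequence:
        source = event.get(stage["field"])
        if not isinstance(source, str):
            continue  # nothing to dissect for this stage
        for pattern in stage["tokenizer"]:
            fields = dissect(pattern, source)
            if fields is None:
                continue
            event.update(fields)
            if stage["break_on_match"]:
                break  # skip this stage's remaining patterns
    return event
```

The shared timestamp and host prefix is written once in the first stage instead of being repeated in every pattern, which is the main simplification over a flat list.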
As discussed with @ph, this is a variation on using sequence but is less cryptic. It also adds target, clarifies the difference between breaking out of the patterns (tokenizer) and breaking out of the sequence, and adds tags to trace which patterns in the sequence matched.
dissect {
# Jul 7 10:52:14 srbarriga sshd(pam_unix)[17365]: session opened for user test by (uid=508)
# Nov 17 21:41:22 localhost su[8060]: (pam_unix) session opened for user root by (uid=0)
# Nov 11 22:46:29 localhost vsftpd: pam_unix(vsftpd:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=1.2.3.4
target => "captured_fields"
sequence => [
{
source => "message"
target => "inner_fields"
patterns => [
# always breaks on match of a pattern, but continues with sequence unless stopped
{
pattern => "%{ts->} %{+ts} %{+ts} %{host} %{message}"
tags => ["pam_format_common"]
stop_sequence_on_match => false # default
}
]
},
{
source => "[inner_fields][message]"
patterns => [
{
pattern => "%{command}(pam_unix)[%{pid}]: %{rest}"
tags => ["pam_format_1"]
stop_sequence_on_match => true
},
{
pattern => "%{command}[%{pid}]: (pam_unix) %{message}"
tags => ["pam_format_2"]
stop_sequence_on_match => false
},
{
pattern => "%{command}: pam_unix(%{process_name}): %{message}"
tags => ["pam_format_3"]
stop_sequence_on_match => false
}
]
},
{
source => "[captured_fields][message]"
patterns => [
{
pattern => "%{action} %{+action} for user %{user} by (uid=%{uid})%{}"
tags => ["pam_format_for_user"]
stop_sequence_on_match => false
},
{
pattern => "%{action}; %{kv_params}"
tags => ["pam_format_kv"]
stop_sequence_on_match => false
}
]
}
]
}
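The control flow of this proposal can be sketched in Python against the three sample lines from the config comments. All names are hypothetical: dissect() is a simplified stand-in for the plugin, get_ref() handles only plain names and [a][b] references, and run_sequence() assumes the top-level target is the default for stages that do not set their own.

```python
import re

FIELD = re.compile(r"%\{([^}]*)\}")  # %{...} references in a pattern
REF = re.compile(r"\[([^\]]+)\]")    # [a][b] style field references


def dissect(pattern, text):
    """Simplified, anchored dissect: return captured fields or None."""
    parts = FIELD.split(pattern)
    literals, keys = parts[0::2], parts[1::2]
    if not text.startswith(literals[0]):
        return None
    pos, fields = len(literals[0]), {}
    for key, delim in zip(keys, literals[1:]):
        if delim:
            end = text.find(delim, pos)
            if end == -1:
                return None  # delimiter missing: no match
        else:
            end = len(text)  # no trailing delimiter: take the rest
        value, pos = text[pos:end], end + len(delim)
        if key.endswith("->"):
            key = key[:-2]
            while delim and text.startswith(delim, pos):
                pos += len(delim)  # skip repeated delimiters (padding)
        if not key:
            continue  # %{} skip field
        if key.startswith("+"):
            key = key[1:]
            value = fields[key] + " " + value if key in fields else value
        fields[key] = value
    return fields


def get_ref(event, ref):
    """Resolve "message" or "[a][b]" against nested dicts, else None."""
    value = event
    for key in REF.findall(ref) or [ref]:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def run_sequence(event, sequence, default_target):
    """Hypothetical runner for the proposed sequence/patterns/tags options."""
    for stage in sequence:
        source = get_ref(event, stage["source"])
        if not isinstance(source, str):
            continue
        for rule in stage["patterns"]:
            fields = dissect(rule["pattern"], source)
            if fields is None:
                continue
            event.setdefault(stage.get("target", default_target), {}).update(fields)
            event.setdefault("tags", []).extend(rule.get("tags", []))
            if rule.get("stop_sequence_on_match", False):
                return event  # stop the whole sequence
            break  # a matching pattern always ends this stage
    return event


SEQUENCE = [
    {"source": "message", "target": "inner_fields", "patterns": [
        {"pattern": "%{ts->} %{+ts} %{+ts} %{host} %{message}",
         "tags": ["pam_format_common"]}]},
    {"source": "[inner_fields][message]", "patterns": [
        {"pattern": "%{command}(pam_unix)[%{pid}]: %{rest}",
         "tags": ["pam_format_1"], "stop_sequence_on_match": True},
        {"pattern": "%{command}[%{pid}]: (pam_unix) %{message}",
         "tags": ["pam_format_2"]},
        {"pattern": "%{command}: pam_unix(%{process_name}): %{message}",
         "tags": ["pam_format_3"]}]},
    {"source": "[captured_fields][message]", "patterns": [
        {"pattern": "%{action} %{+action} for user %{user} by (uid=%{uid})%{}",
         "tags": ["pam_format_for_user"]},
        {"pattern": "%{action}; %{kv_params}", "tags": ["pam_format_kv"]}]},
]
```

With these rules, the first sample line matches pam_format_1, whose stop_sequence_on_match => true ends the sequence, so its rest field is never split into action and user. The other two lines continue into the third stage.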
@ph and I decided that we should explicitly support the idea that a pattern is "anchored" to the start of the field value.
A pattern of:
"---BEGIN---%{field1} %{field2}"
should match a value string of:
"---BEGIN---foo bar"
and should NOT match a value string of:
"some preamble ---BEGIN---foo bar".
A leading skip field should be used if there is any chance that a value string can have some unknown content before the known ---BEGIN--- delimiter:
"%{}---BEGIN---%{field1} %{field2}"
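A minimal Python sketch of the anchoring rule, using a hypothetical, simplified dissect() written for illustration (not the plugin's code): a leading literal must appear at position 0 of the value, while a leading %{} skip field consumes any preamble up to the first ---BEGIN---.

```python
import re

# Matches %{...} field references in a dissect-style pattern.
FIELD = re.compile(r"%\{([^}]*)\}")


def dissect(pattern, text):
    """Simplified, anchored dissect: return captured fields or None."""
    parts = FIELD.split(pattern)
    literals, keys = parts[0::2], parts[1::2]
    # Anchoring: the text before the first field must start the value.
    if not text.startswith(literals[0]):
        return None
    pos = len(literals[0])
    fields = {}
    for key, delim in zip(keys, literals[1:]):
        if delim:
            end = text.find(delim, pos)
            if end == -1:
                return None  # delimiter missing: no match
        else:
            end = len(text)  # no trailing delimiter: take the rest
        value = text[pos:end]
        pos = end + len(delim)
        if key.endswith("->"):
            key = key[:-2]
            while delim and text.startswith(delim, pos):
                pos += len(delim)  # skip repeated delimiters (padding)
        if not key:
            continue  # %{} skip field: consumed but not stored
        if key.startswith("+"):
            key = key[1:]
            value = fields[key] + " " + value if key in fields else value
        fields[key] = value
    return fields


if __name__ == "__main__":
    anchored = "---BEGIN---%{field1} %{field2}"
    skipping = "%{}---BEGIN---%{field1} %{field2}"
    print(dissect(anchored, "---BEGIN---foo bar"))
    print(dissect(anchored, "some preamble ---BEGIN---foo bar"))
    print(dissect(skipping, "some preamble ---BEGIN---foo bar"))
```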
Hi,
Are there any updates for this enhancement? I am planning to implement dissect for an upcoming project.
Thanks
Could we get an update on this feature request?
I will raise visibility and try to get an update for you.
There are no updates on this issue in terms of an actual implementation; the feature isn't supported in the latest dissect plugin. At this point, we're happy to review PRs if anyone has a take on the feature.
There are a few drivers for this.
People are familiar with this from Grok. Beats and Ingest Node would like to support Dissect-style de-structuring. The Grok classifier in ML would like to support it. It would simplify some configs; see this for more info:
To (suggestion):