vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0

VRL: syslog priority parsing #131

Open rwaweber opened 3 years ago

rwaweber commented 3 years ago

Current Vector Version

0.12.1

Use-cases

"The Priority value is calculated by first multiplying the Facility number by 8 and then adding the numerical value of the Severity." ~ cite from RFC5424

e.g.:

Priority = Facility * 8 + Severity

Facility = Priority // 8
Severity = Priority % 8
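As a quick sanity check of the arithmetic, here is a minimal Python sketch. The function names are made up for illustration and are not part of VRL or Vector. The facility is the integer quotient of the priority divided by 8, and the severity is the remainder.

```python
from typing import Tuple


def encode_priority(facility: int, severity: int) -> int:
    """Combine facility and severity into a priority (RFC 5424 formula)."""
    return facility * 8 + severity


def decode_priority(priority: int) -> Tuple[int, int]:
    """Split a priority into (facility, severity).

    divmod returns (quotient, remainder), i.e. (priority // 8, priority % 8).
    """
    return divmod(priority, 8)


# Round trip: facility 2, severity 6 gives priority 22 and back again.
priority = encode_priority(2, 6)
facility, severity = decode_priority(priority)
print(priority, facility, severity)
```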

Sometimes, when dealing with non-RFC-compliant syslog variations, only parts of the RFC are followed. In those cases, it's not uncommon for the syslog priority to be one of the only consistent distinguishing features of some logs.

Being able to extract both the facility and severity code from those messages would be pretty handy, and from a cursory overview, looks like it would fit in pretty nicely as a VRL function.

Attempted Solutions

None as of yet.

Proposal

Add to_syslog_facility_from_priority and to_syslog_severity_from_priority functions to VRL for making use of Syslog Priority fields.

Or just one function with three return values (Facility, Severity, Error). What do you all think?
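To make the single-function variant concrete, here is a rough Python sketch of the shape it could take. The name split_priority and the error strings are hypothetical; this is not an existing VRL function. It assumes the RFC 5424 range of 0 to 191 (facilities 0 to 23, severities 0 to 7) and returns an error instead of raising.

```python
from typing import Optional, Tuple

# Highest valid priority under RFC 5424: facility 23, severity 7.
MAX_PRIORITY = 23 * 8 + 7  # 191


def split_priority(value) -> Tuple[Optional[int], Optional[int], Optional[str]]:
    """Return (facility, severity, error); error is None on success."""
    try:
        priority = int(value)
    except (TypeError, ValueError):
        return None, None, f"not an integer: {value!r}"
    if not 0 <= priority <= MAX_PRIORITY:
        return None, None, f"priority out of range 0-{MAX_PRIORITY}: {priority}"
    facility, severity = divmod(priority, 8)
    return facility, severity, None


facility, severity, error = split_priority("22")
if error is None:
    print(facility, severity)
```

Two separate functions would each have to repeat the range check, which is one argument for the combined form.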

One more question: I noticed that we use the syslog_loose crate, which has a decompose_pri function that pretty much takes care of what I'm talking about. Would you all have a preference for using that over just implementing it in this plugin? I don't imagine it'd change how much code ends up there in either case, but I figured it's probably worth asking.

References

Related in spirit: https://github.com/timberio/vector/issues/5769

JeanMertz commented 3 years ago

Hey @rwaweber, would you mind giving an example of input and output you'd expect?

I'm curious what more we need in addition to the existing to_syslog_severity, to_syslog_facility, and to_syslog_level.

hb9hnt commented 1 year ago

We have a lot of log sources where the syslog source fails because the log is formatted in a way that is not very RFC-compliant: mainly, they just send a conforming syslog priority and then append whatever they like.

We worked around not having the mentioned function by using the following rather ugly (EDIT: now less ugly, see bottom) extraction of the facility and severity code and name in VRL:

# Extract priority
. = . | parse_regex!(.message, r'^<(?P<syslog_pri>\d+)>.*')
.log.syslog.priority = parse_int!(del(.syslog_pri))

# Extract severity
.log.syslog.severity.code = mod(.log.syslog.priority, 8)
.log.syslog.severity.name = to_syslog_level!(.log.syslog.severity.code) 

# Extract facility
.log.syslog.facility.code = to_int(floor(.log.syslog.priority / 8))
.log.syslog.facility.name = to_syslog_facility!(.log.syslog.facility.code)

This results in the following Elastic Common Schema JSON:

"log": {
  "syslog": {
    "priority": 22,
    "facility": {
      "code": 2,
      "name": "mail"
    },
    "severity": {
      "code": 6,
      "name": "info"
    }
  }
}
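For anyone who wants to check the numbers outside of Vector, the same extraction can be sketched in Python. This is only an illustration: the function name is made up, and the name tables are the conventional short syslog keywords, which I am assuming match what to_syslog_facility and to_syslog_level return (the output above only confirms "mail" and "info").

```python
import re

# Assumed name tables: conventional syslog keywords, indexed by numeric code.
SEVERITY_NAMES = ["emerg", "alert", "crit", "err",
                  "warning", "notice", "info", "debug"]
FACILITY_NAMES = ["kern", "user", "mail", "daemon", "auth", "syslog",
                  "lpr", "news", "uucp", "cron", "authpriv", "ftp",
                  "ntp", "security", "console", "solaris-cron"]
FACILITY_NAMES += [f"local{i}" for i in range(8)]  # codes 16 to 23

# Leading "<PRI>" marker, e.g. "<22>".
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def extract_syslog_priority(message: str):
    """Return the nested log.syslog structure, or None if no valid priority."""
    match = PRI_PATTERN.match(message)
    if match is None:
        return None
    priority = int(match.group(1))
    facility, severity = divmod(priority, 8)
    if facility >= len(FACILITY_NAMES):
        return None  # priority above 191 has no defined facility
    return {
        "log": {
            "syslog": {
                "priority": priority,
                "facility": {"code": facility, "name": FACILITY_NAMES[facility]},
                "severity": {"code": severity, "name": SEVERITY_NAMES[severity]},
            }
        }
    }


print(extract_syslog_priority("<22>anything the sender likes"))
```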

Of course the structure can be whatever, but it would be nice to have a way to extract the numeric values of the priority, facility and severity as well as their respective string representations.

IMHO one downside of using the parse_syslog function or the syslog source is that we lose the numeric values.

EDIT: I just discovered the to_syslog_facility and to_syslog_level functions. Now I'm not convinced anymore that we even need this functionality since with these functions and a little arithmetic we can achieve the same effect.