vectordotdev / vrl

Vector Remap Language
Mozilla Public License 2.0
135 stars 68 forks source link

parse_syslog follows RFC 5425 even with empty structured data, lacking key-value pairs. #1043

Open itkovian opened 2 months ago

itkovian commented 2 months ago

We're getting a bit of junk when processing syslog lines from several applications that implement the old syslog format (RFC 3164) but are actually putting something in [] without key-value pairs.

I see in the tests that

        empty_sd_element {
            args: func_args![value: r#"<13>1 2019-02-13T19:48:34+00:00 74794bfb6795 root 8449 - [empty] qwerty"#],
            want: Ok(btreemap!{
                "message" => "qwerty",
                "appname" => "root",
                "facility" => "user",
                "hostname" => "74794bfb6795",
                "message" => "qwerty",
                "procid" => 8449,
                "severity" => "notice",
                "timestamp" => chrono::Utc.ymd(2019, 2, 13).and_hms_milli(19, 48, 34, 0),
                "version" => 1,
                "empty" => btreemap! {},
            }),
            tdef: TypeDef::object(inner_kind()).fallible(),
        }

I am wondering if this is actually the expected behaviour. What would be the rationale behind having empty structured data? Is it not more logical that these kinds of messages actually follow RFC 3164 instead of RFC 5424?

An empty structured message with information that varies wildly (e.g., a job ID (from slurm), some GC time (from Kafka)) means that the data that is shipped off to e.g. Elasticsearch will pollute the fields, as the object key may be interpreted as a fieldname with dynamic mapping turned on.

Is there a way to control this behaviour?

jszwedko commented 2 months ago

Hi @itkovian ! Thanks for opening this. I might be missing something, but it looks like your input is actually RFC5424 and not RFC3164:

<13>1 2019-02-13T19:48:34+00:00 74794bfb6795 root 8449 - [empty] qwerty

You can tell because it has the version specifier (1) after the severity/priority (also the timestamp is RFC5424 format and not RFC3164 format, and it has procid, msgid, etc.). In this case [empty] is interpreted as structured data per RFC5424. If it isn't structured data, and instead part of the msg, there should be an extra - preceding it to indicate that the sd-data is nil.

itkovian commented 2 months ago

Yes, the example comes from the tests in your code :)

The problem I am facing is that several applications are logging lines like

2024-09-19T15:39:45.469435+02:00 node3521 slurmstepd[548422]: [65684352.batch] done with job

And these lack a -.

This gets parsed like so: https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiLiB8PSBwYXJzZV9zeXNsb2chKC5tZXNzYWdlKSIsImV2ZW50Ijp7Im1lc3NhZ2UiOiIyMDI0LTA5LTE5VDE1OjM5OjQ1LjQ2OTQzNSswMjowMCBub2RlMzUyMSBzbHVybXN0ZXBkWzU0ODQyMl06IFs2NTY4NDM1Mi5iYXRjaF0gZG9uZSB3aXRoIGpvYlxuIn0sImlzX2pzb25sIjpmYWxzZSwiZXJyb3IiOm51bGx9

{
    "65684352.batch": {},
    "appname": "slurmstepd",
    "hostname": "node3521",
    "message": "done with job",
    "procid": 548422,
    "timestamp": "2024-09-19T13:39:45.469435Z"
}

This is correct according to the RFC 5424 as I understand, but I'd like to be able to force it to RFC 3164 instead of 5424.

StephenWakely commented 2 months ago

Yes, the example comes from the tests in your code :)

The problem I am facing is that several applications are logging lines like

2024-09-19T15:39:45.469435+02:00 node3521 slurmstepd[548422]: [65684352.batch] done with job

And these lack a -.

This gets parsed like so: https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiLiB8PSBwYXJzZV9zeXNsb2chKC5tZXNzYWdlKSIsImV2ZW50Ijp7Im1lc3NhZ2UiOiIyMDI0LTA5LTE5VDE1OjM5OjQ1LjQ2OTQzNSswMjowMCBub2RlMzUyMSBzbHVybXN0ZXBkWzU0ODQyMl06IFs2NTY4NDM1Mi5iYXRjaF0gZG9uZSB3aXRoIGpvYlxuIn0sImlzX2pzb25sIjpmYWxzZSwiZXJyb3IiOm51bGx9

{
  "65684352.batch": {},
  "appname": "slurmstepd",
  "hostname": "node3521",
  "message": "done with job",
  "procid": 548422,
  "timestamp": "2024-09-19T13:39:45.469435Z"
}

This is correct according to the RFC 5424 as I understand, but I'd like to be able to force it to RFC 3164 instead of 5424.

Can you post exactly what you would like the resulting message to look like. I think it should be fairly easy to add an extra parameter, but need to make sure it would do what you want.

itkovian commented 2 months ago

Sure. I'd like it to match the RFC 3164, resulting in

{
    "appname": "slurmstepd",
    "hostname": "node3521",
    "message": "[65684352.batch] done with job",
    "procid": 548422,
    "timestamp": "2024-09-19T13:39:45.469435Z"
}

If you can tell me where in the code I should start looking, I can have a try myself.

StephenWakely commented 2 months ago

Cool. The relevant call is here. Currently it passes Variant::Either. You will need to update it so it can also pass Variant::RFC3164 or Variant::RFC5424.

You will need to add an optional parameter here. Make it default to either.

Have a look at the level parameter for the log function to get an idea of how to setup an enum parameter that can only accept a limited set of values.

Any questions give me a shout.

itkovian commented 2 months ago

Hmm, I see that syslog_loose also tries to get some structured data for RFC3164, even though this should not be supported (calling structured_data_optional(false) here)

I opened an issue for this: https://github.com/StephenWakely/syslog-loose/issues/37