vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

New `syslog` sink #6863

Open binarylogic opened 3 years ago

binarylogic commented 3 years ago

As per https://github.com/timberio/vector/discussions/6862, we should have a syslog sink that makes it easy to send Syslog formatted logs. This should wrap the socket sink to support the various socket protocols.

prognant commented 3 years ago

This will probably have to support both the old (RFC 3164) and the new (RFC 5424) formats.
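
As a rough illustration of how the two formats differ on the wire, here is a minimal Python sketch. It is not Vector code: the function names and sample values are made up for this example, and the RFC 3164 layout shown is the conventional `<PRI>Mmm dd hh:mm:ss HOSTNAME TAG: MSG` form.

```python
from datetime import datetime, timezone

# Hypothetical helpers for illustration only; not part of Vector.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def pri(facility: int, severity: int) -> int:
    """PRI value shared by both RFCs: facility * 8 + severity."""
    return facility * 8 + severity


def format_rfc3164(facility, severity, ts, host, tag, msg):
    # <PRI>Mmm dd hh:mm:ss HOSTNAME TAG: MSG (day of month is space-padded)
    stamp = f"{MONTHS[ts.month - 1]} {ts.day:2d} {ts:%H:%M:%S}"
    return f"<{pri(facility, severity)}>{stamp} {host} {tag}: {msg}"


def format_rfc5424(facility, severity, ts, host, app,
                   procid="-", msgid="-", sd="-", msg=""):
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    stamp = ts.isoformat(timespec="milliseconds")
    return (f"<{pri(facility, severity)}>1 {stamp} {host} {app} "
            f"{procid} {msgid} {sd} {msg}")


ts = datetime(2023, 6, 3, 15, 43, 43, tzinfo=timezone.utc)
old_style = format_rfc3164(1, 7, ts, "myhost", "myapp", "hello")
new_style = format_rfc5424(1, 7, ts, "myhost", "myapp", msg="hello")
print(old_style)
print(new_style)
```

The main differences an encoder has to handle are the timestamp format and the extra version, PROCID, MSGID and structured-data fields in RFC 5424.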

wfender commented 2 years ago

Hello, do you have any news about this feature? It would be very useful, and many people are waiting for it.

jszwedko commented 2 years ago

Hi @wfender . We are working on the groundwork for this in https://github.com/vectordotdev/vector/issues/8617, but don't have a definite ETA yet.

wfender commented 2 years ago

Thank you for this feedback. I look forward to this feature so I can start experimenting with Vector at my company. ;)

syedriko commented 1 year ago

Just leaving a comment here that I started working on a new syslog sink, with https://github.com/vectordotdev/vector/pull/7106 as inspiration.

jszwedko commented 1 year ago

Great! Just a note that we would like to approach this sink a bit differently than #7106 did due to the enhanced codec support that was added to sinks after that PR was opened. Essentially, we'd like to create a syslog codec that can be plugged in with a socket sink (see https://github.com/vectordotdev/vector/tree/master/lib/codecs/src/encoding/format). That will let the syslog sink just be a wrapper around the socket sink with a specific codec configured.

Let me know if that makes sense. Happy to help guide a PR here!
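
To make the proposed layering concrete, here is a minimal Python sketch of the idea. It is only an illustration of "sink = transport + pluggable codec" and does not reflect Vector's actual Rust types; every class and function name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Encoder(Protocol):
    """Anything that can turn an event into bytes (the 'codec')."""
    def encode(self, event: dict) -> bytes: ...


@dataclass
class TextEncoder:
    # Simplest possible codec: emit only the message field.
    def encode(self, event: dict) -> bytes:
        return str(event.get("message", "")).encode()


@dataclass
class SyslogEncoder:
    # Hypothetical options, loosely mirroring facility/severity/app_name.
    facility: int = 1   # user
    severity: int = 7   # debug
    app_name: str = "-"

    def encode(self, event: dict) -> bytes:
        pri = self.facility * 8 + self.severity
        ts = event.get("timestamp", "-")
        host = event.get("host", "-")
        msg = event.get("message", "")
        return f"<{pri}>1 {ts} {host} {self.app_name} - - - {msg}".encode()


@dataclass
class SocketSink:
    """Transport only: knows how to send bytes, not how to format them."""
    encoder: Encoder
    send: Callable[[bytes], None]

    def write(self, event: dict) -> None:
        self.send(self.encoder.encode(event) + b"\n")


def syslog_sink(send: Callable[[bytes], None], **options) -> SocketSink:
    """A 'syslog sink' is just the socket sink with the codec fixed."""
    return SocketSink(encoder=SyslogEncoder(**options), send=send)


sent: list[bytes] = []
sink = syslog_sink(sent.append, app_name="myapp")
sink.write({"timestamp": "2023-06-13T19:43:43Z", "host": "h1", "message": "hi"})
print(sent)
```

Here `sent.append` stands in for a real UDP/TCP/TLS send so the sketch runs without a network. The point of the design is that all protocol handling stays in one place and the syslog-specific part is only the encoder.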

syedriko commented 1 year ago

Got it, thanks, @jszwedko. Will post a WIP PR as soon as I can.

sraka1 commented 1 year ago

Is this being worked on?

syedriko commented 1 year ago

Yes, it is. Slowly but surely ;)

haoyu-sgnl commented 1 year ago

I would like to hear more about this. I need to figure out how to use Vector as a syslog sink, so any additional info would be helpful!

git001 commented 1 year ago

Who is working on this issue? I would like to join.

amarnathpv commented 1 year ago

Any updates on where we are with this, please? Is there progress on supporting syslog output, either as a codec for syslog via the socket sink or as a syslog sink itself?

jszwedko commented 1 year ago

No updates yet on our end but we agree this would be a good idea if anyone is motivated to contribute! I still think the best approach would be to implement as a "codec" in Vector. The GELF codec could be a reasonable example to base it on.

syedriko commented 1 year ago

I have a syslog codec implementation, but it is specialized for OpenShift's needs. Let me extract it into a PR against the mainline, it might serve as a starting point for what we want to build.

amarnathpv commented 1 year ago

@syedriko - any luck with the syslog codec?

syedriko commented 1 year ago

@amarnathpv Here you go: https://github.com/vectordotdev/vector/pull/17668

amarnathpv commented 1 year ago

Many thanks @syedriko. That would greatly help!

How can I get this version for testing?

syedriko commented 1 year ago

You can build vector from my repo https://github.com/syedriko/vector.git, off the branch syedriko-syslog-codec. For a start, you can bounce this config

[sources.log_generator]
type = "demo_logs"
format = "json"

[sinks.syslog]
inputs = ["log_generator"]
type = "socket"
address = "0.0.0.0:1514"
mode = "udp"

[sinks.syslog.encoding]
codec = "syslog"
rfc = "rfc5424"
facility = "user"
severity = "debug"
app_name = "myapp"
proc_id = "myproc"
msg_id = "mymsg"

against rsyslogd:

5423.927818395:imudp.c        : imudp.c: recv(4,418),acl:1,msg:<15>1 2023-06-13T15:43:43.927-04:00 - myapp myproc mymsg - {"message":"{\"host\":\"251.239.70.77\",\"user-identifier\":\"shaneIxD\",\"datetime\":\"13/Jun/2023:15:43:43\",\"method\":\"HEAD\",\"request\":\"/wp-admin\",\"protocol\":\"HTTP/1.1\",\"status\":\"302\",\"bytes\":6050,\"referer\":\"https://names.us/controller/setup\"}","service":"vector","source_type":"demo_logs","timestamp":"2023-06-13T19:43:43.926564381Z"}

Since syslog is a codec just like the other codecs, you can run your socket sink on top of UDP, TCP and TLS.
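
For readers checking the output, the `<15>` at the start of the message is the PRI value, computed as facility * 8 + severity, so `user` (1) and `debug` (7) give 15. A small Python sketch of the reverse calculation follows; the helper name is made up for this example, and the facility table lists only the commonly used keywords.

```python
import re

# Severity keywords for codes 0..7.
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

# Common facility keywords; codes not listed here are reported as numbers.
FACILITIES = {
    0: "kern", 1: "user", 2: "mail", 3: "daemon", 4: "auth", 5: "syslog",
    6: "lpr", 7: "news", 8: "uucp", 9: "cron", 10: "authpriv", 11: "ftp",
    **{16 + i: f"local{i}" for i in range(8)},
}


def decode_pri(line: str) -> tuple[str, str]:
    """Split the leading <PRI> of a syslog line into (facility, severity)."""
    match = re.match(r"<(\d{1,3})>", line)
    if match is None:
        raise ValueError("line does not start with <PRI>")
    facility, severity = divmod(int(match.group(1)), 8)
    return FACILITIES.get(facility, str(facility)), SEVERITIES[severity]


print(decode_pri("<15>1 2023-06-13T15:43:43.927-04:00 - myapp myproc mymsg - {}"))
```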

Syslog message fields can be hardcoded in the sink configuration as above, but they can also be populated from the log event fields. Here's an example:

[sources.log_generator]
type = "demo_logs"
format = "json"

[transforms.parse_demo_logs]
inputs = ["log_generator"]
type = "remap"
source = '''
. = parse_json!(string!(.message))
'''

[sinks.foobar]
inputs = ["parse_demo_logs"]
type = "socket"
address = "0.0.0.0:1514"
mode = "udp"

[sinks.foobar.encoding]
codec = "syslog"
rfc = "rfc5424"
facility = "user"
severity = "debug"
app_name = "myapp"
proc_id = "myproc"
msg_id = "$$.message.host"

which produces

1343.221646458:imudp.c        : imudp.c: recv(4,300),acl:1,msg:<15>1 2023-06-13T20:09:03.221-04:00 - myapp myproc 137.212.170.113 - {"bytes":9699,"datetime":"13/Jun/2023:20:09:03","host":"137.212.170.113","method":"PATCH","protocol":"HTTP/2.0","referer":"https://for.us/controller/setup","request":"/controller/setup","status":"301","user-identifier":"ahmadajmi"}

The "$$.message.host" syntax is there for legacy reasons. "host" is the name of the log event field to populate the RFC 5424 msg_id field with.

amarnathpv commented 1 year ago

Awesome, thanks Sergej @syedriko

pznamensky commented 9 months ago

Hey @syedriko, since https://github.com/vectordotdev/vector/pull/17668 seems to be working fine, how do you feel about converting it from a draft into a regular PR?

syedriko commented 9 months ago

@pznamensky I'm glad to hear ^^^, it's working pretty well in OpenShift, too. I'll get to it, too many balls to juggle.

gaby commented 8 months ago

@syedriko @jszwedko Any update on this? I was going to use Vector to emit syslog from multiple sources and ran into a road-block because of the lack of a Syslog sink.

Basically, I send syslog from multiple sources into Vector and parse it as JSON, but I also need to forward the original syslog to other follow-on processes.

jszwedko commented 8 months ago

Nothing yet unfortunately. If anyone wants to take a stab at contributing this, I think the path would look like:

  • Create a syslog encoder
  • Create a syslog sink that is just the socket sink with the syslog encoder hardcoded

gaby commented 8 months ago

What about #17668? It seems to cover those.

jszwedko commented 8 months ago

That looks like a good start! The original author seems to have fallen off unfortunately, but it should serve as a good base for a new PR.

syedriko commented 7 months ago

@jszwedko I haven't gone anywhere, just been swamped. It would be great to get https://github.com/vectordotdev/vector/pull/17668 reviewed so I can start addressing the feedback.

gaby commented 7 months ago

@syedriko Awesome to hear! Can you update your branch and fix the merge conflicts? Also, please mark it as ready; right now it shows as "Draft".

syedriko commented 7 months ago

Done

gaby commented 7 months ago

Thank you!

@jszwedko This should be a step forward toward getting this feature :-)

jszwedko commented 7 months ago

Thanks @syedriko ! We'll take a look and leave a review.

suslikas commented 7 months ago

Awesome functionality, because a lot of tools accept syslog only. BTW, it is a nice practice to support the same formats as both inputs and sinks. One of the benefits of Vector is that it can act as a wrapper and manipulate data on the fly.

fpytloun commented 3 months ago

Also interested in this. Can this be resurrected so that we get syslog sink functionality?

gaby commented 3 months ago

@fpytloun This is probably going nowhere, same as https://github.com/vectordotdev/vector/pull/18076

@jszwedko Were you able to take a look?

pznamensky commented 3 months ago

@polarathene is working on it: https://github.com/vectordotdev/vector/pull/17668#issuecomment-1876073744

polarathene commented 3 months ago

Hey @gaby I will have time this weekend to wrap up the refactor PR and publish it as WIP.

I'd like to refactor it some more and add some tests before bothering the team to review it, but it's in fairly good shape if you'd like to help test it out when published to ensure it's headed in the right direction :)

I'll send a link here once it's up 👍


EDIT: Sorry for the delay, life and burn-out happened 😬

pznamensky commented 2 months ago

As a workaround, you can try assembling syslog messages yourself. Here is an example of what we ended up with:

...
  transforms:
    ...
    syslog_logs:
      type: remap
      inputs: ["some_logs"]
      # let's make RFC 5424 compatible messages for rsyslog
      # read more about the format:
      # https://blog.datalust.co/seq-input-syslog/#rfc5424
      source: |-
        ., err = "<86>1 " + to_string(.timestamp) + " eks " + .kubernetes.container_name + " 0 - - " + decode_base16!("EFBBBF") + .message
        if err != null {
          log(err, level: "error")
        }
...
  sinks:
    rsyslog_general:
      type: "socket"
      inputs: ["syslog_logs"]
      address: "some_ip:12345"
      mode: "tcp"
      encoding:
        codec: "text"
      framing:
        method: "newline_delimited"

To receive messages, we use imptcp module in rsyslog.
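
For anyone who wants to check the exact bytes this workaround puts on the wire, here is a minimal Python sketch that builds the same line outside Vector. The function name and sample values are made up for this example; the field order mirrors the VRL above (`<86>` is facility 10 * 8 + severity 6, hostname `eks`, PROCID `0`, no MSGID or structured data, then a UTF-8 BOM before the message).

```python
BOM = b"\xef\xbb\xbf"  # same bytes as decode_base16!("EFBBBF") in the VRL


def build_line(timestamp: str, container: str, message: str,
               pri: int = 86, hostname: str = "eks") -> bytes:
    """Assemble one newline-terminated RFC 5424 style line."""
    # <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA
    header = f"<{pri}>1 {timestamp} {hostname} {container} 0 - - "
    # The trailing newline corresponds to the newline_delimited framing.
    return header.encode() + BOM + message.encode() + b"\n"


line = build_line("2024-04-25T18:18:25.654Z", "mycontainer", "hello")
print(line)

# Splitting on the first seven spaces recovers the header fields and the body.
fields = line.rstrip(b"\n").split(b" ", 7)
print(fields)
```

One thing to keep in mind with newline-delimited framing is that a message containing a newline would arrive at the receiver as more than one line.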

gaby commented 2 months ago

@pznamensky Thanks for sharing!

fpytloun commented 2 months ago

@pznamensky thank you! I enhanced that VRL a little bit:

      pri = 1 * 8 + to_syslog_severity(.severity) ?? 6

      ., err = join([
        "<" + to_string(pri) + ">" + "1",     # <pri>version
        to_string!(.@timestamp),
        to_string!(.kubernetes.pod_name || .hostname || "${VECTOR_SELF_NODE_NAME}"),
        to_string!(.app || .kubernetes.labels.app || "-"),
        "-",                                  # procid
        to_string!(.messageid || "-"),        # msgid
        "-",                                  # structured-data
        decode_base16!("EFBBBF") + to_string!(.message || encode_json(.))   # msg
      ], separator: " ")

      if err != null {
        log("Unable to construct syslog message for event:" + err + ". Dropping invalid event: " + encode_json(.), level: "error", rate_limit_secs: 10)
      }

And here is a unit test (I fell in love with Vector's unit tests 🙂):

[[tests]]
  name = "remap_syslog"

  [[tests.inputs]]
    insert_at = "remap_syslog"
    type = "vrl"
    source = '''
    . = {
      "kubernetes": {
        "labels": {
          "app": "myapp"
        },
        "pod_name": "mypod",
        "container_name": "mycontainer"
      },
      "@timestamp": "2024-04-25T18:18:25.654Z",
      "severity": "error",
      "message": "Dummy message"
    }
    '''

  [[tests.outputs]]
    extract_from = "remap_syslog"

    [[tests.outputs.conditions]]
      type = "vrl"
      source = '''
        parsed, err = parse_syslog(.message)
        assert_eq!(err, null, err)

        log("Parsed syslog message: " + encode_json(parsed), level: "warn")

        assert_eq!(parsed.appname, "myapp", "appname field not set from kubernetes labels")
        assert_eq!(parsed.facility, "user", "facility not set properly")
        assert_eq!(parsed.hostname, "mypod", "hostname not set")
        assert_eq!(parsed.severity, "err", "severity value is not set")
        assert_eq!(parsed.message, decode_base16!("EFBBBF") + "Dummy message", "message value not correct")
      '''

I think parse_syslog should strip the BOM prefix when parsing the message; this seems to be a bug in this function 🤔
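
To show what stripping the BOM would mean, here is a small Python sketch of the expected behaviour. It is not how parse_syslog is implemented; the helper name is made up for this example.

```python
BOM = "\ufeff"  # U+FEFF, encoded in UTF-8 as the bytes EF BB BF


def strip_bom(message: str) -> str:
    """Drop a single leading BOM, leaving the rest of the message untouched."""
    return message[len(BOM):] if message.startswith(BOM) else message


raw = b"\xef\xbb\xbfDummy message"

# Plain UTF-8 decoding keeps the BOM as a leading U+FEFF character.
decoded = raw.decode("utf-8")
print(repr(decoded))

# Stripping it yields the message a consumer would normally expect.
print(repr(strip_bom(decoded)))

# Python's "utf-8-sig" codec performs the same removal while decoding.
print(repr(raw.decode("utf-8-sig")))
```

With that behaviour, the last assertion in the unit test above could compare against "Dummy message" directly.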

gaby commented 3 weeks ago

I wonder if this is something VRL/Vector could add a function for?

jszwedko commented 3 weeks ago

👍 I think it would make sense to have a VRL function too.