vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Add CEF codec #17332

Open nabokihms opened 1 year ago

nabokihms commented 1 year ago


Use Cases

SIEM systems are everywhere, and many of them require events in the Common Event Format (CEF). We use Vector as our central event processor because it collects data efficiently, but it currently cannot send events to these older SIEM systems in a format they accept.

Attempted Solutions

VRL is an option, but it is not convenient: users need to know how to compose a correctly formatted CEF string by hand.

Proposal

Add a new codec to encode messages into CEF format.

encoding:
  codec: cef
  cef:
    header: Vector # Optional, Vector by default
    version: v0.29.0 # Optional, Vector version by default
    name: <vrl to get event name> # Can be constant
    description: <vrl to get event description> # Can be constant
    severity: <vrl to get event severity> # Can be constant
    fields: [<array of fields to encode, like for the CSV codec>] # Optional
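For reference, a CEF record is a single line: the CEF:0 version prefix, six pipe-delimited header fields (Device Vendor, Device Product, Device Version, Signature ID, Name, Severity), then space-separated key=value extension pairs. Most of the work in such a codec is escaping: header fields escape backslashes and pipes, while extension values escape backslashes, equals signs, and line breaks (written as \n or \r). The following is a minimal Rust sketch under those rules. The names CefHeader, escape_header, escape_extension_value, and encode_cef are hypothetical and not part of Vector's codebase, and how the proposed name/description/severity options would map onto the header slots is left open.

```rust
/// Header values in CEF order, following the "CEF:0" version prefix.
struct CefHeader<'a> {
    device_vendor: &'a str,
    device_product: &'a str,
    device_version: &'a str,
    signature_id: &'a str,
    name: &'a str,
    severity: &'a str,
}

/// Escape a header field: backslashes and pipes get a leading backslash.
/// CEF does not allow line breaks in the header; replacing them with
/// spaces is an assumption of this sketch.
fn escape_header(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '|' => out.push_str("\\|"),
            '\r' | '\n' => out.push(' '),
            _ => out.push(c),
        }
    }
    out
}

/// Escape an extension value: backslashes and equals signs get a leading
/// backslash, and line breaks become the two-character sequences \n and \r.
/// Pipes do not need escaping in the extension.
fn escape_extension_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '=' => out.push_str("\\="),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            _ => out.push(c),
        }
    }
    out
}

/// Assemble one CEF line. Extension keys are assumed to already be valid CEF keys.
fn encode_cef(header: &CefHeader, extension: &[(&str, &str)]) -> String {
    let header_fields = [
        header.device_vendor,
        header.device_product,
        header.device_version,
        header.signature_id,
        header.name,
        header.severity,
    ];
    let mut line = String::from("CEF:0");
    for field in header_fields {
        line.push('|');
        line.push_str(&escape_header(field));
    }
    // The header always ends with a pipe, even when the extension is empty.
    line.push('|');
    let pairs: Vec<String> = extension
        .iter()
        .map(|(key, value)| format!("{}={}", key, escape_extension_value(value)))
        .collect();
    line.push_str(&pairs.join(" "));
    line
}
```

Keeping the two escaping functions separate matters because the rules differ: a pipe is significant only in the header, and an equals sign only in the extension.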

References

No response

Version

v0.29

spencergilbert commented 1 year ago

I don't know if we could embed VRL in the encoder like that today, but it would be interesting to get that working.

nabokihms commented 1 year ago

Using VRL is not necessary if it is a problem; the feature can be implemented another way, for example with more optional fields that name event keys instead of VRL expressions:

encoding:
  codec: cef
  cef:
    header: Vector # Optional, Vector by default
    version: v0.29.0 # Optional, Vector version by default
    name: <key to get event name> # Optional, cef.name by default
    description: <key to get event description> # Optional, cef.description by default
    severity: <key to get event severity> # Optional, cef.severity by default
    fields: [<array of fields to encode, like for the CSV codec>] # Optional
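The key-based variant reduces to "use the configured key, otherwise fall back to cef.name, cef.description, or cef.severity". Below is a minimal Rust sketch of that lookup. It assumes the event has already been flattened into string fields (Vector's real event type is richer) and assumes a missing field should be an encoding error rather than an empty header value; CefKeys, CefValues, and resolve are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical options mirroring the key-based proposal.
/// `None` means "use the default key".
#[derive(Default)]
struct CefKeys {
    name: Option<String>,
    description: Option<String>,
    severity: Option<String>,
}

/// Values pulled out of an event for the CEF header.
#[derive(Debug, PartialEq)]
struct CefValues {
    name: String,
    description: String,
    severity: String,
}

impl CefKeys {
    fn resolve(&self, event: &BTreeMap<String, String>) -> Result<CefValues, String> {
        // Use the configured key if present, else the proposal's default key.
        // Treating a missing field as an error is an assumption of this sketch.
        let lookup = |configured: &Option<String>, default: &str| -> Result<String, String> {
            let key = configured.as_deref().unwrap_or(default);
            event
                .get(key)
                .cloned()
                .ok_or_else(|| format!("missing required CEF field `{}`", key))
        };
        Ok(CefValues {
            name: lookup(&self.name, "cef.name")?,
            description: lookup(&self.description, "cef.description")?,
            severity: lookup(&self.severity, "cef.severity")?,
        })
    }
}
```

Failing loudly on a missing key is one possible design; a real codec might instead fall back to a placeholder so that a single malformed event does not block the sink.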

Example:

{
  "message": "{\"@timestamp\": \"1683583245\",\"id\": \"ConsumerFetcherManager-1382721708341\",\"level\": \"info\",\"module\": \"kafka.consumer.ConsumerFetcherManager\",\"msg\": \"Stopping all fetchers\"}"
}
transforms:
  cef_fields:
    type: remap
    inputs: ["kafka_logs"] # hypothetical source ID
    source: |
      . = parse_json!(.message)
      .timestamp = ."@timestamp"

      ."cef.name" = "Security event"

      ."cef.description" = .msg

      ."cef.severity" = "1"
      if .level == "warning" {
        ."cef.severity" = "4"
      }
      if .level == "error" {
        ."cef.severity" = "9"
      }
sinks:
  siem:
    type: socket
    inputs: ["cef_fields"]
    mode: tcp # assumed transport
    address: "siem.example.com:514" # hypothetical SIEM endpoint
    encoding:
      codec: "cef"
      cef:
        fields:
          - id
          - module
          - timestamp

With the proposed codec, this would produce the following line:

CEF:0|Vectordotdev|Vector|v0.29.0|Security event|Stopping all fetchers|1|id=ConsumerFetcherManager-1382721708341 module=kafka.consumer.ConsumerFetcherManager timestamp=1683583245
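To make the expected output concrete, the following Rust sketch replays the example end to end. It applies the same level-to-severity mapping as the remap transform, fills the header with the fixed vendor, product, and version values and the constant name from the sample, and appends the configured fields as key=value pairs in the listed order. The event is modeled as a BTreeMap of already-parsed string fields; skipping fields absent from the event and returning None when msg or level is missing are assumptions, and render and severity_for are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Mirrors the example remap transform: map `level` to a CEF severity.
fn severity_for(level: &str) -> &'static str {
    match level {
        "warning" => "4",
        "error" => "9",
        _ => "1",
    }
}

/// Header escaping: backslash first, so later escapes are not doubled.
fn escape_header(value: &str) -> String {
    value.replace('\\', "\\\\").replace('|', "\\|")
}

/// Extension escaping: backslash, equals sign, and line breaks.
fn escape_extension_value(value: &str) -> String {
    value
        .replace('\\', "\\\\")
        .replace('=', "\\=")
        .replace('\n', "\\n")
        .replace('\r', "\\r")
}

/// Render one event as a CEF line; returns None if `msg` or `level` is missing.
fn render(event: &BTreeMap<&str, &str>, fields: &[&str]) -> Option<String> {
    let description = *event.get("msg")?;
    let severity = severity_for(event.get("level")?);
    let header = ["Vectordotdev", "Vector", "v0.29.0", "Security event", description, severity];
    let mut line = String::from("CEF:0");
    for value in header {
        line.push('|');
        line.push_str(&escape_header(value));
    }
    line.push('|');
    // Extension pairs follow the order of `fields`; absent fields are skipped.
    let pairs: Vec<String> = fields
        .iter()
        .filter_map(|k| event.get(k).map(|v| format!("{}={}", k, escape_extension_value(v))))
        .collect();
    line.push_str(&pairs.join(" "));
    Some(line)
}
```

Tying the extension order to the fields list, rather than to map iteration order, keeps the output deterministic and matches the id, module, timestamp order of the sample line.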
nabokihms commented 1 year ago

My question is whether there is a chance for such a feature to be accepted or not. Do you know if it requires an RFC?

spencergilbert commented 1 year ago

> My question is whether there is a chance for such a feature to be accepted or not. Do you know if it requires an RFC?

I know we've discussed enabling VRL in more places, such as sink and encoding configuration, but I'm not sure where that stands today. I suspect we'd want an RFC for how that would work, which could be separate from an initial implementation of this codec.

@jszwedko, what do you think? Do we still want to add VRL support to components outside of remap? Timestamp formatting came up recently: encoding.timestamp_format is more limited than what's possible in VRL, and being able to use a VRL function there would be handy.

jszwedko commented 1 year ago

We discussed this today and think CEF is a reasonable addition to Vector's codec system alongside GELF, Syslog, CSV, JSON, etc. We also noted that it would be nice to have a general pattern for sharing code between VRL and codecs, since a few formats, such as Syslog, have encoders/decoders in both places.

A VRL codec would also be interesting; that is tracked in https://github.com/vectordotdev/vector/issues/13634 and would probably require an RFC.

nabokihms commented 1 year ago

@jszwedko @spencergilbert what are my next steps to move this forward? Do I need to start with an RFC or dive straight into implementation?

spencergilbert commented 1 year ago

@nabokihms, if you wanted to implement the VRL-based codec, we'd want an RFC; if you were just implementing a CEF codec in the same style as the GELF/Syslog/etc. codecs, going straight into the code is fine.

zamazan4ik commented 1 year ago

> if you were just implementing a CEF codec in the same style as the GELF/Syslog/etc going straight into the code is fine.

@nabokihms, I suggest going with the "just a CEF codec" approach for this issue. It would be much easier to implement and would still be useful.