vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

ECS log schema support #2423

Open raghu999 opened 4 years ago

raghu999 commented 4 years ago

Hi Vector team, a general question: how can we apply the Elastic Common Schema (ECS) to Vector data before writing it to Elasticsearch?

binarylogic commented 4 years ago

@raghu999 great question! Vector's schema assumptions are currently very simple. Common field names can be controlled via the global log_schema options. Outside of that, your best bet is to use the rename_fields transform to make your data match that schema.

But I really like the idea of Vector defining a more explicit schema around all fields, specifically the fields added by transforms like ec2_metadata and geoip. All of that should be customizable globally.

oktal commented 3 years ago

Our current pipeline also tries to comply with ECS before writing data to Elasticsearch.

Considering the following log message, our pipeline looks like this:

2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message

A first regex_parser stage extracts the individual raw parts from the log message. After parsing, the LogEvent looks like this:

| Field | Value |
| --- | --- |
| log_timestamp | 2020-13-10T10:01:23Z |
| log_thread_id | 12345 |
| log_level | INFO |
| log_logger | My.Namespace.Component |
| log_message | My log message |
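To make the parsing step concrete, here is a minimal Python sketch of what such a regex stage might do. It is not Vector configuration, and the pattern is an assumption written to fit the sample line above; the actual regex used in the pipeline is not shown in this thread. The names `LOG_PATTERN` and `parse_line` are hypothetical.

```python
import re

# Hypothetical pattern fitted to the sample line; not the pipeline's real regex.
# Named groups reuse the raw field names from the table above.
LOG_PATTERN = re.compile(
    r"^(?P<log_timestamp>\S+) - (?P<log_thread_id>\d+) - (?P<log_level>\w+)"
    r" - (?P<log_logger>\S+) \|\| (?P<log_message>.*)$"
)


def parse_line(line: str) -> dict:
    """Split a raw log line into its parts; all values stay strings."""
    match = LOG_PATTERN.match(line)
    if match is None:
        # Assumption: lines that do not match are passed through unparsed.
        return {"message": line}
    return match.groupdict()


sample = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
event = parse_line(sample)
```

At this point every value, including the thread id, is still a string, which is why a later typing step is needed.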

We then use a combination of rename_fields and lua transforms (to parse the thread id and timestamp) to rename the fields according to ECS.

Our final LogEvent will look like this:

| Field | Value |
| --- | --- |
| @timestamp | 2020-13-10T10:01:23Z |
| process.thread.id | 12345 |
| log.level | INFO |
| log.logger | My.Namespace.Component |
| message | My log message |
| host.name | node01 |
| log.original | 2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component My log message |

Hope that helps
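The rename step described in this comment can be sketched in Python roughly as follows. This only illustrates the mapping logic, not the actual rename_fields or lua configuration, which is not shown in the thread. `ECS_FIELD_MAP` and `to_ecs` are hypothetical names, the event is modelled as a flat dict with dotted keys, and the timestamp is deliberately left as a string because its exact format is not specified here.

```python
# Hypothetical mapping from the raw parsed field names to ECS field names,
# taken from the two tables in this comment.
ECS_FIELD_MAP = {
    "log_timestamp": "@timestamp",
    "log_thread_id": "process.thread.id",
    "log_level": "log.level",
    "log_logger": "log.logger",
    "log_message": "message",
}


def to_ecs(parsed: dict, original: str, host: str) -> dict:
    """Rename parsed fields to ECS names and attach host and original line."""
    # Fields without a mapping keep their existing name.
    event = {ECS_FIELD_MAP.get(key, key): value for key, value in parsed.items()}
    # Assumption: the thread id should end up as an integer, not a string.
    if "process.thread.id" in event:
        event["process.thread.id"] = int(event["process.thread.id"])
    event["host.name"] = host
    event["log.original"] = original
    return event


parsed = {
    "log_timestamp": "2020-13-10T10:01:23Z",
    "log_thread_id": "12345",
    "log_level": "INFO",
    "log_logger": "My.Namespace.Component",
    "log_message": "My log message",
}
raw_line = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
ecs_event = to_ecs(parsed, original=raw_line, host="node01")
```

Keeping the mapping in a single dict makes it easy to compare against the ECS field reference when new fields are added.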

binarylogic commented 3 years ago

Thanks, @oktal, that's helpful. We are actively outlining first-class support for schemas like ECS. We hope to get the initial versions out this quarter (#3910). It'll likely start with more control over field mapping at the source and sink level and then progress into formal support for the schemas.

ypid commented 1 year ago

https://github.com/ypid/event-processing-framework (modular config for Vector) has extensive support for ECS. Sources like syslog in particular should have good coverage.