Open raghu999 opened 4 years ago
@raghu999 great question! Vector's schema assumptions are currently very simple. Common field names can be controlled via the global log_schema options. Outside of that, your best bet is to use the rename_fields transform to match that schema for your data.
But I really like the idea of Vector defining a more explicit schema around all fields, specifically the fields added by transforms like ec2_metadata and geoip. All of that should be customizable in a global sense.
Our current pipeline also tries to comply with ECS before writing data to Elasticsearch.
Considering the following log message, our pipeline looks like this:
2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message
A first regex_parser stage extracts the individual (raw) parts from the log message. After parsing, the LogEvent looks like this:
Field | Value |
---|---|
log_timestamp | 2020-13-10T10:01:23Z |
log_thread_id | 12345 |
log_level | INFO |
log_logger | My.Namespace.Component |
log_message | My log message |
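The parsing step above can be approximated outside Vector in plain Python, as a minimal sketch of what such a stage might extract. The thread does not show the actual regex_parser configuration, so the pattern and the `parse_line` helper below are assumptions for illustration only; the named groups simply mirror the field names in the table.

```python
import re
from typing import Optional

# Hypothetical pattern: the real regex_parser configuration is not shown in the thread.
# Group names match the raw field names listed in the table above.
LOG_PATTERN = re.compile(
    r"^(?P<log_timestamp>\S+) - (?P<log_thread_id>\d+) - (?P<log_level>\w+)"
    r" - (?P<log_logger>\S+) \|\| (?P<log_message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the raw fields as strings, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
raw_event = parse_line(sample)
```

All values come out as strings at this point, which is why a later step is needed to parse the thread id and timestamp.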
We then use a combination of rename_fields and lua transforms (to parse the thread id and timestamp) to rename the fields according to ECS.
Our final LogEvent looks like this:
Field | Value |
---|---|
@timestamp | 2020-13-10T10:01:23Z |
process.thread.id | 12345 |
log.level | INFO |
log.logger | My.Namespace.Component |
message | My log message |
host.name | node01 |
log.original | 2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component \|\| My log message |
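The renaming from raw fields to ECS names can be sketched in plain Python as follows. This is not the rename_fields or lua configuration used in the pipeline, which the thread does not show; `ECS_FIELD_MAP` and `to_ecs` are hypothetical names, and converting the thread id to an integer is an assumption about what "parsing" it means here. The timestamp is left as a string because its target format is not stated.

```python
# Hypothetical mapping derived from the two tables in this comment.
ECS_FIELD_MAP = {
    "log_timestamp": "@timestamp",
    "log_thread_id": "process.thread.id",
    "log_level": "log.level",
    "log_logger": "log.logger",
    "log_message": "message",
}


def to_ecs(raw: dict, original_line: str, host_name: str) -> dict:
    """Rename raw fields to ECS names and attach host and original-line fields."""
    # Fields without a mapping keep their original name.
    event = {ECS_FIELD_MAP.get(key, key): value for key, value in raw.items()}
    # Assumption: the thread id is stored as an integer after parsing.
    if "process.thread.id" in event:
        event["process.thread.id"] = int(event["process.thread.id"])
    event["host.name"] = host_name
    event["log.original"] = original_line
    return event


line = "2020-13-10T10:01:23Z - 12345 - INFO - My.Namespace.Component || My log message"
raw = {
    "log_timestamp": "2020-13-10T10:01:23Z",
    "log_thread_id": "12345",
    "log_level": "INFO",
    "log_logger": "My.Namespace.Component",
    "log_message": "My log message",
}
ecs_event = to_ecs(raw, line, host_name="node01")
```

The keys are kept as flat dotted names to match the table; whether the sink expects flat or nested fields depends on the setup.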
Hope that helps
Thanks, @oktal, that's helpful. We are actively outlining first-class support for schemas like ECS. We hope to get the initial versions out this quarter (#3910). It'll likely start with more control over field mapping at the source and sink level and then progress into formal support for the schemas.
https://github.com/ypid/event-processing-framework (modular config for Vector) has extensive support for ECS. Things like syslog in particular should have good coverage.
Hi Vector team, a general question: how can we apply the Elastic Common Schema (ECS) to Vector data before writing it to Elasticsearch?