vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.77k stars 1.57k forks source link

Cleanup `journald` source context fields #4304

Open binarylogic opened 4 years ago

binarylogic commented 4 years ago

The journald source should take more care to identify fields and define a schema.

Example

For example, events coming out of Vector's journald source look like this:

{
  "timestamp": <2019-07-26T20:30:27.000443Z>,
  "message": "reply from 192.168.1.2: offset -0.001791 delay 0.000176, next query 1500s",
  "host": "lorien.example.com",
  "__REALTIME_TIMESTAMP": "1564173027000443",
  "__MONOTONIC_TIMESTAMP": "98694000446",
  "_BOOT_ID": "124c781146e841ae8d9b4590df8b9231",
  "SYSLOG_FACILITY": "3",
  "_UID": "0",
  "_GID": "0",
  "_CAP_EFFECTIVE": "3fffffffff",
  "_MACHINE_ID": "c36e9ea52800a19d214cb71b53263a28",
  "PRIORITY": "6",
  "_TRANSPORT": "stdout",
  "_STREAM_ID": "92c79f4b45c4457490ebdefece29995e",
  "SYSLOG_IDENTIFIER": "ntpd",
  "_PID": "2156",
  "_COMM": "ntpd",
  "_EXE": "/usr/sbin/ntpd",
  "_CMDLINE": "ntpd: [priv]",
  "_SYSTEMD_CGROUP": "/system.slice/ntpd.service",
  "_SYSTEMD_UNIT": "ntpd.service",
  "_SYSTEMD_SLICE": "system.slice",
  "_SYSTEMD_INVOCATION_ID": "496ad5cd046d48e29f37f559a6d176f8"
}

Instead of blindly passing through raw fields, we should only map commonly used fields as documented by Journald.

jszwedko commented 2 years ago

Ref: https://github.com/vectordotdev/vector/issues/12187