vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

GELF codec decoder should default the "host" field to client IP address #13323

Open neuronull opened 2 years ago

neuronull commented 2 years ago

Problem

As part of #4868, PR #13288 added GELF decoding support.

Currently, if "host" is missing from the incoming input, Vector omits "host" from the created event.

The Graylog node, however, uses the client IP address as the "host" when the input does not specify one.

I'm opening this issue based on the discussion at https://github.com/vectordotdev/vector/pull/13288#discussion_r906022686, since it is not going to be straightforward to get the IP address in all source scenarios.
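To make the requested fallback concrete, here is a minimal sketch in Python. It is not Vector's decoder: `decode_gelf` and its `peer_ip` parameter are hypothetical names used only for illustration, and the sketch assumes an uncompressed, unchunked JSON payload. The `peer_ip=None` case stands in for sources where no client address is available.

```python
import json
from typing import Optional


def decode_gelf(payload: bytes, peer_ip: Optional[str] = None) -> dict:
    """Hypothetical decoder sketch: default "host" to the client IP."""
    event = json.loads(payload)
    # Only fill in "host" when the sender omitted it and the source
    # actually knows the peer address (e.g. a network socket).
    if "host" not in event and peer_ip is not None:
        event["host"] = peer_ip
    return event


with_host = b'{"version": "1.1", "host": "web-1", "short_message": "hi"}'
without_host = b'{"version": "1.1", "short_message": "hi"}'

print(decode_gelf(with_host, peer_ip="192.0.2.10")["host"])     # sender's value wins
print(decode_gelf(without_host, peer_ip="192.0.2.10")["host"])  # falls back to peer IP
print("host" in decode_gelf(without_host))                      # no peer known: stays absent
```

The third call mirrors the current behavior described above: when nothing is known about the client, "host" is simply left out.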

Version

v0.23.0

mtrin commented 2 years ago

I've been testing GELF with kubernetes_logs. I ran into various "issues" (most probably due to my own learning curve), but this is how I made it work in the end:

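transforms:
  kubernetes_host_transform:
    type: remap
    # Drop any event that fails one of the `!` calls below.
    drop_on_error: true
    inputs:
      - kubernetes_logs
    source: |-
      # GELF requires "host"; use the hostname of the machine running Vector.
      .host = get_hostname!()
      # Keep only the container image from the Kubernetes metadata.
      .container_image = string!(.kubernetes.container_image)
      del(.kubernetes)
      # As written, the bare names below are local variables, so these lines
      # only assert that the fields are strings; the event is not modified.
      message = string!(.message)
      del(.timestamp)
      source_type = string!(.source_type)
      file = string!(.file)
      del(.timestamp_end)
      del(.stream)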

The question is: since the GELF codec, unlike the other codecs, is strict (i.e. it requires "host"), what is the best approach to ingest kubernetes_logs? It seems the only way to use it is to remap all the fields and convert them to strings?

neuronull commented 2 years ago

The question is: since the GELF codec, unlike the other codecs, is strict (i.e. it requires "host"), what is the best approach to ingest kubernetes_logs? It seems the only way to use it is to remap all the fields and convert them to strings?

Indeed, for input that does not already match the GELF standard, the approach to take is to use transforms.
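As a rough illustration of what "matching the GELF standard" involves, the Python sketch below reshapes an arbitrary record into a GELF 1.1 message: it sets the required "version", "host" and "short_message" fields, prefixes everything else with an underscore, and drops "_id", which the spec disallows. This is not Vector code; `to_gelf` and `default_host` are hypothetical names, and the optional standard fields (timestamp, level, full_message) are deliberately not special-cased, so they end up as prefixed additional fields. A remap transform would express the same steps in VRL.

```python
import json


def to_gelf(record: dict, default_host: str) -> dict:
    """Hypothetical sketch: reshape a record into a GELF 1.1 message."""
    gelf = {
        "version": "1.1",
        "host": str(record.get("host", default_host)),
        "short_message": str(record.get("message", "")),
    }
    for key, value in record.items():
        if key in ("host", "message"):
            continue  # already mapped to required fields above
        name = key if key.startswith("_") else "_" + key
        if name == "_id":
            continue  # GELF 1.1 does not allow an "_id" additional field
        # Additional field values should be strings or numbers, so
        # serialize anything else (objects, lists, booleans, null).
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            value = json.dumps(value)
        gelf[name] = value
    return gelf


record = {
    "message": "hello",
    "source_type": "kubernetes_logs",
    "kubernetes": {"container_image": "nginx"},
    "id": 7,
}
print(json.dumps(to_gelf(record, default_host="node-1"), indent=2))
```

Serializing nested metadata to a string is only one option; extracting the few fields that matter and deleting the rest, as in the transform earlier in the thread, keeps the message smaller.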