vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
17.46k stars 1.53k forks source link

GELF TCP output not working (delimiter?) #18008

Closed MartinEmrich closed 1 year ago

MartinEmrich commented 1 year ago

A note for the community

Problem

I try to get GELF log output via TCP to Graylog working. Running vector version 0.31.0.

Graylog does not accept/receive the messages, but logs

2023-07-18 06:37:35,634 ERROR: org.graylog2.plugin.inputs.transports.AbstractTcpTransport - Error in Input [GELF TCP/642ae3487835f22df2df4fc5] (channel [id: 0x704607a6, L:/10.40.91.104:12201 ! R:/10.40.131.202:57492]) (cause io.netty.handler.codec.TooLongFrameException: frame length exceeds 2097152: 2106469 - discarded)

I suspect the delimiter (\0-Byte) is not working.

I tried setting the framing.characted_delimited field to 0 as well as "\u0000", no difference.

Configuration

      sources:
        kubernetes_logs:
          type: kubernetes_logs
      transforms:
        gelftransform:
          type: remap
          inputs: [kubernetes_logs]
          source: |
            .host = .kubernetes.pod_name
            .kubernetes = encode_json(.kubernetes)
      sinks:
        gelf:
          type: socket
          inputs: [gelftransform]
          address: "graylog-gelf-tcp.graylog:12201"
          mode: tcp
          encoding: 
            codec: gelf
            except_fields: 
            - timestamp_end
            - file
            framing:
              method: "character_delimited"
              character_delimited:
                delimiter: "\u0000"
          healthcheck: false

Version

vector 0.31.0 (x86_64-unknown-linux-gnu 0f13b22 2023-07-06 13:52:34.591204470)

Additional Context

A possible red flag that I might just be doing it wrong is that I had to massage the events quite a bit to have them accepted by the sink GELF encoding using a transform step: Encode the kubernetes Object to JSON string, set a (for gelf mandatory) host field, drop the not accepted timestamp_end field etc.

The "Null Frame Delimiter" option on the Graylog Input is enabled (currently working fine with fluentd; Logs might contain newlines)

References

4868 looks like this should be possible now.

MartinEmrich commented 1 year ago

I just tried disabling "Null Frame Delimiter" on the Graylog GELF TCP input, and setting framing.method: "newline_delimited" on vector, but I have the same issue.

jszwedko commented 1 year ago

Thanks for this report @MartinEmrich . I haven't tried to reproduce yet, but wanted to respond to:

A possible red flag that I might just be doing it wrong is that I had to massage the events quite a bit to have them accepted by the sink GELF encoding using a transform step: Encode the kubernetes Object to JSON string, set a (for gelf mandatory) host field, drop the not accepted timestamp_end field etc.

That massaging is expected since the GELF codec is strict with respect to fields.

MartinEmrich commented 1 year ago

That massaging is expected since the GELF codec is strict with respect to fields.

Thanks, good to know! (Part of me did expect it to just work by formatting every message GELF-compliant automatically). As soon as log delivery in itself works, i will then further refine my transform part...

neuronull commented 1 year ago

I was able to reproduce this locally.

using this compose file for graylog:

version: '3'

services:
  mongo:
    image: mongo:5.0.13
    networks:
      - graylog
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Dlog4j2.formatMsgNoLookups=true -Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        limits:
          memory: 1g
    networks:
      - graylog
  graylog:
    image: graylog/graylog:5.0
    environment:
      - GRAYLOG_PASSWORD_SECRET=somepasswordpepper
      - GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
      - GRAYLOG_HTTP_EXTERNAL_URI=http://127.0.0.1:9000/
    entrypoint: /usr/bin/tini -- wait-for-it elasticsearch:9200 --  /docker-entrypoint.sh
    networks:
      - graylog
    restart: always
    depends_on:
      - mongo
      - elasticsearch
    ports:
      - 9000:9000
      - 1514:1514
      - 1514:1514/udp
      - 12201:12201
      - 12201:12201/udp

networks:
  graylog:
    driver: bridge

, this guide from the graylog docs https://go2docs.graylog.org/5-0/downloading_and_installing_graylog/docker_installation.htm?tocpath=Downloading%20and%20Installing%20Graylog%7CInstalling%20Graylog%7C_____2#HowtoGetinLogData

, set up an input for GELF/TCP to 12201,

[api]
enabled = true

[sources.in]
type = "socket"
mode = "tcp"
address = "0.0.0.0:5555"
decoding.codec = "json"

[sinks.out]
type = "socket"
inputs = [ "in" ]
mode = "tcp"
address = "localhost:12201"

[sinks.out.encoding]
codec = "gelf"
except_fields = [ "timestamp_end", "file" ]
framing.method = "character_delimited"
framing.character_delimited.delimiter = "\u0000"

The following script:

while true; do
  echo '{ "version": "1.1", "host": "example.org", "short_message": "A short message that helps you identify what is going on", "full_message": "Backtrace her", "timestamp": 1385053862, "level": 1, "_user_id": 9001, "_some_info": "foo", "_some_env_var": "bar" }' | nc localhost 5555
done

Observed the influx of events with vector top , and the sink showed it was sending the events out as well.

In the graylog server logs:

docker_graylog-graylog-1        | 2023-07-24 20:58:57,014 ERROR: org.graylog2.plugin.inputs.transports.AbstractTcpTransport - Error in Input [GELF TCP/64bee17662bd612a09aa33af] (channel [id: 0x682b23f3, L:/172.18.0.4:12201 ! R:/172.18.0.1:39478]) (cause io.netty.handler.codec.TooLongFrameException: frame length exceeds 2097152: 2097186 - discarded)

, and then in vector's log:

2023-07-24T20:58:57.021657Z  WARN sink{component_kind="sink" component_id=out component_type=socket component_name=out}: vector::internal_events::tcp: Received EOF from the server, shutdown.
2023-07-24T20:58:57.021814Z ERROR sink{component_kind="sink" component_id=out component_type=socket component_name=out}: vector::internal_events::socket: Error sending data. error=ShutdownCheck::Close error_code="socket_send" error_type="writer_failed" stage="sending" mode=tcp internal_log_rate_limit=true
2023-07-24T20:58:57.021919Z ERROR sink{component_kind="sink" component_id=out component_type=socket component_name=out}: vector_common::internal_event::component_events_dropped: Events dropped intentional=false count=1 reason="Error sending data." internal_log_rate_limit=true
neuronull commented 1 year ago

Also, it appears others have run into this: https://github.com/vectordotdev/vector/issues/4868#issuecomment-1260756838

MartinEmrich commented 1 year ago

Tried some tcpdump just now. The delimiter on the wire is always a 0x0A, regardless of what I configure as a delimiter.

MartinEmrich commented 1 year ago

After doing lots of RTFS (and dipping my toe into Rust) I found out that part of it was a configuration mistake:

Look closely at my original config:

          mode: tcp
          encoding: 
            codec: gelf
            except_fields: 
            - timestamp_end
            - file
            framing:
              method: "character_delimited"
              character_delimited:
                delimiter: "\u0000"

The framing block is below the encoding subtree, that's what I found in all examples I found while googling. Apparently vector just silently ignores that.

After moving the framing block up one level, this works:

gelf:
          type: socket
          inputs: [gelftransform]
          address: "localhost:12201"
          mode: tcp
          encoding: 
            codec: gelf
            except_fields: 
            - timestamp_end
            - file
          framing:
            method: "character_delimited"
            character_delimited:
              delimiter: "\x00"
          healthcheck: false

And the other half is that apparently the Graylog GELF input always expects Null-Byte-Delimiters, that UI checkbox is apparently unused: https://github.com/Graylog2/graylog2-server/blob/master/graylog2-server/src/main/java/org/graylog2/inputs/gelf/tcp/GELFTCPInput.java#L43-L54

https://github.com/Graylog2/graylog2-server/issues/1240

So the Newline delimiter would never work, but seems to be the default here:

https://github.com/vectordotdev/vector/blob/master/src/codecs/encoding/config.rs#L113-L118

MartinEmrich commented 1 year ago

I tried to create a PR: https://github.com/vectordotdev/vector/pull/18419

With this, "framing" configuration can be omitted completely and the messages come up fine in Graylog.

jszwedko commented 1 year ago

Ooof, sorry about that @MartinEmrich ! Vector really should have complained about the unexpected fields under encoding. Thanks for the PR!