open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[receiver/syslog] Syslog Receiver fails to parse long messages, even with a `max_log_size` set #33182

Open sinkingpoint opened 1 month ago

sinkingpoint commented 1 month ago

Component(s)

receiver/syslog

What happened?

Description

When using the syslog receiver, we can only parse messages up to the default maximum length (8192 octets), even with a max_log_size set much higher.

Steps to Reproduce

  1. Create a receiver with the provided config (note the max_log_size of 100000000 bytes, far above the 8192 default)
  2. Send a message longer than 8192 octets
  3. Observe an error: message too long to parse. was size 40366, max length 8192

Expected Result

The message should parse properly

Actual Result

The message fails to parse

Collector version

v0.100.0

Environment information

Environment

OS: Debian Bookworm

OpenTelemetry Collector configuration

receivers:
  syslog:
    protocol: rfc5424
    enable_octet_counting: true
    tcp:
      listen_address: :4278
      max_log_size: 100000000 # 100 MB (roughly 95 MiB)
exporters:
  debug:
service:
  pipelines:
    logs:
      receivers: [syslog]
      exporters: [debug]

Log output

{"level":"error","ts":1716356887.9432147,"caller":"helper/transformer.go:101","msg":"Failed to process entry","kind":"receiver","name":"syslog/db","data_type":"logs","operator_id":"syslog_input_internal_parser","operator_type":"syslog_parser","error":"message too long to parse. was size 40366, max length 8192","action":"send","stacktrace":"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*TransformerOperator).HandleEntryError\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/helper/transformer.go:101\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ParseWith\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/helper/parser.go:140\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*ParserOperator).ProcessWithCallback\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/helper/parser.go:112\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/syslog.(*Parser).Process\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/parser/syslog/parser.go:54\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper.(*WriterOperator).Write\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/helper/writer.go:53\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/tcp.(*Input).handleMessage\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/input/tcp/input.go:191\\ngithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/tcp.(*Input).goHandleMessages.func1\\n\\tgithub.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza@v0.99.0/operator/input/tcp/input.go:152"}

Additional context

This seems to be because we aren't passing a value in here: https://github.com/influxdata/go-syslog/blob/66067a10754ae90b9540d5312989ae685413c4fe/octetcounting/parser.go#L46 so we get stuck with the default limit.

github-actions[bot] commented 1 month ago

Pinging code owners:

frzifus commented 1 month ago

As far as I understand that part, the parser has no option to pass the maxSize information to any of the underlying parsers?

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/902d846079474a316334ddb2a37ffaa84c3c5462/pkg/stanza/operator/parser/syslog/parser.go#L29-L36

Looking at this construction part, none of those takes a max size into account.

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/902d846079474a316334ddb2a37ffaa84c3c5462/pkg/stanza/operator/parser/syslog/parser.go#L88-L113

The version of github.com/influxdata/go-syslog/v3/rfc5424 in use doesn't even offer an option that can be set.

djaglowski commented 1 month ago

max_log_size is a feature of the TCP input component, but it doesn't apply to syslog.

> The used version of github.com/influxdata/go-syslog/v3/rfc5424 doesn't even offer an option that can be set.

I looked into this further and found that go-syslog justifies the hard limit based on RFC 5425 Section 4.3.1. My reading of that section is that 8192 octets is the minimum a receiver should support; it does not prescribe that value as a maximum.

sinkingpoint commented 1 month ago

@djaglowski considering that that repo has been archived, would it make sense to fork it here?

djaglowski commented 1 month ago

Actually I'm happy to see that the original author has recently created a fork and is making updates again! We should definitely switch in my opinion. https://github.com/leodido/go-syslog.

andrzej-stencel commented 1 month ago

If I'm reading this correctly, the v4 release from leodido/go-syslog allows us to fix this issue, as it contains the WithMaxMessageLength function introduced in https://github.com/influxdata/go-syslog/pull/39 that we can call when instantiating the parser. Is my thinking correct?

andrzej-stencel commented 1 month ago

PR switching the dependency to the fork: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33205.

bacherfl commented 1 week ago

Hi! I would like to pick this issue up if it is still available.

bacherfl commented 1 week ago

@djaglowski I went ahead and created a draft PR making use of the new option in the updated library. I do have some open questions, which I have added to the PR description. I'd appreciate any feedback there.