open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[Receiver] Panic on calling `LogRecordCount()` in receiver #10625

Open grandwizard28 opened 2 months ago

grandwizard28 commented 2 months ago

Describe the bug: Calling LogRecordCount() after ConsumeLogs() in a receiver causes a panic with a nil pointer dereference. A redacted version of the receiver looks like this:

...
logs, err := receiver.parser.Parse(body)
if err != nil {
    writeError(w, err, http.StatusBadRequest)
    return
}

// At this point, the receiver has accepted the payload
ctx := receiver.obsreport.StartLogsOp(req.Context())
err = receiver.nextConsumer.ConsumeLogs(ctx, logs)
receiver.obsreport.EndLogsOp(ctx, metadata.Type.String(), logs.LogRecordCount(), err)

if err != nil {
    writeError(w, err, http.StatusInternalServerError)
    return
}
...

The redacted stack trace looks like this:

2024/07/16 15:52:25 http: panic serving 10.52.8.48:57874: runtime error: invalid memory address or nil pointer dereference
goroutine 181262 [running]:
net/http.(*conn).serve.func1()
    /opt/hostedtoolcache/go/1.22.5/x64/src/net/http/server.go:1903 +0xbe
panic({0x1792160?, 0x2d2a0f0?})
    /opt/hostedtoolcache/go/1.22.5/x64/src/runtime/panic.go:770 +0x132
go.opentelemetry.io/collector/pdata/plog.ResourceLogsSlice.At(...)
    /home/runner/go/pkg/mod/go.opentelemetry.io/collector/pdata@v1.10.0/plog/generated_resourcelogsslice.go:56
go.opentelemetry.io/collector/pdata/plog.Logs.LogRecordCount({0xc0018347c8?, 0xc00288a8fc?})

Steps to reproduce: The panic happens at every throughput level (close to 60K log records).

What did you expect to see? No panic.

What did you see instead? Panic

What version did you use?

go.opentelemetry.io/collector v0.103.0
go.opentelemetry.io/collector/component v0.103.0
go.opentelemetry.io/collector/pdata v1.10.0

Additional context https://github.com/open-telemetry/opentelemetry-collector/pull/10402

atoulme commented 2 months ago

What's your pipeline looking like, as in, what processors and exporters did you use after this receiver?

grandwizard28 commented 2 months ago

The logs pipeline looks like this:

    logs:
      receivers: [otlp, customreceiver]
      processors: [batch]
      exporters: [kafkaexporter]

grandwizard28 commented 2 months ago

I have a hunch that doing the below:

logs, err := receiver.parser.Parse(body)
if err != nil {
    writeError(w, err, http.StatusBadRequest)
    return
}
// Read the count before ConsumeLogs; the pipeline may modify logs afterwards.
numLogs := logs.LogRecordCount()

// At this point, the receiver has accepted the payload
ctx := receiver.obsreport.StartLogsOp(req.Context())
err = receiver.nextConsumer.ConsumeLogs(ctx, logs)
receiver.obsreport.EndLogsOp(ctx, metadata.Type.String(), numLogs, err)

will fix the issue. This is basically how it's being done in the otlpreceiver.

atoulme commented 2 months ago

That will fix your issue, for sure.

grandwizard28 commented 2 months ago

This fixed it, @atoulme. Could we note this somewhere in the documentation?

crobert-1 commented 2 months ago

Looks like a relatively common issue, as shown in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/29274. It would be good to document this somewhere.