jhump / protoreflect

Reflection (Rich Descriptors) for Go Protocol Buffers
Apache License 2.0
1.35k stars 172 forks source link

invalid memory address or nil pointer dereference #609

Closed baganokodo2022 closed 6 months ago

baganokodo2022 commented 6 months ago

Hi team,

In one of our services, Datadog metrics are received in the form of Protobuf binary in HTTP1.1 request payloads. We parse the proto binary into a data model defined below, https://github.com/DataDog/agent-payload/blob/master/gogen/agent_payload.pb.go and the associated Protobuf schema, https://github.com/DataDog/agent-payload/blob/master/proto/metrics/agent_payload.proto

After that, the data objects are published to a Kafka topic via the Confluent Go SDK. For a good portion of the messages (not all of them), message publishing failed due to invalid memory address or nil pointer dereference.

Panic happened on this line, https://github.com/confluentinc/confluent-kafka-go/blob/f0d709f65d69d678d683f8ba324c8a913ac25dee/schemaregistry/serde/protobuf/protobuf.go#L246 And it is originated from the jhump line below, https://github.com/jhump/protoreflect/blob/df0ae2a8c0012d65a9e61f048b038fbaff891146/desc/protoprint/print.go#L512 Screenshot 2024-04-17 at 1 52 51 PM

When I printed out and compared the good and bad proto messages, I don't see any noticeable differences.

Need advice how to fix the problem.

many thanks Xinyu Liu

jhump commented 6 months ago

@baganokodo2022, that line does not de-reference any pointers. So it seems unlikely for that to be the right line. That screenshot doesn't really make sense -- it appears to contain lines from a stack trace, but in a random order (and I have no idea what to make of the timestamps).

Can you recover the actual panic output, the entire stack trace, and confirm the origin of the panic and post the whole stack trace (omitting any stack frames you think may be sensitive, regarding your codebase)?

baganokodo2022 commented 6 months ago

Hi @jhump ,

here is the stack trace on that panic,

{"level":"error","ts":1713393278.4177923,"caller":"server/server.go:166","message":"Failed to serialize proto message by confluent runtime error: invalid memory address or nil pointer dereference"} goroutine 4435 [running]: runtime/debug.Stack() /usr/local/go/src/runtime/debug/stack.go:24 +0x6b runtime/debug.PrintStack() /usr/local/go/src/runtime/debug/stack.go:16 +0x13 github.cbhq.net/data/datadog-shadow/internal/pkg/server.publish2Kafka.func1.1() /src/app/internal/pkg/server/server.go:167 +0xea panic({0x23ec340?, 0x391c5a0?}) /usr/local/go/src/runtime/panic.go:770 +0x136 google.golang.org/protobuf/internal/impl.(*MessageInfo).unmarshalPointer(0xc0001d4f68, {0x0, 0x100, 0x100}, {0xc00266a9c0}, 0x0, {0x0, {0x2c2afe8, 0xc001f1ca50}, 0x270f}) /go/pkg/mod/google.golang.org/protobuf@v1.33.0/internal/impl/decode.go:104 +0x12d google.golang.org/protobuf/internal/impl.(*MessageInfo).unmarshal(0xc0001d4f68, {{}, {0x2c443b8, 0xc00266a9c0}, {0x0, 0x100, 0x100}, 0x0, {0x2c2afe8, 0xc001f1ca50}, ...}) /go/pkg/mod/google.golang.org/protobuf@v1.33.0/internal/impl/decode.go:66 +0x1c5 google.golang.org/protobuf/proto.UnmarshalOptions.unmarshal({{}, 0x1, 0x1, 0x0, {0x2c2afe8, 0xc001f1ca50}, 0x2710}, {0x0, 0x100, 0x100}, ...) /go/pkg/mod/google.golang.org/protobuf@v1.33.0/proto/decode.go:105 +0x29b google.golang.org/protobuf/proto.UnmarshalOptions.Unmarshal({{}, 0x0, 0x0, 0x0, {0x2c2afe8, 0xc001f1ca50}, 0x2710}, {0x0, 0x100, 0x100}, ...) /go/pkg/mod/google.golang.org/protobuf@v1.33.0/proto/decode.go:65 +0x125 github.com/jhump/protoreflect/desc/protoprint.reparseUnknown(0xc001f1ca50, {0x2c443b8, 0xc005d384e0}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:512 +0x18a github.com/jhump/protoreflect/desc/protoprint.reparseUnknown.func1({0x2c49ee8, 0xc0001f68c0}, {{}, 0x2656560, 0xc005d384e0, 0x0}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:504 +0x296 google.golang.org/protobuf/internal/impl.(*messageState).Range(0xc005f9b680, 0xc003fdfd80) /go/pkg/mod/google.golang.org/protobuf@v1.33.0/internal/impl/message_reflect_gen.go:49 +0x374 github.com/jhump/protoreflect/desc/protoprint.reparseUnknown(0xc001f1ca50, {0x2c443b8, 0xc005f9b680}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:484 +0xa6 github.com/jhump/protoreflect/desc/protoprint.reparseUnknown.func1({0x2c49ee8, 0xc0001f2100}, {{}, 0x259a8a0, 0xc001f1cdb0, 0x0}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:491 +0x328 google.golang.org/protobuf/internal/impl.(*messageState).Range(0xc005fa6900, 0xc005834ff0) /go/pkg/mod/google.golang.org/protobuf@v1.33.0/internal/impl/message_reflect_gen.go:49 +0x374 github.com/jhump/protoreflect/desc/protoprint.reparseUnknown(0xc001f1ca50, {0x2c443b8, 0xc005fa6900}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:484 +0xa6 github.com/jhump/protoreflect/desc/protoprint.reparseUnknown.func1({0x2c49ee8, 0xc0001ec4e0}, {{}, 0x259a8a0, 0xc001f1cae0, 0x0}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:491 +0x328 google.golang.org/protobuf/internal/impl.(*messageState).Range(0xc005f8de00, 0xc005834610) /go/pkg/mod/google.golang.org/protobuf@v1.33.0/internal/impl/message_reflect_gen.go:49 +0x374 github.com/jhump/protoreflect/desc/protoprint.reparseUnknown(0xc001f1ca50, {0x2c443b8, 0xc005f8de00}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:484 +0xa6 github.com/jhump/protoreflect/desc/protoprint.(*Printer).printProto(0xc0064c0690, {0x2c3d1c0, 0xc00064d420}, {0x2c1dde0, 0xc004e262a0}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:323 +0x494 github.com/jhump/protoreflect/desc/protoprint.(*Printer).PrintProtoFile(0xc0064c0690, 0xc00064d420, {0x2c1dde0, 0xc004e262a0}) /go/pkg/mod/github.com/jhump/protoreflect@v1.15.3/desc/protoprint/print.go:279 +0x48 github.com/confluentinc/confluent-kafka-go/v2/schemaregistry/serde/protobuf.(*Serializer).toDependencies(0xc00052ba40, 0xc00064d420, 0xc001f1c9c0) /go/pkg/mod/github.com/confluentinc/confluent-kafka-go/v2@v2.3.0/schemaregistry/serde/protobuf/protobuf.go:213 +0x108 github.com/confluentinc/confluent-kafka-go/v2/schemaregistry/serde/protobuf.(*Serializer).toProtobufSchema(0xc00052ba40, {0x2c1d760, 0xc003ae7140}) /go/pkg/mod/github.com/confluentinc/confluent-kafka-go/v2@v2.3.0/schemaregistry/serde/protobuf/protobuf.go:203 +0x1a8 github.com/confluentinc/confluent-kafka-go/v2/schemaregistry/serde/protobuf.(*Serializer).Serialize(0xc00052ba40, {0x26e41b3, 0x1c}, {0x26118c0, 0xc003ae7140}) /go/pkg/mod/github.com/confluentinc/confluent-kafka-go/v2@v2.3.0/schemaregistry/serde/protobuf/protobuf.go:167 +0x195

It is raised from confluent-kafka-go/v2@v2.3.0 the most current release.

Many thanks Xinyu

baganokodo2022 commented 6 months ago

FYI, here is the protoc-generated go source,

agent_payload.pb.txt

jhump commented 6 months ago

From that full stack trace, at line 509 (here), the protobuf-go runtime is returning an invalid slice. The {0x0, 0x100, 0x100} you see on all of the stack frames from that last protoprint frame up to the panic is a slice with a nil pointer but a non-zero length and capacity (of 256 bytes). Slices with non-zero length and capacity should never have invalid or nil pointers. So when the unmarshal code goes to read the slice (the actual stack frame causing the panic here), it sees there are supposedly 256 bytes, but then de-references a nil pointer trying to query for the first byte.

I have no way of knowing where that is coming from unless could provide a small repro case that I could debug. It seems like it is likely something in the protobuf-go runtime since that is where the implementations of protoreflect.Message are defined (which is what the code is using, to query for a slice of unrecognized data).

If you could extract/recreate the file descriptor that the code is passing to protoprint and that reproduces the issue, that would help a lot.

At this point, it seems like some form of garbage is being passed in to the protoprint (a field descriptor where one of the underlying descriptor proto messages has a corrupted/invalid slice header), and I have no idea where that bad data is coming from or if it's caused by something else in this module. It could be a bug in the protobuf-go module or it could be a bug in your own code that constructs the file descriptor to print 🤷.

baganokodo2022 commented 6 months ago

thank you so much @jhump

I figured out the root cause, it is the imported dependency in the protobuf file, import "github.com/gogo/protobuf/gogoproto/gogo.proto"; in fact, the above project is marked as deprecated. after I commented it out, the nil pointer dereference error is gone.