protocolbuffers / protocolbuffers.github.io

clarification for encoding of maps as repeated messages (not a bug) #164

Closed ah-quant closed 4 weeks ago

ah-quant commented 2 months ago

According to https://protobuf.dev/programming-guides/encoding/#maps, the binary encoding of a map is a repeated message with two optional fields.

I'd have understood required. Or no qualifier at all. But this makes me curious. Why optional - and what are the semantics for missing keys or values?

I don't need the docs changed, a comment here is more than enough. Thanks!

Logofile commented 1 month ago

Proto2 requires that a field have a field label of required, optional, or repeated. Proto3 allows fields with no field label. required is no longer recommended in proto schema (https://protobuf.dev/programming-guides/proto2/#field-labels).

I'll check with the eng team to find out what happens when there is a missing key or value.

esrauchg commented 4 weeks ago

This is getting a bit into weakly documented nuances on which implementations may differ slightly, but today the 'message' representation differs between proto2 and proto3 maps.

Proto3 maps are treated on the wire as if the key-value message were defined with proto3 "no qualifier at all" fields, as listed here: https://protobuf.dev/programming-guides/proto3/#backwards

By contrast, proto2 maps behave as if the key and value fields were marked 'optional' (as Logofile mentioned, "no qualifier at all" isn't legal syntax in proto2), as listed here: https://protobuf.dev/programming-guides/proto2/#backwards

In either case, it is legal for an entry's key or value to be absent, and the entry will be interpreted as though the default value for that type were present on the wire. Specifically, if you had a map<int32, string> and got an entry with neither a key nor a value set inside it, it would be seen as a (0, "") entry. If you're writing your own protobuf implementation, you should accept an absent key or value on the wire and handle it as the default value for the corresponding type.
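
For anyone writing their own decoder, here is a minimal sketch of that rule in Python. It hand-parses the payload of a single map<int32, string> entry and falls back to the defaults when field 1 (key) or field 2 (value) is missing. The names read_varint and decode_entry are made up for this illustration and are not part of any protobuf library, and the sketch only handles non-negative keys.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_entry(payload):
    """Decode the payload of one map<int32, string> entry into (key, value).

    Hypothetical helper: assumes key = field 1 (VARINT) and value = field 2
    (LEN), and does not sign-convert negative int32 keys.
    """
    key, value = 0, ""  # defaults that apply when a field is absent
    pos = 0
    while pos < len(payload):
        tag, pos = read_varint(payload, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # VARINT
            number, pos = read_varint(payload, pos)
            if field_number == 1:
                key = number
        elif wire_type == 2:  # LEN (length-delimited)
            length, pos = read_varint(payload, pos)
            data = payload[pos:pos + length]
            pos += length
            if field_number == 2:
                value = data.decode("utf-8")
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return key, value


print(decode_entry(b""))                    # neither key nor value on the wire
print(decode_entry(b"\x08\x07"))            # key only
print(decode_entry(b"\x12\x02hi"))          # value only
print(decode_entry(b"\x08\x07\x12\x02hi"))  # both present
```

An empty payload comes back as (0, ""), and a payload carrying only one of the two fields gets the default for the other.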

In general, this proto2/proto3 difference has only extremely obscure observable effects: the Map<> APIs in the Google-supported implementations do not expose the presence of the individual key or value either way. The syntax-dependent nuance can be observed if you examine the descriptors directly, in some cases via reflection APIs, and in whether a serializer chooses to skip the 0 value when writing to the wire.
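
To make the last point concrete, here is a rough Python sketch of a serializer for one map<int32, string> entry that can either skip default values or write them explicitly. The function names and the skip_defaults flag are invented for this illustration and do not describe how any particular Google-supported implementation is written. The sketch assumes non-negative keys.

```python
def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_entry(key, value, skip_defaults):
    """Encode the payload of one map<int32, string> entry.

    Hypothetical helper: skip_defaults=True omits a key of 0 or an empty
    value; skip_defaults=False always writes both fields.
    """
    out = bytearray()
    if key != 0 or not skip_defaults:
        out += b"\x08" + write_varint(key)  # tag: field 1, wire type VARINT
    data = value.encode("utf-8")
    if data or not skip_defaults:
        out += b"\x12" + write_varint(len(data)) + data  # tag: field 2, wire type LEN
    return bytes(out)


print(encode_entry(0, "", skip_defaults=True))
print(encode_entry(0, "", skip_defaults=False))
print(encode_entry(7, "", skip_defaults=True))
```

The two encodings of (0, "") differ on the wire (an empty payload versus four bytes), but a decoder that treats absent fields as defaults reads both as the same entry.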