Closed MarcoPolo closed 1 year ago
> In proto3, if a field was set to its default value it would not be serialized. This meant that the decoding side wouldn't know if the field was omitted because it was unset or because it was the default value. This is called No Presence.
Isn't that potentially a problem? If I explicitly set a field, and the value just happens to coincide with the default value, I'd want it to be serialized, no?
EDIT: I might be misunderstanding, does this only apply to non-optional fields? That would make sense then, since if a field is non-optional (= required), there's no way it could have been left empty.
> does this only apply to non-optional fields?
Yes - more correctly it applies to `singular` fields, the default type of field - see the Specifying Field Rules of the proto3 spec.
On deserialization to their object form both `singular` and `optional` fields are set to their default values if no value was present on the wire.
On serialization `optional` fields write any value that was set even if it was the default (and no value if one was not set), `singular` fields only write out a value if the value set was not the default.
Consequently `optional` fields let you know if the field was set, `singular` fields do not.
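The serialization rules above can be sketched with a tiny hand-rolled encoder. This is only an illustration, not how any protobuf library is implemented: it assumes a single `uint32` field with field number 1, and the function names are made up for this example.

```python
from typing import Optional

# Tag byte for a hypothetical field number 1 with wire type 0 (varint):
# (1 << 3) | 0 == 0x08
FIELD_TAG = b"\x08"


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_singular(value: int) -> bytes:
    """Singular field (no presence): the default value 0 is never written."""
    if value == 0:
        return b""
    return FIELD_TAG + encode_varint(value)


def encode_optional(value: Optional[int]) -> bytes:
    """Optional field (explicit presence): any set value is written, even 0."""
    if value is None:
        return b""
    return FIELD_TAG + encode_varint(value)
```

With these helpers, `encode_singular(0)` produces no bytes at all, while `encode_optional(0)` produces the tag followed by a zero byte, so only the `optional` encoding lets a reader distinguish "set to 0" from "never set".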
> On serialization `optional` fields write any value that was set even if it was the default (and no value if one was not set), `singular` fields only write out a value if the value set was not the default.
yup: here are some example tests: https://github.com/MarcoPolo/proto2and3-playground/blob/main/src/main.rs#L122. proto3 `optional` bytes behave exactly like proto2 `optional` bytes.
> In proto3, if a field was set to its default value it would not be serialized. This meant that the decoding side wouldn't know if the field was omitted because it was unset or because it was the default value. This is called No Presence.
> Isn't that potentially a problem? If I explicitly set a field, and the value just happens to coincide with the default value, I'd want it to be serialized, no?
> EDIT: I might be misunderstanding, does this only apply to non-optional fields? That would make sense then, since if a field is non-optional (= required), there's no way it could have been left empty.
A couple of things that may help clarify (I'm talking about proto3 here):
- […] `optional` modifier.
- Fields without the `optional` modifier are not required. This is a little confusing, so I'll elaborate. Proto2 had a notion of "required" fields. This meant that if this field wasn't present, decoding would fail. This is not the case with Proto3. Proto3 doesn't have a notion of "required" fields that will cause decoding to fail. This is on purpose and a good thing. If a non-`optional` field is missing in the message, proto3 assumes it's the default value for that type. So for a byte array the default value would be an empty byte array, and for a string it would be an empty string. For a nested message we do "explicit presence" regardless of whether the field has an `optional` modifier (see the table above).
> if a field is non-optional (= required), there's no way it could have been left empty.
With the above, hopefully it's clear there is no "required" notion in proto3. A field that is not marked `optional` could have been left empty. This is usually fine, but if your program depends on knowing if the field was set vs unset it should mark the field as `optional` in the protobuf.
Not failing decode on missing required fields is not a good thing: now you need extra logic in the client for this, and it's a giant footgun.
> Not failing decode on missing required fields is not a good thing: now you need extra logic in the client for this, and it's a giant footgun.
You probably already have some logic on the client side checking if the value is even appropriate. Having required fields makes backwards/forwards compatibility hard because a required field is forever. This isn't just my opinion, there's a lot written about this:
> **Required Is Forever** You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead.
> A second issue with required fields appears when someone adds a value to an enum. In this case, the unrecognized enum value is treated as if it were missing, which also causes the required value check to fail.
from: https://developers.google.com/protocol-buffers/docs/proto#specifying-rules
And more:
Thanks @MarcoPolo for the research and the elaborate description.
One thing to note is that proto3 only supports `optional` since `protoc` `v3.15.0`. Users may use an older `protoc` version. E.g. Debian bullseye ships with `v3.12.4`. I don't think we should give this much weight. In other words, I don't think this is an argument against proto3. I am still raising it here so we can help users that run into it. (The error message with `protoc` `<v3.15.0` does not make this obvious.)
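To help users who run into this, a build script could check the compiler version up front. The following is a rough sketch, assuming that `protoc --version` prints a line such as `libprotoc 3.12.4`; the function name `supports_proto3_optional` is hypothetical and not part of any protobuf tooling.

```python
import re


def supports_proto3_optional(version_output: str) -> bool:
    """Return True if the reported protoc version is at least 3.15.

    `version_output` is assumed to be the text printed by `protoc --version`,
    for example "libprotoc 3.12.4".
    """
    match = re.search(r"(\d+(?:\.\d+)+)", version_output)
    if match is None:
        raise ValueError(f"no version found in {version_output!r}")
    # Compare numerically, component by component, not as strings.
    parts = tuple(int(p) for p in match.group(1).split("."))
    return parts >= (3, 15)
```

The output could be obtained with something like `subprocess.run(["protoc", "--version"], capture_output=True, text=True).stdout`, and a failed check could print a clearer message than the one `protoc` itself gives.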
I am fine with libp2p moving to proto3.
I am sorry for being the source of the confusion on presence in proto2 and proto3.
> I am sorry for being the source of the confusion on presence in proto2 and proto3.
No need to be sorry. This is not easy to reason about.
I don't think we should pay a lot of attention to what Debian does. I don't see any good justification for their focus on "stability" (aka outdated software). For example, they ship with Go 1.15, which was released in Aug 2020 and has been unmaintained for more than a year now.
I'm wondering if we should move all of our existing protobufs to proto3 as well. It would be nice to be consistent across our entire stack, and we could get rid of proto2 compiler dependencies. We'd have to check that this can be done in a backwards-compatible way in all our protocols though.
> This is not easy to reason about.
It certainly is not. I discovered the other day that the official protobuf.js module doesn't handle default values properly when deserializing "singular" fields so even Google don't get it right sometimes and it's their spec.
> I'm wondering if we should move all of our existing protobufs to proto3 as well
I've been doing this with the js stack as we took the decision to only support proto3 in protons and it's mostly been ok.
One oddity is when a proto2 field has been marked as `required` and you are sending a message to a peer that will use a proto2 decoder: it needs a value on the wire. The only way to ensure this happens in proto3 is to mark the field `optional`, as if it's `singular` the value will be omitted if it's the default value.
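A minimal sketch of that interop problem, under simplifying assumptions: one hypothetical `uint32` field with field number 1, values below 128 so the varint fits in one byte, and hand-written functions standing in for a proto3 writer and a proto2 reader. None of these names come from a real library.

```python
FIELD_TAG = 0x08  # hypothetical field 1, wire type 0 (varint)


def encode_proto3(value: int, optional: bool) -> bytes:
    """Encode the field the way a proto3 writer would (values 0..127 only)."""
    if not 0 <= value < 128:
        raise ValueError("this sketch only handles single-byte varints")
    if not optional and value == 0:
        return b""  # singular: the default value is omitted from the wire
    return bytes([FIELD_TAG, value])  # optional and set: always written


def decode_proto2_required(data: bytes) -> int:
    """Decode as a proto2 reader that declares field 1 as `required`."""
    if len(data) < 2 or data[0] != FIELD_TAG:
        raise ValueError("missing required field 1")
    return data[1]


# A set optional field survives the round trip, even when it holds 0.
decode_proto2_required(encode_proto3(0, optional=True))

# A singular field holding 0 leaves nothing on the wire, so the
# proto2 reader rejects the message.
try:
    decode_proto2_required(encode_proto3(0, optional=False))
except ValueError as err:
    print("proto2 reader rejected the message:", err)
```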
🤪
Is it correct to think about it in the following way:
- Every type in `proto3` has a default value. Unset fields in message will show up as their default value upon deserialization.
- `optional` fields have "None" as a default value (like Rust's `Option` or Haskell's `Maybe`).
> Every type in proto3 has a default value. Unset fields in message will show up as their default value upon deserialization.
Yes, but with one gotcha that the default value for Message fields is for them to be unset, the exact value of which is language-dependent.
> optional fields have "None" as a default value (like Rust's Option or Haskell's Maybe).
According to the spec all fields should be set to their default value upon deserialization (even `optional` fields).
If the field is marked `optional` you should be able to check if it was explicitly set - how you do that varies by language and even by protobuf implementation within the language.
If the field is `singular` (the default) you cannot check if it was explicitly set.
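As a rough illustration of the decoding side, the sketch below decodes a message holding at most one hypothetical `uint32` field (field number 1, value below 128). The `Decoded` type and its `on_wire` flag are invented for this example; real implementations expose presence differently (a `has` accessor, an `Option<T>`, and so on).

```python
from dataclasses import dataclass


@dataclass
class Decoded:
    value: int     # always populated; 0 when nothing was on the wire
    on_wire: bool  # what a presence check on an `optional` field would report


def decode(data: bytes) -> Decoded:
    """Decode a message with at most one uint32 field (field 1, value < 128)."""
    if not data:
        # Nothing on the wire: the field still reads as its default value.
        return Decoded(value=0, on_wire=False)
    if len(data) != 2 or data[0] != 0x08 or data[1] >= 0x80:
        raise ValueError("unsupported input for this sketch")
    return Decoded(value=data[1], on_wire=True)
```

Both `decode(b"")` and `decode(b"\x08\x00")` yield the value 0; only the `on_wire` flag differs. For a `singular` field the writer never emits the second form, so the flag carries no information there.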
There seems to have been a misunderstanding in the past around proto2 vs proto3. My attempt here is to clear up the confusion, recommend proto3 in general, and explain why proto3 should be preferred.
Our main confusion is about field presence. That is, if a field is omitted from the serialized wire format, can the user of the decoded message tell whether the field was unset or set to the default value? This document has a lot of good information and is worth the read: https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md
Origins of the confusion
Proto2 would always serialize an explicitly set field, even if it was set to the default. This meant that you could know on the decoding side whether the field was set or not. This is called Explicit Presence. For example, in the Rust protobuf compiler, it would wrap these in `Option<T>`: https://github.com/tokio-rs/prost#field-modifiers.
The confusing thing is that the language guide for proto2 states:
The subtlety here is that this doesn't say anything about "hasField" accessors, which may be provided by the implementation to check if the field was set or not. This is essentially what prost is doing with `Option<T>` types.
Another confusing thing is that this language guide doesn't mention "presence" a single time, which is what we're talking about here.
In proto3, if a field was set to its default value it would not be serialized. This meant that the decoding side wouldn't know if the field was omitted because it was unset or because it was the default value. This is called No Presence.
Field Presence Proto2 vs Proto3
To clarify field presence in proto2 vs proto3:
From https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-proto2-apis
Proto2
Proto3
optional
Advantages in Proto3 compared to Proto2
`required` modifier
Next steps
README.md