streem / pbandk

Kotlin Code Generator and Runtime for Protocol Buffers
MIT License
267 stars 37 forks source link

Unexpected 0-th value enum JSON serialization #235

Closed antongrbin closed 1 year ago

antongrbin commented 1 year ago

The testing message is defined as follows:

syntax = "proto2";

message MessageWithEnum {
    enum EnumType {
      FOO = 0;
      BAR = 1;
    }
    optional EnumType value = 1;
}

We observe the following serialization with pbandk:

Source message current JSON serialization expected JSON serialization
MessageWithEnum() {"value":null} {}
MessageWithEnum(value=FOO) {} {"value":"FOO"}
MessageWithEnum(value=BAR) {"value":"BAR"} OK!

The bug is reproduced in a unit test here: https://github.com/streem/pbandk/pull/234

I know that proto2 doesn't really have a JSON spec defined, but other protobuf implementations (java, python, swift) would all behave as expected in this example.

In proto3, we observed the following serialization behavior:

Source message current JSON serialization expected JSON serialization
TestAllTypesProto3() {} OK!
TestAllTypesProto3(optionalNestedEnum = FOO) {} {"value":"FOO"}
TestAllTypesProto3(optionalNestedEnum = BAR) {"value":"BAR"} OK!

1) Do you agree that the current behavior is unexpected? 2) Would you be open for us contributing a fix for it?

Thank you for your time :)

garyp commented 1 year ago

The proto2 behavior definitely seems like a bug. As you mentioned, proto2 doesn't officially have a JSON spec defined and so pbandk doesn't officially support JSON with proto2. But we do try to match the behavior of the official protobuf libraries where possible. We would gladly accept a PR with a fix.

The proto3 behavior I would argue is compliant with the spec. According to https://developers.google.com/protocol-buffers/docs/proto3#json:

If a field has the default value in the protocol buffer, it will be omitted in the JSON-encoded data by default to save space. An implementation may provide options to emit fields with default values in the JSON-encoded output.

pbandk does have the JsonConfig.outputDefaultValues option available if you want the output to be {"value":"FOO"}.

antongrbin commented 1 year ago

Thank you for the quick turnaround time :)

Happy to hear that we can move forward with the fix for proto2.

If you agree, please review https://github.com/streem/pbandk/pull/234 which only adds unit tests that show the current behavior. This will make our follow-up fix PR clearer.

--

I looked into implementing the fix for proto2 and it looks like most effort will go into determining whether the message is proto2 or proto3 with the information that JsonMessageEncoder has in runtime. Did I miss anything here?

A simpler approach is to say that field presence is what's important, not proto2 vs proto3. In that approach we would change the behavior of all optional enum fields with default values, regardless of whether they are in proto2 or proto3.

Digging deeper, I came across Application note: Field presence which suggests that our fix here might not be enum-specific. From that doc:

* No presence discipline:
  * Default values are not serialized.
* Explicit presence discipline:
  * Explicitly set values are always serialized, including default values.

This suggests that we should serialize default values of all optional fields (not only enums).

--

Having all of the above, we can apply the fix with 3 scopes: 1) fix serialization of default values for proto2 enums 2) fix serialization of default values for all optional enums 3) fix serialization of default values for all optional fields

My hunch would be to move forward with 3) since it makes the code cleaner and is more aligned with other implementations. I can provide more details here if needed :)

Looking forward to your input!

garyp commented 1 year ago

fix serialization of default values for all optional fields

I agree that this would be the preferred route.

In general, it's better to avoid writing code that makes decisions based on proto2 vs proto3 (when possible) and instead write code that makes decisions based on the field's presence discipline (as described in the doc you linked to).

This suggests that we should serialize default values of all optional fields (not only enums).

Yes agreed. Just keep in mind the distinction between "no presence" (the default behavior for non-repeated proto3 fields) and "explicit presence" (used in proto2 and proto3 for fields that include the optional keyword).

antongrbin commented 1 year ago

Excellent. Thank you for the guidance!

@garyp will you be the reviewer for the changes? This will be 2 small PRs, first one with a unit test and second one with a fix.

The first one is here: https://github.com/streem/pbandk/pull/234

garyp commented 1 year ago

Yep, I'll review them. I should have time later today to take a look at the PRs.

antongrbin commented 1 year ago

Thank you for the review on the unit test PR.

The fix PR is here: https://github.com/streem/pbandk/pull/238

garyp commented 1 year ago

@antongrbin Thank you for the thorough fix and tests!

antongrbin commented 1 year ago

@garyp is there anything blocking the 0.14.2 release? Milestone looks all closed: https://github.com/streem/pbandk/milestone/12

Thank you for your time :)

garyp commented 1 year ago

@antongrbin It's released now!