protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.11k stars 15.43k forks source link

Version 3.1(release) - Incompatible serialization: C# -> JSON -> C++ #2278

Closed zhenkas closed 7 years ago

zhenkas commented 7 years ago

C# default JSON name is camel style and set by ToCamelCase() function, however C++ default JSON name is set by ToJsonName().

This broke the compatibility between C# and C++ JSON serialization. If field in proto is not in Camel style (for example: "KVStore"), then there will be an error in C++ deserialize.

jskeet commented 7 years ago

The C# formatter already uses the json_name field from the proto descriptor if it's present. Can you give a short but complete example including the expected and actual output?

zhenkas commented 7 years ago

The problem occurs when json name is not specified and it uses "default json name". Try to serialize proto struct with field named "KVStore". In c# it will be converted to camel style "kvStore" while in c++ it will stay with right case.

jskeet commented 7 years ago

I was under the impression that the json_name field was always filled in by protoc, even if it wasn't specified explicitly. (I think that was the intention.)

It sounds like there may be two issues here:

Do you know what other languages are doing with this? (That may affect which codebase needs to change to minimize breakage...)

zhenkas commented 7 years ago

Nice catch, I did not think about protoc at all!

I agree that json_ name always should be filled in by protoc. Because, once the code was generated, nothing should affect its serialization rules.

Fixing the first issue will "eliminate" the second issue for all languages.

xfxyjwf commented 7 years ago

Every JSON implementation must implement its own snake_case to/from camelCase conversion code for FieldMask, so having protoc always populate json_name doesn't solve all the problem. Though I do have a pending change that does it but there are hundreds of failing tests to fix and I don't think it's worthwhile to pursue for no benefit.

I have updated the snake_case to camelCase conversion code to be consistent in C++ and Java. Other languages will need to copy the same conversion code as in: https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.cc#L201 https://github.com/google/protobuf/blob/master/java/core/src/main/java/com/google/protobuf/Descriptors.java#L1217

jskeet commented 7 years ago

@xfxyjwf: Okay, that's easy code to convert, and much simpler than the previous ToCamelCase code... but could you clarify under what circumstances that should be used vs the previous ToCamelCase code? For field masks as well as JSON field names? If so, I suspect I can get rid of the more complex code in C# entirely. Will create a PR and see what you think...

xfxyjwf commented 7 years ago

@jskeet FieldMask names should follow the same conversion rule as ToJsonName().

It's more complex than that though: converting a field path to JSON and back again may not yield the original field path if the field name doesn't follow snake_case strictly, so there needs additional check to make sure the conversion can actually be round-tripped. What I think is the right behavior:

  1. FieldMask JSON conversion follows the same rule as ToJsonName().
  2. FieldMask JSON conversion should fail if the conversion can not round-trip. For example: snake_case will be converted to snakeCase and then back to snakecase. Since it can't round-trip, the conversion from snakecase to JSON should fail on FieldMask.

The status-quo:

  1. FieldMask JSON conversion follows the same rule as ToJsonName().
  2. If the conversion can not round-trip, different implementations have different behaviors.

In google3, we have strict rules that field names must be snake_case, so (2) doesn't actually matter there. In the opensource world, people use different style of field names. No matter what conversion rule we adopt, non-style-conforming names can not be used in FieldMask (fail the conversion immediately or can't be converted back), so I guess (2) does not matter that much either.

jskeet commented 7 years ago

@xfxyjwf: The C# code already checks for invalid snake case too: https://github.com/google/protobuf/blob/master/csharp/src/Google.Protobuf/WellKnownTypes/FieldMaskPartial.cs#L88

xfxyjwf commented 7 years ago

@jskeet Oh, that's good to know. I think you can just replace the old ToCameCase() method with the simpler ToJsonName() everywhere. It should not make a difference for valid snake_case names.

jskeet commented 7 years ago

ToCamelCase has been deleted now anyway :)