Open zellyn opened 5 years ago
The one thing this issue fails to address is how this works in a language neutral way. If two language runtimes fail to make the same default json field name because of a quirk with one runtime then they can't work together. Sure that might be a bug with the quirked runtime, but the whole issue could just be avoided if the runtime was given the same json name as the other runtimes via protoc.
On top of that, calculating a json name takes time. It may not be a long time, but it is time and memory. It's not time and memory to calculate it ahead of time and reference it as a constant value (depending on the runtime).
The last point doesn't seem like much of an issue. You should be distributing .proto
files to anyone that needs them. The only people I know that can't get .proto
files distributed to them are people writing Steam bots and Pokemon Go bots. Feel free to list any legitimate reasons people should be able to more easily reverse engineer a .proto
file from a descriptor.
The biggest complaint is that it makes it impossible to tell whether a user specified a json_name
or not. I suppose it would be possible to add an additional bool explicit_json_name
field or something.
We have a lot of code still on protobuf 3.0.0, and when we try to upgrade to a newer protobuf release, all the json field names change. We can use the OrigName
option in jsonpb
, but that will completely disallow any fieldname customization. What we really need is a PreferOrigName
option that still allows explicit field renaming. But the fact that protoc
is synthesizing stuff into the descriptor precludes the possibility of implementing such an option, either in upstream golang/protobuf or even in our own custom serialization code.
As far as the time burden of calculating json names… in compiled runtimes that often happens at generation time (it does in Go, currently), and even in interpreted languages, it's surely highly cacheable. Besides, they all have to have the code/machinery to be able to handle descriptors that lack json_name
anyway, so it's already there.
It's true that recreating a .proto
file from the descriptor is uncommon (although they do/can include all kinds of comment and line number information in the SourceCodeInfo
message, which seems built for that purpose); I guess I was trying to articulate the fact that it “feels wrong” to introduce extra synthetic information into the descriptor.
https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/compiler/command_line_interface.cc#L1972 I think we can have a command line option to decide whether to include json name in the output. @acozzette how do you think?
Yeah, I think a commandline option is perfect. It should always reflect the json_name
if it was specified in the .proto
file, but the commandline option could turn off the synthesis of googleCase defaults.
Any updates on this? Would sending a PR help?
Sorry this slipped off my radar. Usually we try to avoid new options but maybe it is justified in this case. I'm guessing if we just removed json_name
unconditionally then things might break. @haon4 What do you think?
I don't think we should change the current behavior as users are already relying on it. I am also not a fan of adding more command line option, as this doesn't seem like a common feature that other people are asking for, and I really don't like to add more switch for one use-case.
Can this problem be solved at the application level rather than at the protobuf level? Does it really matter if the json_name was explicitly specified or not? If you make the lowerCamelCase transformation on the field name and that matches the json_name option, can you assume that the json_name is not specified? Even if the json_name is explicitly specified in that case, does it matter?
I thought I described why earlier, but no, it is not solvable at an application level, because the way protoc currently populates json_name
for all fields makes it impossible to tell whether any given field actually had a json_name
property.
A commandline flag would be perfect.
btw, if y'all don't add a commandline option, we will be forced to fork protoc internally, and add one 😞 Either that or specify a json_name
on every single existing field that includes underscores (also 😞). I'm happy to rough together a PR adding a commandline option, if that would help 🙂.
Fair enough. If you can put together a PR for this command line option, it may help. Point taken that this can reduce the size bloat on the serialized descriptor, and most languages should already handle the fallback in case json_name
is not there, so I think that may be something we would want to enable by default in the long term. I see there are still many places that are making the assumption about json_name
being there though, but having the command line option may enable us to slowly knock them out.
I think removing json_name
from returned FileDescriptorSets
is unfortunately a breaking change at this point - external plugins could totally be relying on that to remove the need to compute json_name
themselves. @jhump
Another option would be to add a field to google.protobuf.FieldDescriptorProto
that is something like i.e. optional bool json_name_explicitly_set = 11;
that protoc
could populate, and plugins could make decisions based on this. If this was done, the builtin plugins should handle this field as well re: #6175. This seems dirty, but may actually be cleaner than a protoc
command-line flag, and it is kind of how Protobuf is intended to be used (you can't change the json_name
field, but you can iterate on your message definition by adding new fields).
Yea, I think having something like optional bool json_name_explicitly_set = 11;
would be quite nice, actually. I would be happy with that.
On second thought, I think it has to be the negative for backwards compatibility reasons, unless I'm confusing myself, ie optional bool json_name_NOT_explicitly_set = 11;
. If you don't have this, then you can't tell the difference between explicitly_set == false
being because of using an old version of protoc
, or because it was actually not set. I might be confusing myself though in double-negatives.
The work as I see it:
descriptor.proto
, preferably before the 3.8.0 release but I doubt that's possible at this point.FileDescriptorSets
.FileDescriptorSet
to the builtin plugins when compiling Protobuf files directory as is passed when running --descriptor_set_in
(the former does not send a set json_name
to the builtin plugins)Note that for backwards compatibility, when running a plugin (both the builtins and external plugins), you need to do the following with this (in pseudo-code):
if get(jsonNameField) != "" {
// if this is false, it could be due to not being set at all with older versions of protoc,
// in which case we have to act as if the jsonName was explicitly set
if get(jsonNameNotExplicitlySetField) {
actAsIfNotSet()
} else {
actAsIfSet()
}
}
I think it has to be the negative for backwards compatibility reasons
Remember that descriptor.proto
is proto2 syntax. So it could just use a default value:
optional bool json_name_configured = 11 [default=true];
Good point. One issue though is that makes the definition a tad weird - even if no FileDescriptorSet provider (ie protoc et al) would produce this, what does it mean when json_name == “” && json_name_configured == true?
string foo = 1 [json_name=""];
:P
(Related: https://github.com/protocolbuffers/protobuf/issues/5682)
Lol
I'm fine with json_name_explicitly_set
with a default of false. If you're in this situation, you can probably just start using a new version of protoc
. Anything I can help do to push this forward?
I think it needs to be json_name_not_explicitly_set, this does matter for being able to really reason about the value of the field, see above
@elharo Do you know if someone is still working on this? Can we expect this config to be part of future releases?
I am willing to contribute if this is not being worked on.
I don't think anyone is working on this. However, reading through this now I confess I don't see a use case for this feature. That is, why does this matter? Until I can achieve clarity on that I don't think I'd be likely to approve any such PR. Perhaps someone else sees something here I don't.
My thinking (not dispositive) is as follows:
I'm not sure about the "fully" in step 1. There are command line options that change this, but that doesn't change my conclusion. This feature simply doesn't seem to be needed, likely encourages bad code and non-interoperable protos, and is unlikely to be worth the added complexity in code.
Metaphorically it's like trying to change the byte code emitted by javac so that we can tell whether it was indented with spaces or tabs.
Metaphorically it's like trying to change the byte code emitted by javac so that we can tell whether it was indented with spaces or tabs.
@elharo, I think the fact that an artifact of this is that protoc
itself produces different code based on whether the input is the original source vs. a file descriptor set means this is also a consistency issue. And that's a problem because it means that strategies used by build tools could result in different (non-deterministic) output. That case is described more thoroughly in #6175 (which was closed as a duplicate of this issue).
We have built parsers that can read a file descriptor set and re-create the original .proto
schema. This feature enables users to auto-discover schemas during stream processing and register them in the schema registry. Since the file descriptor set generation is different for the same .proto schema, it results in different schemas getting generated. This results in inconsistencies and non-deterministic behavior as called out by @jhump.
The Protobuf code base also acknowledges it as a bug and a work around was introduced to deal with it.
Since this change will be non-backward compatible, I recommend that we add the boolean flag json_name_explicitly_set
(or a better name) as suggested previously.
Reference
@blacktooth "Since the file descriptor set generation is different for the same .proto schema, it results in different schemas getting generated." What exactly do you mean by different here? Are some fields present in one and not the other? Are the fields renumbered? Are the fields reordered? Is the white space changed? Are comments added or removed?
I don't expect the results would be byte-per-byte identical, and some of these changes are more serious than others.
For future reference, relevant code seems to be in protobuf/compiler/command_line_interface.cc void CommandLineInterface::GetTransitiveDependencies
Any progress on this?
I don't think anyone is working on this right now
Sad. #6175 was closed as a dup of this one, but summarizes the problem pretty well:
protoc: code gen is different between using source files and using descriptor set
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive
because the last activity was over 90 days ago.
/keepopen
Coming back to this after a long time. Reading the comments here and on related issues, it seems there is a loose consensus among external tools that if the json_name
in the protobuf descriptor matches the auto-generated one, then the user/.proto
-file probably didn't actually specify a json_name
explicitly.
What if we simply codified that?
Here's my proposal:
If a field in the .proto
file explicitly specifies a json_name
, and that name is the same as the one that would have been auto-generated for the field, then protoc
should add the json_name_explicitly_set
attribute to that field.
This would allow existing tooling to be mostly unchanged, and would provide a solution in the rare cases when the heuristic adopted by external tooling goes wrong: re-generate your proto descriptor with a more recent version of protoc
, and check for json_name_explicitly_set
.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive
because the last activity was over 90 days ago. This issue will be closed and archived after 14 additional days without activity.
/keepopen - mind above proposal
What version of protobuf and what language are you using? Version: v3.6.0 Language: any
What operating system (Linux, Windows, ...) and version? n/a
What runtime / compiler are you using (e.g., python version or gcc version) n/a
What did you do? Steps to reproduce the behavior:
test.proto:
Command:
Output:
What did you expect to see It should be possible to tell whether
json_name
was specified in the.proto
file. In particular, there should be a way to distinguish thatjson_name
was specified forbirth_year
, and not specified forhas_hooves
.What did you see instead? It is impossible to tell whether
json_name
was specified in the.proto
file, becauseprotoc
invents ajson_name
for each field, regardless.Anything else
jsonpb
, there is anOrigName
option. But it's all-or-nothing. If you opt out of the java-ish names, you also opt out of being able to override a name..proto
file from the descriptor.