protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev

Discuss adding an option to preserve file names in C# codegen #3843

Closed kkm000 closed 1 week ago

kkm000 commented 6 years ago

This is related to https://github.com/grpc/grpc/pull/13207. When automating the build process, it is necessary to know the exact names of the files generated by protoc. In the build process I am trying to introduce for .NET Core and Visual Studio, the generated output files are treated as transient (just as .o files would be for a C++ compilation). The problem is that the file names need to be known to the build process ahead of time, and they are currently camelized (and de-underscored).

What I would like to do is add an option, preserve_filename, to the C# codegen plugins (both proto and gRPC) so that the generated file has the same name as the input; e.g. one_two.proto would produce one_two.cs (and one_twoGrpc.cs in the gRPC codegen). Are there any issues with that? Would such a change be acceptable?
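For illustration only (preserve_filename is the option name proposed here, not an existing protoc flag, and the helper below is a hypothetical sketch), the desired mapping would be:

```python
import os

def output_name_preserved(proto_path: str, suffix: str = "") -> str:
    """Proposed behavior: keep the .proto file's stem unchanged,
    instead of camelizing it as the C# codegen does today."""
    stem = os.path.splitext(os.path.basename(proto_path))[0]
    return stem + suffix + ".cs"

print(output_name_preserved("one_two.proto"))          # one_two.cs
print(output_name_preserved("one_two.proto", "Grpc"))  # one_twoGrpc.cs
```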

On the same note, the function GetOutputFile in csharp_names.h would have to change. It is marked LIBPROTOC_EXPORT but is not actually used by the gRPC codegen. Would it be OK to change it? Un-export it and move the declaration to csharp_helpers.h? Or keep the existing API for compatibility with something else? I am thinking of passing it a const pointer to struct Options instead of the last 3 parameters, since it already contains all codegen plugin options (and will need more).

If the decision is to go ahead with this, unit tests for name generation are on me. There are currently none at all, and I would not start changing anything without them in place first.

/cc @jtattermusch, @jskeet

jskeet commented 6 years ago

I don't have a particular issue with this, but I know the protobuf team generally want to restrict the number of options. This does seem a nicely orthogonal one to everything else though.

/cc @xfxyjwf @anandolee for comment.

kkm000 commented 6 years ago

Ahem....

kkm000 commented 6 years ago

@xfxyjwf @anandolee, any opinion on this?

xfxyjwf commented 6 years ago

I'm not quite sure what problem you are trying to solve with this new option. Is the build tool unable to convert the .proto file name to a camel-case name? Or are you just trying to avoid coding a camel-case conversion in the build tool? If it's the latter, I would suggest just implementing the conversion in the build tool. The mapping from .proto file name to generated file name is public API. Adding complexity to the protobuf public API to save a few lines of code in a specific build tool is hardly a good tradeoff.

If the build tool is unable to do the mapping for some reason, there are some other options we can consider:

  1. Make protoc generate files into a zip package. For example, protoc can generate Java files into a source .jar file, and we can do something similar for C#.
  2. Make protoc generate a mapping file to be used by the build tool. protoc already has the ability to generate a dependency file between proto imports for make to use. We can also generate something specific for this build tool to use.

kkm000 commented 6 years ago

I think it would certainly be possible to recreate the mapping, but to me that looks like a workaround, one that would require maintenance to always match the algorithm. It is also very error-prone, as a mistake here will likely result in incorrect dependencies. In a sense, it is jumping through self-imposed hoops.

My general goal is to make proto files first-class citizens in the .NET world. We have quite a discussion going in https://github.com/grpc/grpc/pull/13207; may I please invite you to join it? There are certain problems with the proto toolset packaging in NuGet. Everything I said there about the Grpc.Tools package applies to Google.Protobuf.Tools as well; in short, the tools as packaged are impossible to discover in the build process, even if the package is added to the project. Generally speaking, the current packaging is no more useful than a simple .tar.gz or .zip archive with tools and import protos in known subdirectories, if just downloaded and unpacked. And an automatic installation of the NuGet package makes it worse than that.

Also, if you do not mind my asking, what exactly do you see wrong with adding the option? The suggestions you are providing seem to involve much bigger changes to the codegen than just the single option. I am probably missing something deeper, but why do you deem them preferable to the single option? (And, really puzzling to me, how would generating files into a .zip package not require another option just for that purpose?)

kkm000 commented 6 years ago

The mapping from .proto file name to generated file name is public API. Adding complexity to protobuf public API to save a few lines of code in a specific build tool

I think I did not address this one. The public API is C++, if I understand correctly, and the build tool in question is MSBuild, which is pure .NET. So just accessing the API from the build would require supplying 6 precompiled native libraries for every platform and architecture for which Protobuf and gRPC are packaged. This would probably add more problems than it solves.

Another stumbling block in this direction is that it would need a kind of cross-package discovery of the protoc libraries. So even if I am missing something and a .NET protoc API (not the compiler) does expose the function, using it is still not simple, as it would have to be uniformly available and discoverable at build time. And we have "old" csprojects on Windows, "new" ones on Windows, Linux and Mac, and Mono supporting both styles on Mac and Windows.

It's not that it's not doable; it's just that the amount of work does not seem right when all it buys is a predictable file name.

I'll see what I can get from option 2; thanks for the pointer.

kkm000 commented 6 years ago

@xfxyjwf: The --dependency_out looks really promising, thanks for the suggestion. It would certainly require more work on the MSBuild scripting side, but it provides finer-grained dependency information, which is certainly useful.

It will require some additional code in the Tools package, an MSBuild task DLL.

/cc @jskeet -- what is your opinion? I have not thought it through in detail; certainly there is an interesting question of handling multiple dependency files in a multi-proto compilation.

xfxyjwf commented 6 years ago

I think I did not address this one. The public API is C++, if I understand it correctly, and the build tool in question is MsBuild, which is pure .NET. So just to access the API from the build will require supplying 6 precompiled native libraries for every platform and architecture for which Protobuf and gRPC are packaged. This would probably add more problems than it solves.

I'm suggesting that the mapping algorithm from .proto file name to generated .cs file name is public API, not one particular implementation. We just need to document precisely what is done in the transformation so you can implement the same logic in any other language.
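As a rough sketch of that transformation (this is an approximation of the C# generator's underscores-to-PascalCase file naming inferred from the examples in this thread, not authoritative documentation; verify against the protoc C++ sources before relying on it):

```python
def proto_to_csharp_name(proto_path: str) -> str:
    """Approximate protoc's C# output-file naming: take the base name,
    drop underscores, and capitalize the letter that starts each word."""
    stem = proto_path.rsplit("/", 1)[-1]
    stem = stem[:-len(".proto")] if stem.endswith(".proto") else stem
    out, cap_next = [], True
    for ch in stem:
        if ch.isalpha():
            out.append(ch.upper() if cap_next else ch)
            cap_next = False
        elif ch.isdigit():
            out.append(ch)
            cap_next = True   # a letter following a digit starts a new word
        else:                 # underscores and other punctuation are dropped
            cap_next = True
    return "".join(out) + ".cs"

print(proto_to_csharp_name("one_two.proto"))               # OneTwo.cs
print(proto_to_csharp_name("ptree/sub1/proto_sub1.proto")) # ProtoSub1.cs
```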

kkm000 commented 6 years ago

@xfxyjwf: Ah, I see what you mean. Either way, doing that requires writing an MSBuild task (a dynamic .NET library used as a build extension, in case you are not familiar), and if we go this way, then using --dependency_out is certainly a better source of truth. Except it's not available on the initial build, which makes it an interesting problem to solve.

(And writing the task opens a whole new can of compatibility worms; I am still trying to investigate what kind of thing we are going to step into with this. It is not as exciting to solve, though. :))

There is another interesting thing I noticed:

$ protoc --cpp_out=out ptree/proto1.proto ptree/sub1/proto1.proto
$ protoc --csharp_out=out ptree/proto1.proto ptree/sub1/proto1.proto
Proto1.cs: Tried to write the same file twice.

cpp_out lays the output files out in a nice directory structure according to their source, but csharp_out lumps them together into one flat directory. Is this by design? It is no problem for the build either way; I just have to ask whether this is a bug.

jskeet commented 6 years ago

As I said before, I'd personally be okay with this as an option (probably a C#-specific option) but it's really not my call - I'm not in the protobuf team, I just wrote most of the code earlier.

kkm000 commented 6 years ago

@xfxyjwf: Can you please help with this error?

Can only process one input file when using --dependency_out=FILE.

Am I using it wrong? Is this a fundamental limitation that would be hard to change? I was so gung-ho about this option, but it shot back at me! The currently generated dependency file is conceptually one long line, wrapped with backslashes. Either format would work; ideally 1 line per output, each listing its transitive dependencies:

out/file2.pb.cc: file2.proto  /include/timestamp.proto
out/file1.pb.cc: file1.proto  /include/wrappers.proto /include/timestamp.proto

but even 1 line per protoc invocation would also work (although it may introduce false positives), e.g.:

out/file1.pb.cc out/file2.pb.cc: file1.proto file2.proto /include/wrappers.proto /include/timestamp.proto

Invoking protoc on each file separately is far from ideal, and since the dependencies may change, there is no shortcut even when a cached dependency file exists.

xfxyjwf commented 6 years ago

@kkm000 I'm not sure why we have that restriction. It was added in https://github.com/google/protobuf/commit/e2555e235f867f3d7a0378e95a45109c8fd2dfbe by @TeBoring . Bo, do you remember why we restrict the number of input files to be 1 for the --dependency_out flag to work?

Extending it to support multiple files seems good to me.

kkm000 commented 6 years ago

@xfxyjwf, @TeBoring: FWIW, I just disabled the check in command_line_interface.cc and got apparently good dependency output from two source proto files, with a gRPC service in each:

out/Proto1.cs \
out/Proto1Grpc.cs \
out/ProtoSub1.cs \
out/ProtoSub1Grpc.cs: ptree/sub1/proto_sub1.proto\
 ptree/proto1.proto
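To consume a file like the one above from a build task, a minimal parser for this make-style format could look like the following (parse_dependency_file is a hypothetical helper, not part of any package, and it naively assumes paths contain no colons):

```python
def parse_dependency_file(text: str):
    """Split a make-style dependency file into (outputs, inputs).
    Backslash-newline pairs are line continuations; everything before
    the first ':' is an output, everything after is an input."""
    joined = text.replace("\\\n", " ")   # splice continuation lines
    targets, _, deps = joined.partition(":")
    return targets.split(), deps.split()

sample = (
    "out/Proto1.cs \\\n"
    "out/Proto1Grpc.cs: ptree/proto1.proto\\\n"
    " ptree/sub1/proto_sub1.proto\n"
)
outputs, inputs = parse_dependency_file(sample)
print(outputs)  # ['out/Proto1.cs', 'out/Proto1Grpc.cs']
print(inputs)   # ['ptree/proto1.proto', 'ptree/sub1/proto_sub1.proto']
```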

TeBoring commented 6 years ago

Explained in #3959.

RalfKubis commented 1 year ago

Why are identifiers transformed by the C# generator at all (file, message or field names)? If the proto interface uses snake_case, the generated code should use the same names transparently, IMO. If there is no reason, an option to opt out of identifier transformation would be great, to get rid of this extra complexity.

If anyone could point me towards a relevant document or discussion, I would be glad.

jskeet commented 1 year ago

Why are identifiers transformed by the C# generator at all (file, message or field names)?

Because failing to do so would:

- produce generated code that does not follow normal C# naming conventions; and
- risk generated identifiers colliding with C# keywords, requiring escaping with @.

The first of these reasons is the primary one. C# code using protos should feel natural - I would really hate having a completely inconsistent codebase with properties not following conventions, having to use @ because someone chose to call a field private etc.

It would certainly be technically feasible to add a command-line option to opt out of this (rather than an annotation, given that multiple users may want to generate the same proto in different ways) but I very much doubt that "we" (in Google) would want to do the work for that, and I'm not even convinced we'd want to accept it as a pull request. I can raise it with the rest of the Protobuf team if you like though.

RalfKubis commented 1 year ago

@jskeet Thanks for taking this seriously, and for the incredibly quick response.

Having been with cpp-proto quite a while now and having joined csharp recently, I'm aware that using PascalCase is quite popular in C#. Probably because the initiators and library designers chose so.

Nevertheless, there are arguments for, and trends towards, alternative styles, which can very well co-exist in the same code base. My argument would be that it should be up to the particular dev team which style to follow, and to do so in the proto. The tool should not cement a particular style, in order to allow for future trends. Even the compilers, fortunately, allow this.

Having to maintain implementations of the same interface in different languages makes this identifier transformation feel a bit like obfuscation.

If the interface calls a thing a certain way, one would expect that the (or all) implementations call it the same way, wouldn't one?

After skimming the code I would dare to give it a try, given that there is a chance of acceptance.

So yes, if it isn't asked too much, I would like to know what the team thinks about this.

In any case, a big thank you to you and the team for all this effort; gRPC is great.

jskeet commented 1 year ago

I won't try to argue the case for why I believe this is a bad idea - because I don't have lots of time to commit to this, and my experience of other similar discussions is that no-one actually persuades anyone of anything. However, I will ask the protobuf team what they think of the proposal for a command-line argument. Of course, you'd also have to persuade the gRPC generator maintainers to add a similar command-line argument to the gRPC generator...

jskeet commented 1 year ago

@RalfKubis: I've heard back from the protobuf team, and we wouldn't want to accept the change into this repo, I'm afraid. Of course that doesn't stop you from creating your own fork for it and maintaining that; it shouldn't cause any wire format issues. But it's not something we want to pursue. Thanks for the suggestion though.

github-actions[bot] commented 4 weeks ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.

This issue is labeled inactive because the last activity was over 90 days ago.

github-actions[bot] commented 1 week ago

We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.

This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.