temporalio / api

Temporal gRPC API and proto files
https://temporal.io
MIT License
82 stars 67 forks source link

[Proposal] Allow languages to customize package/namespace structure of generated proto APIs #169

Open macrogreg opened 2 years ago

macrogreg commented 2 years ago

Hi folks,

I have a proposal (incl. a sample solution) that I would like to discuss. If folks find the following interesting in general, I have a draft PR, for which I'd love to collect feedback.

Here, I am making a critical assumption on which the need to this proposal is based. Let us assume that the assumption is, at least partially, correct. If there are strong concerns with it, we can get back to it later:

Most developers using some particular language come from not-yet-being Temporal-users to being Temporal-users in their preferred language of choice. Comparatively very few actually use Temporal in different languages. Therefore, while consistency is important, we must be prepared to make improvements in some languages that cannot be made in other languages (for back-compat or other reasons), even if that affects consistency across languages. That is not to dismiss the importance of consistency. Yet, still, language-specific improvements must be possible not only in areas of language-specific idiomatic use.

Here, I am proposing an improvement of that kind. 😃

Context

Our proto files specify, among other things, what package names (aka namespaces) the generated language APIs should have in various target languages. So far we have left them unchanged, simply following the directory structure in the API repo. For established languages this is now a done deal, as changing the package names is a compatibility break. For new / not-established languages, we may have some more flexibility to improve things (if needed).

Problem

While designing and developing the .NET SDK I have met the need to constantly import a multitude of Temporal.Api.Xyz namespaces. It feels quite redundant and annoying. Doing it once is not a big deal. But in practice, you have to do it for EVERY SINGLE FILE that uses Temporal's data contracts in your application. And in a non-trivial application, that can be quite many files. Java users may not fully appreciate this, as they can simply import Temporal.Api.*, but such a wildcard syntax does not exist in all languages.

Of course, the IDE can help doing this quickly. But just like with the *-syntax, the convenience of it varies across tools and environments. Some developers simply prefer using notepad-like editors. In other cases, the advanced IDEs' convenience factors for this particular scenario vary too. For example, Visual Studio does not automatically import namespaces for good reasons that go beyond the scope of this discussion. Instead, you have to click on a type that is not yet "found" (or hit a hot-key while the cursor is there) and then select an item in the appearing drop-down menu, and VS will then import the namespace. You have to do it every time a type is not found. In our case - once per required Temporal namespace in each file that needs that namespace. So basically - a lot.

Solution

One of our guiding principles is focus on our users/developer. And we can make their lives better by improving on the package/namespace structure of temporal APIs. Crucially, we can only do it for new or almost new Temporal languages.

For example, for .NET, I would consider it a great story to reduce all the namespaces to the following four:

Temporal.ServiceApi.V1.WorkflowService
Temporal.ServiceApi.V1.OperatorService
Temporal.ServiceApi.V1.DataModel
Temporal.ServiceApi.V1.DataModel.ErrorDetails

Explanation:

Root namespace: Temporal.ServiceApi.V1....

Temporal.ServiceApi.V1.WorkflowService

The RPC declarations (and only the RPC declarations) for the workflow service. No data exchange types. Essentially, what is currently declared in workflowservice/v1/service.proto.

Temporal.ServiceApi.V1.OperatorService

The RPC declarations (and only the RPC declarations) for the operator service. No data exchange types. Essentially, what is currently declared in operatorservice/v1/service.proto.

Temporal.ServiceApi.V1.DataModel

All data exchange types. There is no reason for distinct namespaces. For the user, this is the data model for interacting with the Temporal service. Currently, this includes everything that is not in the other namespaces.

Temporal.ServiceApi.V1.DataModel.ErrorDetails

The contracts here are special because they are not really part of the API. In fact, to my knowledge, currently only Go, Java, Python, and C++ actually support the gRPC richer error model on which these are based.

Engineering considerations

I prototyped making the proposed improvement for .NET while leaving other languages unchanged. Initially, it required to changing the namespace setting in each file accordingly. E.g.:

option csharp_namespace = "Temporal.Api.Workflow.V1";

to

option csharp_namespace = "Temporal.ServiceApi.V1.DataModel";

Unfortunately, that is not enough. The thing is that some proto compilers use the file name as a container name when grouping generated code. If the same file name exists in different namespaces, it is not a problem. However, the same file name cannot exist in the same namespace/package more then once, even if it is placed in different directories. Yet, call almost everything message.proto. I addressed that with a structural renaming. E.g.:

temporal/api/common/v1/message.proto --> temporal/api/common/v1/common_message.proto temporal/api/failure/v1/message.proto --> temporal/api/failure/v1/failure_message.proto temporal/api/taskqueue/v1/message.proto --> temporal/api/taskqueue/v1/taskqueue_message.proto . . .

This did the trick and the protos compiled just fine. Only the .NET generated files should be different from before. Here is a draft PR: https://github.com/temporalio/api/pull/170.

Alternative option:

Note that once we are already forced to do a file renaming like above, we are missing an opportunity to remove some redundant characters in the file names. For example:

temporal/api/common/v1/message.proto --> temporal/api/v1/common_message.proto temporal/api/failure/v1/message.proto --> temporal/api/v1/failure_message.proto temporal/api/taskqueue/v1/message.proto --> temporal/api/v1/taskqueue_message.proto . . .

As a result of that, all the files end up in the same directory. Whether or not it is a good thing is a matter of taste, and various constraints, some of which are discussed below. Either way, here is a draft PR: https://github.com/temporalio/api/pull/171.

Summary:

I validated that both of the above options compile fine to .NET, using the proposed namespace structure. Other languages should be unchanged. However, the file renaming offers other (new) languages the option to do their own namespace mappings if they wish to.

If people are interested in this proposal, I'd be more than happy to validate all other relevant languages to remain unaffected by this restructuring of source file locations.

Please, let me know what you think.

cretz commented 2 years ago

You don't need to do anything to this repository to get the protos the way you want for your SDK.

I have had many cases where I move generated classes around, rename them, etc harmlessly. This does not affect the source protos nor should it. Make your generated protos look however you want to look in your SDK. In fact, you probably should as the default protoc for most languages is ugly. If that means as part of your build script you take the protos and move them around the way your draft PR does, no problem. If that means advanced post processing, no problem. That doesn't need to affect this repository.

We do some advanced proto generation post-processing in the Python SDK for example because the generated result is not to our liking.

Having said that, I do support changing the csharp_namespace if we really must. But the current values seem reasonable to me for consistency (in our other SDK's, we call it "api" not "service api" and we have the version as the last member of the package...would need a good reason to make C# special).

macrogreg commented 2 years ago

You don't need to do anything to this repository to get the protos the way you want for your SDK.

Absolutely. However, option csharp_namespace is used for nothing other than lang data-model generation. Why would we give it one value here, if we know that we prefer to use another value in the SDK? I would much rather see a consistency within a language, between what we display here, as a part of our API definition and what you actually encounter in a lang SDK. That would also make the likelihood of something getting out of sync much smaller and the build scripts much simpler.

Conversely, the between languages consistency, is -while nice - less important. Languages should follow their preferred design principles when creating namespaces. As an example, if one language has the wildcard (*) syntax and thus, traditionally more fine-grained package names, they should be able to follow that tradition without forcing it on other languages.

macrogreg commented 2 years ago

Also, even if for some reason we decided to have a different value for option csharp_namespace here and in the SDK, the proposed file-name refactoring would make it heaps easier for .NET and every language who wants to consider something similar. :)

cretz commented 2 years ago

Languages should follow their preferred design principles when creating namespaces.

I am not sure the difference between "ServiceApi" and "Api" is a language-specific concern.

the proposed file-name refactoring would make it heaps easier for .NET and every language who wants to consider something similar.

This is not true though. It is actually easier for most languages' protoc generators if protos for separate proto packages are in separate directories. We had to refactor core protos at https://github.com/temporalio/sdk-core/pull/255 to basically do the opposite of what you're doing here so that the same dir didn't have two of the same unqualified message name in it. The two reasons I gave in that issue:

Unfortunately protoc in each language leaves a lot to be desired and most languages have to adapt the result to their needs. I understand this makes it tough for .Net, but I don't think anything short of maybe csharp_namespace should be changed here.

macrogreg commented 2 years ago

Often code generators may choose to use the proto dir/import path instead of the proto package name to choose where to place files

Interesting point. I guess, the protoc compilers are quite different across languages. Perhaps the first proposed PR (https://github.com/temporalio/api/pull/170) highlights a better direction (rather than the second like I thought initially). There, the proto files are kept in the same directories as today (unchanged), but the file-names are made unique. E.g.: temporal/api/taskqueue/v1/message.proto --> temporal/api/taskqueue/v1/taskqueue_message.proto Perhaps, despite the redundancy, it is the best of both worlds, because it allows language compilers with various assumptions to do the right thing?

macrogreg commented 2 years ago

It is confusing for an importer to not know based on import what proto package is being used

Could you please clarify on this? Under what circumstances in this an issue?

cretz commented 2 years ago

Could you please clarify on this? Under what circumstances in this an issue?

Sorry I just saw this. For proto writers, import "mydir/foo.proto" having messages in package some.package and import "mydir/bar.proto" having message in package some.other.package is confusing. Usually it helps proto writers if the import path/dir relates to the proto package in some way.

In general though, I think at least for now, having the .Net namespaces mirror the proto packages is reasonable (same as all of our other SDKs, though slight changes may be needed if protoc generates an ugly name for something).