vmagamedov / grpclib

Pure-Python gRPC implementation for asyncio
http://grpclib.readthedocs.io
BSD 3-Clause "New" or "Revised" License
936 stars 92 forks source link

Support adding a pb2_module_prefix on protoc plugin #196

Open chrisirhc opened 3 months ago

chrisirhc commented 3 months ago

This allows for usage with rules_proto_grpc with prefix_path. prefix_path on the rules_proto_grpc's python_grpclib_library prefixes the generated files. However, the generated _grpc files can't find it because it's looking in the original proto path's module path instead of the prefix path.

Example usage from python_grpclib_library:

python_grpclib_library(
    name = "python_grpclib_default_library",
    protos = [":proto_default_library"],
    visibility = ["//visibility:public"],
    prefix_path = "my/grpclibgen",
    options = {
        "@@//rules/python:grpclib_plugin": ["prefix_pb2_module=my.grpclibgen."]
    },
)

The options get passed to the plugin as request.parameter, comma-separated.

Usage from protoc plugin:

protoc --grpclib_python_out=prefix_pb2_module=my.grpclibgen.:.

I didn't find any tests for plugin.py , and don't have a good sense of how I'd go about adding a test or documentation about this.

Right now, this PR is a bit weird in that it uses the = character twice in the arguments. Happy to change it if anyone else has other suggestions for a better character to use or another parsing method.

vmagamedov commented 3 months ago

Prefix path is not something that Google's protoc or grpcio supports, it's a feature of rules_proto_grpc, Bazel-specific. Or I can't find the reference.

Such customisations can be done in proto_library rule using import_prefix and strip_import_prefix attributes. Then you will be able to use any protoc plugin to generate code without any extra options.

chrisirhc commented 3 months ago

Update: Re-read your response and rewrote my response Let me check and get back to you. I thought I already tried import_prefix but perhaps I missed it.

chrisirhc commented 3 months ago

import_prefix changes the import path at the .proto. In this PR, we're changing the import path on the generated python_library.

The distinction is important because, as noted in the import_prefix rule:

When set, the .proto source files in the srcs attribute of this rule are accessible at is the value of this attribute prepended to their repository-relative path.

That means if we use import_prefix, we actually need to change all the import statements in the .proto to be prepended with the value that is specified in the import_prefix.

For instance, if a.proto imports b.proto in the same directory: No import prefix import_prefix="my/gen" pb2_module_prefix=my.gen (this PR)
a.proto import "b.proto"; import "my/gen/b.proto"; import "b.proto";
a_pb2.py import b import my.gen.b import my.gen.b

This shows that import_prefix isn’t suitable for fixing namespacing issues in Python libraries without modifying import paths within .proto files. Our goal is to avoid rewriting imports in the proto files while preventing module collisions.

vmagamedov commented 3 months ago

I wonder how python_proto_library is able to create *_pb2.py modules with import prefixes. When you import one proto from another proto you also have imports in generated Python modules. So it is not as simple as just copy them into some directory (Python package), imports inside generated Python files also needs to be fixed.

I found some hacks in rules_proto_grpc project but I was unable to find how they implemented these import prefixes for messages and grpcio stubs.

In your example if you don't want to change imports in proto files then a.proto should have import "b.proto"; instead of import "my/gen/b.proto";. That's why I don't understand why a_pb2.py will have import my.gen.b instead of import b.

If rules_proto_grpc can somehow rewrite proto files or proto file descriptors then grpclib will also generate imports you want.

chrisirhc commented 3 months ago

I found some hacks in rules_proto_grpc project but I was unable to find how they implemented these import prefixes for messages and grpcio stubs. I did some investigation, and it's done through adding directories and symlinking. The generated import path is based on the proto-path and the input file's location.

I've put up a reproduction in https://github.com/chrisirhc/protobuf-import-prefix-repro . It seems that the use case I'm looking for is not supported by protoc itself. The protoc compiler will always treat imports in the .proto file as an absolute path. So if it encounters 'b.proto', it will expect it to be in the root of the proto, and it cannot look for it in my/gen/b.proto. If it does encounter it in my/gen/b.proto, it'll actually report the import 'b.proto' as a duplicate import, and will fail. (See https://github.com/chrisirhc/protobuf-import-prefix-repro for detail on the failures)

In your example if you don't want to change imports in proto files then a.proto should have import "b.proto"; instead of import "my/gen/b.proto";. That's why I don't understand why a_pb2.py will have import my.gen.b instead of import b.

The goal is to add an import prefix so that there's no namespace collisions in the absolute import. Hence, the behavior. More details in the https://github.com/chrisirhc/protobuf-import-prefix-repro README.

However, it does seem like the behavior I'm looking for is not supported on protoc itself and hence it requires every plugin to add this feature if it needs to be supported, like here.

This is unfortunate, since that means that every proto file that has imports needs to be aware of what directory structure (or what import prefix) it will be finally used in. For example, vendoring groups of proto files requires all imports to be rewritten in them.