protocolbuffers / protobuf

Protocol Buffers - Google's data interchange format
http://protobuf.dev
Other
65.39k stars 15.46k forks source link

`py_proto_library` `_virtual_imports` causing import name collisions #17889

Open dougthor42 opened 1 month ago

dougthor42 commented 1 month ago

:exclamation: :exclamation: :exclamation: Migrated from https://github.com/bazelbuild/rules_python/issues/2133 :exclamation: :exclamation: :exclamation:

🐞 bug report

Affected Rule

py_proto_library

Is this a regression?

I haven't investigated yet, sorry :face_with_diagonal_mouth:

Description

It appears that the modifications to PYTHONPATH (added when py_proto_library uses a proto_library that has strip_import_prefix) can cause multiple paths to be included that have the same python namespaces defined.

For example: all of these paths have a doug python package in them:

bazel-bin/src/doug/foo_bin.runfiles/_main/src
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/a_proto
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto
$ tree -L 3 bazel-bin/src/doug/foo_bin.runfiles/_main/src
bazel-bin/src/doug/foo_bin.runfiles/_main/src
├── doug
│   ├── foo_bin -> /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/execroot/_main/bazel-out/k8-fastbuild/bin/src/doug/foo_bin
│   ├── foo.py -> /usr/local/google/home/dthor/dev/pyle/src/doug/foo.py
│   ├── __init__.py
│   └── proto2
│       ├── __init__.py
│       └── _virtual_imports
└── __init__.py
$ tree -L 3 bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/a_proto/
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/a_proto/
├── doug
│   ├── __init__.py
│   ├── proto2
│   │   ├── a_pb2.py -> /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/execroot/_main/bazel-out/k8-fastbuild/bin/src/doug/proto2/_virtual_imports/a_proto/doug/proto2/a_pb2.py
│   │   ├── __init__.py
│   │   └── __pycache__
│   └── __pycache__
│       └── __init__.cpython-311.pyc
└── __init__.py
$ tree -L 3 bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/
├── doug
│   ├── __init__.py
│   └── proto2
│       ├── bar_pb2.py -> /usr/local/google/home/dthor/.cache/bazel/_bazel_dthor/dbe74c4144b5c9a438d84a119652bef9/execroot/_main/bazel-out/k8-fastbuild/bin/src/doug/proto2/_virtual_imports/bar_proto/doug/proto2/bar_pb2.py
│       └── __init__.py
└── __init__.py

This causes any python code that runs import doug (or import doug.x.y.z) to look in the _virtual_imports directory because it shows up later in sys.path:

doug module:
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/doug/__init__.py
bar_pb2 module:
bazel-bin/src/doug/foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/doug/proto2/bar_pb2.py

Main question: Am I doing something wrong or is this actually (un)intended behavior?

🔬 Minimal Reproduction

Bear with me, as there's a fair bit of setup.

We use proto_library.strip_import_prefix because the some protos import others, and those proto import statements do not include src directory, as you'll see below.

Once you get the project structure in place, run:

bazel run //src/doug:foo_bin

This will print out some debugging info, namely sys.path and the path for the imported doug package and foo_pb2 module.

Actual Behavior

You'll see that the doug package will be loaded from foo_bin.runfiles/_main/src/doug/proto2/_virtual_imports/bar_proto/doug/__init__.py.

Expected Behavior

The doug package should be loaded from foo_bin.runfiles/_main/src/doug/__init__.py.

Investigation

Showing correct import path

Edit ./src/doug/BUILD.bazel to remove the dependency on //src/doug/proto2:bar_pb2 and run bazel run //src/doug:foo_bin again.

You'll see that the doug package is correctly found at foo_bin.runfiles/_main/src/doug/__init__.py (yes, the python code will then throw ImportError about foo_pb2.py, but that's expected).

Modifying proto imports fixes things

Revert those changes. Now edit ./src/doug/proto2/BUILD.bazel and remove the strip_import_prefix lines. Run foo_bin again and you'll see an error related to building the protos because the proto import path is incorrect.

With strip_import_prefixes removed, edit ./src/doug/proto2/bar.proto so that the import statement includes src/ and run foo_bin again.

-import "doug/proto2/a.proto";  // IMPORTANT
+import "/src/doug/proto2/a.proto";  // IMPORTANT

Things will work and you'll notice that the bar_pb2 is being loaded from foo_bin.runfiles/_main/src/doug/proto2/bar_pb2.py - not the _virtual_imports directory.

Sadly this "modify proto imports" fix is not an option for us - we can't update our proto imports without significant work.

Project structure and Files

.
├── src/
│   ├── BUILD.bazel
│   └── doug/
│       ├── BUILD.bazel
│       ├── foo.py
│       ├── __init__.py
│       └── proto2
│           ├── a.proto
│           ├── bar.proto
│           └── BUILD.bazel
├── BUILD.bazel
└── MODULE.bazel

./MODULE.bazel and ./BUILD.bazel have typical things you'd see for a python+proto project. We're using rules_python 0.33.1, protobuf 21.7, and rules_proto 5.3.0-21.7.

# src/BUILD.bazel
# gazelle:python_root
# src/doug/foo.py
# Just debugging things.
import sys

for p in sorted(sys.path):
    print(p)

print("*" * 50)
import doug
print("doug module:")
print(doug.__file__)

import doug.proto2.bar_pb2 as bar_pb2
print("bar_pb2 module:")
print(bar_pb2.__file__)
# src/doug/BUILD.bazel
load("@rules_python//python:defs.bzl", "py_binary", "py_library")
py_library(
    name = "foo",
    srcs = ["foo.py"],
    imports = [".."],  # because of `python_root`
    visibility = ["//visibility:public"],
    deps = [
        "//src/doug/proto2:bar_pb2",
    ],
)

py_binary(
    name = "foo_bin",
    main = "foo.py",
    srcs = ["foo.py"],
    imports = [".."],
    deps = [
        ":foo",
    ],
)

The contents of the proto files don't matter too much, except that one proto file must reference the other via an import:

# src/doug/proto2/bar.proto
syntax = "proto2";
import "google/protobuf/timestamp.proto";
import "doug/proto2/a.proto";  // IMPORTANT
package doug.proto2;

message Label {
    optional string key = 1;
    optional string value = 2;
    optional google.protobuf.Timestamp timestamp = 3;
}
# src/doug/proto2/a.proto
syntax = "proto2";
package doug.proto2;
# src/doug/proto2/BUILD.bazel
load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_python//python:proto.bzl", "py_proto_library")

proto_library(
    name = "a_proto",
    srcs = ["a.proto"],
    strip_import_prefix = "/src",
    visibility = ["//visibility:public"],
    deps = ["@com_google_protobuf//:descriptor_proto"],
)

proto_library(
    name = "bar_proto",
    srcs = ["bar.proto"],
    strip_import_prefix = "/src",
    visibility = ["//visibility:public"],
    deps = [
        "@com_google_protobuf//:timestamp_proto",
        ":a_proto",
    ],
)

py_proto_library(
    name = "bar_pb2",
    visibility = ["//visibility:public"],
    deps = [":bar_proto"],
)

🔥 Exception or Error

In "Minimal Reproduction", but the gist is ModuleNotFoundError.

🌍 Your Environment

Operating System:

gLinux

Output of bazel version:

7.2.0

Rules_python version:

0.33.1

Anything else relevant?

JasonLunn commented 2 days ago

Hi @dougthor42 -

Is this issue reproducible with the latest release? Protobuf Python version 21.7 is no longer supported. You can find the support timelines for more recent versions of Protobuf Python at https://protobuf.dev/support/version-support/#python