Closed SurferJeffAtGoogle closed 3 years ago
It's running version 3.0.0 of bazel, so I don't think this is related to bazel 4.0.0 being released last night.
Most probably the failure was caused by these changes in protobuf: https://github.com/protocolbuffers/protobuf/pull/7902 https://github.com/protocolbuffers/protobuf/pull/7908
The "Pythony" gist of the problem (regardless of bazel) is that protobuf 3.14.0 removed the support for legacy types of packages (in the PR listed above). At the same time, some of our other google-namespaced packages (like google-api-core) still use legacy style. As mentioned in the warning in python documentation here: https://packaging.python.org/guides/packaging-namespace-packages/#creating-a-namespace-package, using legacy and and non-legacy packages under the same namespace ("google" in this case) makes python really unhappy.
In bazel-specific sense, this coexistence of legacy packages in bazel build and non-legacy package as either system-wide dependency or as a transitive dependency pulled by bazel itself (protobuf 3.14.0) made python unable to properly resolve the packages under "google" namespace, resulting in not being able to import any of them during the gapic-generaror-python execution.
This change was released with protobuf 3.14.0, and failures started happening the same time the protobuf got the the machines which were experiencing the problem. The "getting protobuf 3.14.0" could happen in different ways on different machines, that is how it got so confusing.
The reason why it breaks autosynth builds is most probably because autosynth was installing protobuf 3.14.0 regardless of bazel build (I guess synthtool needs it for other but bazel reasons).
The reason why some docker images and/or workstation do not work and our kokoro build does - is most probably caused by the transitive dependency pulling done differently on these machines (not sure about it, could be other explanations).
How to fix: 1) I'll add a fix in gapic-generator-python to make sure that all the transitive dependencies are pulled deterministically, that will resolve and issue with bazel pulling "incompatible" protobuf version itself, but that will not solve the issue if someone had it already installed. 2) We need to either ask protobuf team revert their removal of legacy packages support. 3) Or, we need to do the same (remove legacy package support) from all of the other google-namespaced Python packages we use: google-auth, google-api-core, google-common-protos. google-auth I expect to be the most problematic one, if possible at all (it is auth after all...)
An update: The https://github.com/googleapis/gapic-generator-python/pull/753 should fix one half of the problem (Python names package consistency within bazel itself)
Once the fix above is integrated to googleapis
, it should fix all of the builds on clean systems (systems which do not have any of google-namespace Python packages installed system-wide), which is most often the case. For anyone who has protobuf-3.14.0
Python package installed system-wide the fix will still not make the googleapis
build succeed because of -nspkg.pth
files (installed with any python package which has namespace_pakcages
attribute in their setup.py
, which is the case for protobuf 3.14.0).
The detailed explanation of what is causing the problem is too long and requires too much deep Python knowledge, but in brief the problem is this: -nspkg.pth
files which are installed in site-packages
directory (only for namespace packages, like protobuf
or google-api-core
) break PYTHONPATH
package resolution, and Bazel build heavily depends on PYTHONPATH for python builds.
To fix the rest, the other google-namespaced packages must be migrated from the legacy format, similarly to how it was done by protobuf
, because protobuf
migrating to native packages put the rest of the google-namespaced packages into an inconsistent state. Also protobuf needs to get rid of namespace_packages
from their setup file (to stop generating -nspkg.pth
file on installation).
The affected packages:
google-api-core
google-auth
google-cloud-common
protobuf
1) All of the packages above must be converted either to native packages
(if they are python 3.3+
) or to pkgutil-style package
(if they are python 2.7+
). More details here
2) None of the packages should be pkg_resource-style package
.
3) None of the packages should have namespace_packages
in their setup.py
, because this is what leads to creation of nspkg.pth
files which completely kills PYTHONPATH
resolution of packages. This breaks bazel specifically because it heavily depends on PYTHONPATH
for package resolution.
Protobuf has already migrated to native packages in https://github.com/protocolbuffers/protobuf/pull/7902 and https://github.com/protocolbuffers/protobuf/pull/7908, but namespace_packages is still there (while it is an attribute of pkg_resources-style
packages, not native style ones, according to python documentation. I would also consider using pkgutil-style instead of native style for protobuf (if python versions below 3.3 are supposed to be supported).
Why we need to do it It is necessary to convert the remaining google-namespaced packages, because protobuf has already migrated its package, and if we don't do it for the rest the google namespace in Python ecosystem will remain inconsistent in terms of packages format, which leads to all sorts of problems. From the Python documentation:
Warning While native namespace packages and pkgutil-style namespace packages are largely compatible, pkg_resources-style namespace packages are not compatible with the other methods. It’s inadvisable to use different methods in different distributions that provide packages to the same namespace.
Technically speaking, if a package is already in one of the styles it should not be migrated to avoid breckages, as the documentation warns about:
If you are creating a new distribution within an existing namespace package that uses this method (pkg_resources-style, the legacy one) then it’s recommended to continue using this as the different methods are not cross-compatible and it’s not advisable to try to migrate an existing package.
But protobuf has already migrated to native style packages in 3.14.0, so the rest of the google packages (see the list above) remained in pkg_resources-style
, which put us in exactly the situation the warning above tries to warn us about. So at this point it must be either protobuf rolls back their migration (unlikely) or we migrate the rest of the google-namespaced packages to put them back in the consistent (in terms of package format) state.
@busunkim96 @noahdietz FYI (see the previous comment)
@software-dov Hey Dov! Can you please take a look at the comment above and tell me if that makes sense from the Python perspective (i.e. provide some Python expert opinion, because the complexity of this issue is way above my Python level).
I have cut pre-releases of googleapis-common-protos
, google-api-core
, and google-auth
that follow the guidance in https://github.com/googleapis/gapic-generator/issues/3334#issuecomment-768156415
1) All of the packages above must be converted either to
native packages
(if they arepython 3.3+
) or topkgutil-style package
(if they arepython 2.7+
). More details here 2) None of the packages should bepkg_resource-style package
. 3) None of the packages should havenamespace_packages
in theirsetup.py
, because this is what leads to creation ofnspkg.pth
files which completely killsPYTHONPATH
resolution of packages. This breaks bazel specifically because it heavily depends onPYTHONPATH
for package resolution.
googleapis-common-protos==1.53.0.dev2
google-api-core==1.26.0.dev0
google-auth==2.0.0.dev0
First saw around 2021-01-21 05:00.
https://fusion.corp.google.com/runanalysis/test/prod:cloud-devrel%2Fclient-libraries%2Fautosynth%2Fpython/prod:cloud-devrel%2Fclient-libraries%2Fautosynth%2Fpython/KOKORO/bd126a99-ddcf-4060-b192-d54993a500bc/1611250467223/build%20%231040/Targets?target=logs%2Fpython-bigquery&drilldownTab=logs