tensorflow / model-analysis

Model analysis tools for TensorFlow
Apache License 2.0

Protocol buffers are not correctly built during installation #188

Open peytondmurray opened 1 month ago

peytondmurray commented 1 month ago

System information

Describe the problem

This issue is a catch-all for a number of problems with the current build system. Starting from the beginning:

  1. Hardcoded protobuf v3.21.9 dependency: Although the WORKSPACE file defines targets from com_google_protobuf for v3.21.9, the only place it actually uses _PROTOBUF_COMMIT is in stripping output. It should use _PROTOBUF_COMMIT both in the archive name and for stripping output, so the version is defined in exactly one place.
  2. Version inconsistencies: setup.py requires protobuf>=3.20.3 for python<3.11, which doesn't match the version grabbed by bazel. For python>=3.11, protobuf>=4.25.2 is required, which is a full major version apart from the bazel version and even more likely to be incompatible.
  3. Bazel doesn't build the protocol buffers: setup.py does an ad-hoc platform-dependent search for protoc, meaning that the version of protobuf downloaded by bazel never gets used. If the build environment already contains any version of protobuf, setup.py will happily use it, leading to generated files which are incompatible with the rest of the code.
  4. com_google_protobuf gets clobbered by rules_rust transitive dependency: Bazel never downloads the version of protobuf that you request in the WORKSPACE file because rules_rust has a transitive dependency on com_google_protobuf that takes precedence. Bazel silently builds the protocol buffers using a much older version of protobuf as a result, again leading to library incompatibilities at runtime.
  5. bazel is never invoked from setup.py: Although there is a BUILD file for generating python code from the protobufs, bazel is never called from setup.py.
  6. tensorflow_model_analysis/proto/BUILD points to the wrong protobuf dependency: It needs to point to the explicit protobuf dependency defined in WORKSPACE.
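
To make the mismatch in items 2 and 3 visible at build time, setup.py could refuse to use a protoc whose version differs from the pinned one. The following is only a minimal sketch of that idea, not what setup.py currently does: the names PINNED_PROTOC, parse_protoc_version, find_protoc_version, and require_pinned_protoc are hypothetical, and it assumes that `protoc --version` prints a line of the form `libprotoc X.Y.Z`.

```python
import re
import shutil
import subprocess

# Assumption: the version pinned in WORKSPACE, as described in this issue.
PINNED_PROTOC = (3, 21, 9)


def parse_protoc_version(output):
    """Parse the text printed by `protoc --version`, e.g. 'libprotoc 3.21.9'."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError(f"unrecognized protoc version output: {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def find_protoc_version():
    """Return the version tuple of the first protoc on PATH, or None if absent."""
    protoc = shutil.which("protoc")
    if protoc is None:
        return None
    result = subprocess.run(
        [protoc, "--version"], capture_output=True, text=True, check=True
    )
    return parse_protoc_version(result.stdout)


def require_pinned_protoc(found, pinned=PINNED_PROTOC):
    """Fail loudly instead of silently generating code with the wrong protoc."""
    if found != pinned:
        raise RuntimeError(
            f"protoc version {found} does not match pinned version {pinned}"
        )
```

Calling `require_pinned_protoc(find_protoc_version())` before generating code would turn a silent runtime incompatibility into an immediate build error. An exact match is used here for simplicity; a real check might want to accept a range of compatible versions.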

Even if bazel is made to build the protocol buffers, setup.py will need to be modified to grab the generated sources from bazel-bin/ when the wheel is being built. IMO this could be done much more easily with meson-python, which already provides a first-class build backend for Python. It would allow robust version control for external tooling, with fallback options if the host doesn't have the right version of protoc, and we'd also avoid problems with transitive dependencies clobbering our actual dependencies. If this is something folks are interested in, I'm happy to write the meson.build. Otherwise we can stick with bazel and call it by hand in setup.py.
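
For the "stick with bazel" option, the setup.py side might look roughly like the sketch below: invoke bazel, then copy the generated modules out of bazel-bin/ into the source tree before the wheel is assembled. This is an illustration under assumptions rather than a drop-in change: the target pattern in PROTO_TARGET, the `*_pb2.py` naming of the generated files, and the helper names build_protos and copy_generated_protos are all hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical target pattern; the real target name in the BUILD file may differ.
PROTO_TARGET = "//tensorflow_model_analysis/proto:all"
# Package directory that holds the proto BUILD file, per this issue.
PROTO_PACKAGE = "tensorflow_model_analysis/proto"


def build_protos(target=PROTO_TARGET):
    """Ask bazel to generate the Python protobuf modules."""
    subprocess.run(["bazel", "build", target], check=True)


def copy_generated_protos(bazel_bin, source_root, package=PROTO_PACKAGE):
    """Copy generated *_pb2.py files from bazel-bin into the source tree.

    Returns the list of destination paths that were written.
    """
    src_dir = Path(bazel_bin) / package
    dest_dir = Path(source_root) / package
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for generated in sorted(src_dir.glob("*_pb2.py")):
        destination = dest_dir / generated.name
        # copyfile copies contents only, so read-only bazel outputs
        # do not produce read-only files in the source tree.
        shutil.copyfile(generated, destination)
        copied.append(destination)
    return copied
```

In setup.py these two helpers would run from a custom build step, with `build_protos()` first and `copy_generated_protos("bazel-bin", ".")` second, so that the wheel only ever contains code generated by the bazel-managed protoc.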

On my system I'm unable to run the tests because of this. Because bazel provides only partial build isolation, whether you are affected depends on your build environment.

cc @smokestacklightnin