google / riegeli

Riegeli/records is a file format for storing a sequence of string records, typically serialized protocol buffers.
Apache License 2.0
418 stars 53 forks source link

Make Riegeli available as an external Bazel dependency #28

Closed dpetrou-continua closed 4 months ago

dpetrou-continua commented 1 year ago

I'd prefer to not manually vendor (clone) Riegeli into my source tree, but rather use it as an external Bazel dependency (either by modifying our WORKSPACE or MODULE.bazel). Might you consider providing that functionality and document it?

(Apologies if this already exists and I missed it. I didn't find any documentation like, to pick a random example, https://github.com/hedronvision/bazel-compile-commands-extractor#first-do-the-usual-workspace-setup. Nor did I find any macros for pulling in Riegeli's deps as is typical in any non-bzlmod repo that's used as an external dependency.)

David

QrczakMK commented 1 year ago

For C++ usage of Riegeli/records it should be sufficient to import Riegeli along with required libraries. From https://github.com/google/riegeli/blob/master/WORKSPACE: com_google_absl, org_brotli, net_zstd, snappy, highwayhash, com_google_protobuf, and rules_cc. Other libraries are used only by separate Riegeli libraries providing access to them or by example binaries.

I agree that this should be documented.

I don’t know how to provide macros which package that, because versions of the libraries should agree among other dependencies which also use the same libraries, and the final project should decide about the versions. A simple macro would work if a project uses only Riegeli, but it might use other libraries with shared dependencies. What is the common bazel practice for this? What if another dependency of the final project also uses Abseil, perhaps provides a macro for importing its dependencies, and the versions of e.g. Abseil or protobuf conflict?

This gets more complicated with Python or TensorFlow, to the point that I don’t know yet how this should be done. Riegeli uses TensorFlow only for separate libraries and for example binaries, but in practice TensorFlow brings the Python configuration to use, constrains versions of shared dependencies, and the setup of that is in a complicated and brittle state. Riegeli uses a precompiled TensorFlow, because it turned out that building TensorFlow from sources has even larger problems. Using precompiled TensorFlow resulted in a https://github.com/google/riegeli/blob/master/configure script to find out TensorFlow and Python flags and paths, importing the generated file to https://github.com/google/riegeli/blob/master/.bazelrc, and referring to the variables in https://github.com/google/riegeli/blob/master/tf_dependency/tf_configure.bzl. This is a mess, I am sorry for that, and I don’t know how to clean it up. Riegeli should be buildable without TensorFlow. I did not manage to make Riegeli building with TF-2.12 yet, which uses a newer protobuf version, which requires a different setup, in particular because it now depends on Abseil, and its Python setup changed as well.

dpetrou-continua commented 1 year ago

Hi Marcin, it is nice to hear from you. I think you are describing two issues: (1) what is commonly referred to as the diamond-dependency problem, and (2) complexities surrounding TF. Am I right that these are separable issues? I'll focus more on the first because I wasn't quite sure how deeply Riegeli depends on TF (I would have assumed only its protos, but I must be wrong).

I'm not a Bazel expert, but I would recommend trying to adopt the new bzlmod system. It requires a flag to enable at present, but it seems like it will move into general availability soon. In bzlmod, they apply the Go Language developer's solution of minimal version selection (https://bazel.build/versions/6.0.0/build/bzlmod#version-resolution) to address the diamond dependency issue. It's not perfect, but it works for most cases.

Further, clearly, diamond dependency issues have been addressed (hacked around) pre-bzlmod. I've seen WORKSPACE files apply patches in their http_archive() stanzas, for example. But since the future is bzlmod, maybe it's best just to work in that framework. If you choose to do that, your clients would be able to simply visit https://registry.bazel.build/, type "riegeli", and get a one line directive they can copy to get Riegeli.

More info at https://bazel.build/external/overview.

Thanks, David

SanjayVas commented 11 months ago

Perhaps this issue should be rephrased as "Publish Riegeli to Bazel Central Registry"? That would encompass migrating to Bzlmod and is the modern way to make a Bazel project consumable by others.

SanjayVas commented 9 months ago

I have a PR that publishes this to my project's own module registry for reference: https://github.com/world-federation-of-advertisers/bazel-registry/pull/9

I could theoretically contribute this to the Bazel Central Registry as well, but it would be better if a Riegeli maintainer created a release which include Bzlmod support (you'll note the module has to pick an arbitrary commit).

QrczakMK commented 4 months ago

A recent MODULE.bazel with updated dependencies is submitted now in https://github.com/bazelbuild/bazel-central-registry/pull/2186