tweag / rules_nixpkgs

Rules for importing Nixpkgs packages into Bazel.

Support remote execution with rules_nixpkgs #180

Open aherrmann opened 2 years ago

aherrmann commented 2 years ago

Is your feature request related to a problem? Please describe.

Bazel supports remote execution through the remote execution protocol. This protocol manages the inputs and outputs required by and generated by Bazel build actions, i.e. actions defined by regular Bazel rules.

However, rules_nixpkgs defines repository rules that invoke nix-build during Bazel's loading phase. The Nix package manager will then realize Nix store paths (typically under /nix/store/...) and generate symlinks into Bazel's execution root. These Nix store paths are outside of Bazel's control and the remote execution protocol does not ensure that these store paths are also realized on the remote execution nodes.

Remote execution actions that depend on Nix store paths will fail if the required Nix store paths are not realized on the remote execution nodes.

Describe the solution you'd like

We need some solution to ensure that Nix store paths that are required for Bazel build actions exist on the remote execution nodes that these actions may be run on.

Some possible approaches:

cc @YorikSar @AleksanderGondek @r2r-dev

Jonpez2 commented 2 years ago

If we could find a way to run every action, whether local or remote, under something equivalent to nix-shell --pure -p [the-nix-packages-you-told-me-you-depend-on], then that would be best, right? Then it wouldn't matter whether you were local or remote, and you'd never get non-hermeticity. I'm certain there's a flake approach here too - maybe before every action we emit a flake.nix with the appropriate dependencies and nix run that?
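For illustration, a rough sketch of what that could look like for a single action (a hedged sketch only: the package, files, and command are hypothetical, and it assumes Nix is installed wherever the action runs):

genrule(
  name = "render_docs",
  srcs = ["docs.md"],
  outs = ["docs.html"],
  # nix-shell --pure drops the ambient environment, so the action behaves
  # the same locally and on a remote executor that has Nix installed.
  cmd = "nix-shell --pure -p pandoc --run 'pandoc $(location docs.md) -o $@'",
  tags = ["requires-network"],  # nix-shell may need to fetch the closure
)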

Jonpez2 commented 2 years ago

Does this seem credible?

aherrmann commented 2 years ago

@Jonpez2 Perhaps yes, though this may still require additional metadata: to achieve fine granularity, the set [the-nix-packages-you-told-me-you-depend-on] would need to be minimal for each action. So it's not one global Nix environment for the entire build, but precise Nix dependencies for each build action. Note that the way we integrate Nix with Bazel through rules_nixpkgs, the Nix store paths are not provided through a nix-shell environment, but through symlinks into the Nix store (with GC roots); i.e. rules_nixpkgs invokes nix-build.
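For context, a minimal import looks roughly like this (package name illustrative):

nixpkgs_package(
  name = "hello",
  attribute_path = "hello",
  repository = "@nixpkgs",
)

During the loading phase this invokes nix-build, realizes the store path under /nix/store, and exposes it to Bazel as an external repository of symlinks into the store.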

Jonpez2 commented 2 years ago

Yes for sure, precise for each action. And then no symlinks and gc roots, but just calls within nix contexts. That would be cleaner, right? So maybe we would have a nix toolchain base or something which provides the machinery to wrap all actions in a nix-shell/flake invocation, and then we find a way to inject that into all actions of all rules that use the toolchain? Or something... There's a core good idea here that would make bazel + nix super happy together, I can feel it...

AleksanderGondek commented 2 years ago

@aherrmann @Jonpez2 Apologies, I have been living on a quite rules_nixpkgs-distant planet for a while :D

In principle, every Bazel action should be hermetic and pure - therefore it stands to reason that running it within a very narrowly defined nix-shell achieves that goal and I would love to be able to do just that.

However, my experience of running rules_nixpkgs and Bazel in tandem (which also makes me biased and blind to the “newcomer” perspective) leads me to see one major, troublesome aspect of the proposed approach:

Interoperability with existing Bazel ecosystem / rules.

I have not fully thought this through, but it seems to me that unless the change were placed in Bazel itself (sic!), all existing rules would need to change to be able to act on inputs delivered from nixpkgs - for example, cc_library would need to recognize inputs provided by the Nix package manager and act on them differently than the ones from Bazel itself. A great deal of composability would be sacrificed.

Jonpez2 commented 2 years ago

Yeah agreed. I think we need to lobby someone in bazel land to figure out how to do this natively. It probably will require internal changes…

aherrmann commented 2 years ago

I have not fully thought this through, but it seems to me that unless the change were placed in Bazel itself, all existing rules would need to change to be able to act on inputs delivered from nixpkgs

Correct, the proposed "If we could find a way to run every action, whether local or remote, under something equivalent to nix-shell [...]" would, I think, require a change to Bazel itself. An alternative that we've been thinking about, if we're to modify parts of Bazel anyway, is to tackle this at the remote execution protocol level. That protocol currently has no notion of a "system root" or other kinds of per-action system dependencies. If the protocol could express such constraints per-action in a generic way, then the remote side could ensure that the constraints are resolved before the action is executed. E.g. that nix-build is run to provide the needed Nix store paths.

Side note, this is, not directly but still somewhat tangentially, related to https://github.com/bazelbuild/bazel/issues/6994.

AleksanderGondek commented 2 years ago

An alternative that we've been thinking about, if we're to modify parts of Bazel anyway, is to tackle this at the remote execution protocol level. [...]

There is another way to move forward, which I feel is a bit more lean and less disruptive towards overall Bazel build model.

Bazel has the Remote Asset API, which could be extended to provision /nix/store-bound artifacts. Qualifiers could be employed to pass along any additional required metadata, and the big issue of ensuring /nix/store consistency on RBE execution hosts would be solved.
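To sketch the shape of that idea (the URI scheme and qualifier name below are invented for illustration; the Remote Asset API itself only defines generic uris and qualifiers fields on FetchBlobRequest), a client might ask the remote side to provision a store path roughly like this:

# FetchBlobRequest, textproto sketch - the nix-specific values are hypothetical
uris: "nix:/nix/store/<hash>-libXxf86vm"
qualifiers {
  name: "nix.attribute_path"
  value: "xorg.libXxf86vm"
}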

Jonpez2 commented 2 years ago

FWIW, if we're thinking of something remote-execution-specific, BuildBarn has the following two interesting (and I think related) issues: https://github.com/buildbarn/bb-remote-execution/issues/40 https://github.com/buildbarn/bb-remote-execution/issues/23

Jonpez2 commented 2 years ago

@AleksanderGondek - re the Remote Asset API - would that mean something like doing a Nix query to find the transitive closure of required /nix/store roots, and then having the remote worker unpack into a /nix/store that looks exactly like that? And then having Starlark code which figures out resolved paths within /nix/store, and executing with that? That seems a bit entwined and fragile to me...

aherrmann commented 2 years ago

There is another way to move forward, which I feel is a bit more lean and less disruptive towards overall Bazel build model.

Bazel has Remote Assets API that can be extended to provision /nix/store-bound artifacts.

What I remember of this approach from the last attempt was that it only worked when the remote execution system could still access the needed nix files on the host directly. That's usually not true, e.g. when a developer issues a build on their development machine and the remote execution service runs on a cloud platform or somewhere else on a remote machine. If that limitation could be fixed then this could indeed be a viable option.

Jonpez2 commented 2 years ago

I really think the only safe option is a nix-shell or nix run flake style wrapper. Do all actions happen within the context of some particular posix toolchain by any chance? e.g. do they all invoke via a bash selected from the toolchain, or something like that? Is there any way we could hook that? It would mean something like generating one toolchain per rule kind or something, but it would be at the bazel level and therefore equivalent between local and remote...

[Edit] I'll leave this comment here, but it's obviously off base, and couldn't possibly be true.

uri-canva commented 2 years ago

Note there's some extra considerations if the host machine is a different platform than the remote executor. The common way of using rules_nixpkgs right now is to let nix detect the host machine platform, instead of having bazel pass that platform information in (which it can't even do for repository rules anyway, it can only do it for actions).

uri-canva commented 2 years ago

Do all actions happen within the context of some particular posix toolchain by any chance? e.g. do they all invoke via a bash selected from the toolchain, or something like that?

Not from a toolchain, but the default shell used by actions can be set with the BAZEL_SH environment variable or the --shell_executable option. Note that not all actions use the shell anyway; some execute the binaries directly.
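For reference, a hedged sketch of that configuration (the store path is illustrative):

# .bazelrc
build --shell_executable=/nix/store/<hash>-bash-5.2/bin/bash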

It would mean something like generating one toolchain per rule kind or something, but it would be at the bazel level and therefore equivalent between local and remote...

No, that's definitely the approach that would be the most compatible with bazel: defining toolchains using the bazel APIs in a way such that bazel doesn't need to know anything about nix. For example, if we define toolchains with binaries that are nix-shell wrappers, then as long as the executors have nix installed, running those wrappers will work as expected. And assuming the wrappers include absolute paths to the store paths, their contents will stay the same if the underlying derivation is the same, or change if the derivation changes, which lets bazel handle the caching correctly even without any knowledge of the derivation.
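A minimal sketch of such a wrapper (the path is illustrative; it assumes Nix is installed on every executor and the store path has been realized there):

#!/usr/bin/env bash
# gcc wrapper registered as a toolchain binary.
# The absolute store path is baked in, so the wrapper's content, and hence
# Bazel's cache key, changes exactly when the underlying derivation changes.
exec /nix/store/<hash>-gcc-12.2.0/bin/gcc "$@"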

uri-canva commented 2 years ago

Just as some extra context, we've been looking at this problem too. One approach that we've spiked is using nix to build portable binaries that are self-contained, using buildFHSUserEnv. In practice it works, but it's a bit clunky, and it doesn't have the same advantages as you'd get from using nix-built binaries, since you need to reimplement a lot of the builder scripts / derivations. The resulting binaries also still need to be compatible with the glibc version running in the executor - assuming you use glibc and not musl, which is what you need to be able to use prebuilt dependencies from the respective ecosystems in your interpreters and compilers.

Jonpez2 commented 2 years ago

Another point that occurred to me this morning - I think we need to define a Provider which adds nix-specific data to a rule: i.e. which nix packages are required to build the rule, and which are required to run it. Then maybe we figure out how to plug a host-package-manager integration into bazel itself, which consumes data out of such a provider (or the transitive closure of the providers collected from the deps?), and sets up the host env ahead of the rule's action executions. I say this because some rules may run on remote host A, resolve a /nix/store path, and then hand it over to another rule which proceeds to run on remote host B, which hasn't got that path resolved. So we need to communicate requirements between rules, I think.
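A sketch of what such a provider could look like (all names hypothetical; nothing like this exists in rules_nixpkgs today):

NixDepsInfo = provider(
  doc = "Nix-specific metadata for a rule.",
  fields = {
    "build_closure": "depset of /nix/store paths required to build the rule",
    "run_closure": "depset of /nix/store paths required to run its outputs",
  },
)

A host-package-manager hook could then collect the transitive closure of these providers and realize the union of store paths on whichever host a consuming action lands on.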

Jonpez2 commented 2 years ago

Does all of this apply to Guix as well? I.e. could we make it a bit more palatable to bazel by making it apply to at least 2 mainstream package managers? Would you at Tweag be interested in a call to discuss possible ways forward on this? I would be excited to give some context on my use case.

uri-canva commented 2 years ago

Note that changes to the remote execution APIs are a bit more complex to get through, since there's several implementations of it. See https://github.com/bazelbuild/remote-apis#api-community.

Jonpez2 commented 2 years ago

Yeah I really don’t think this should happen via the remote execution api. On nix, this can and therefore should work exactly the same across remote and local, no?

aherrmann commented 2 years ago

I'm currently a bit overloaded with other work and OOO due to an accident. I haven't managed to catch up on the discussion above yet. I haven't forgotten it and I intend to contribute further - just letting you know to manage expectations.

Jonpez2 commented 2 years ago

I hope everyone involved is ok!

Jonpez2 commented 2 years ago

Hello again! No-op sort of an update: I've been having a dig around in the bazel codebase to try to figure out some kind of way forward here, but I haven't come up with anything particularly useful. I am thinking that we would want to add a spawn strategy which wraps the DynamicSpawnStrategy and somehow picks out the transitive closure of relevant NixToolProvider (i.e. the thing I was trying to describe in https://github.com/tweag/rules_nixpkgs/issues/180#issuecomment-1285735716). Then "all it would need to do" is, in the exec() function, before delegating to DynamicSpawnStrategy, prefix the spawn's command line with "nix-shell -i [whatever] run --" or something? There's a bit of handwaving in there :)

Jonpez2 commented 1 year ago

Gentle ping on this one. FWIW, here is a Google Groups thread on the subject: https://groups.google.com/g/bazel-discuss/c/kqv-EHhApbY

aherrmann commented 1 year ago

Sorry for the long silence on this one. I've been quite busy lately.

Do all actions happen within the context of some particular posix toolchain by any chance?

@Jonpez2 That is one of the difficulties, at least for a generic solution on the level of rules_nixpkgs, i.e. from the library author's perspective: we don't know what kinds of targets users will import from Nix, and we don't know how users will use these targets. They could import toolchains, libraries, build tools, a Docker base image, etc. So actions that use Nix-provided files could really be any action. See also what @uri-canva points out in https://github.com/tweag/rules_nixpkgs/issues/180#issuecomment-1284666194.

Note there's some extra considerations if the host machine is a different platform than the remote executor. The common way of using rules_nixpkgs right now is to let nix detect the host machine platform, instead of having bazel pass that platform information in (which it can't even do for repository rules anyway, it can only do it for actions).

@uri-canva That's correct. I think handling this correctly is possible. The nixpkgs_package import could explicitly set the system argument to nixpkgs, and the generated toolchain could set the correct exec constraints. The assumption that host equals exec may be hard-coded in some places that would need fixing. But I think there's no strong technical reason for that assumption, mostly just historical ones. It's clearly related to this ticket, but it's a separate issue. @uri-canva do you want to go ahead and open a feature request for it?

No that's definitely the approach that would be the most compatible with bazel: defining toolchains using the bazel APIs in a way such that bazel doesn't need to know anything about nix.

@uri-canva The trouble is that Nix provided dependencies can be used in actions that don't have a natural associated toolchain. E.g. a genrule or a custom action.

For example if we define toolchains with binaries that are nix-shell wrappers

@uri-canva Two things that make this problematic:

  1. Nix-shell wrappers defer the Nix evaluation to execution time. This means we have shifted the problem from having to ship the Nix store paths to the remote side to having to ship the Nix expressions to the remote side. I.e. the Nix sources that the nix-shell wrapper loads and evaluates have to exist on the remote executor under the correct path.
  2. Nix evaluation adds overhead, sometimes a quite considerable one. The nice thing about the way rules_nixpkgs does it right now is that this evaluation happens only once per import, when fetching the nixpkgs_package. With nix-shell wrappers it would happen every time. This can be problematic: I've worked on projects in the past where this overhead made nix-shell wrappers infeasible for certain tools, e.g. gcc.

One approach that we've spiked is using nix to build portable binaries that are self contained, using buildFHSUserEnv.

That's indeed a nice solution. But, as you point out, it has its costs. At the level of rules_nixpkgs I'd prefer a more generic solution, if possible. I think it's worth pointing out the difference between the user's and the library author's perspective here: as a user with a concrete use-case in mind, one can craft a dedicated solution that fits well with the given infrastructure and codebase, and one can adjust one's own project around the constraints that the chosen solution implies. As the library authors of rules_nixpkgs, we should strive to find a solution that doesn't unduly restrict or dictate the user's setup. In practice we may have to be pragmatic here and there and impose some restrictions to arrive at a feasible solution, but we should try not to be overly restrictive. I feel like forcing users to turn every Nix-imported package into a self-contained, relocatable package is probably too restrictive. That said, if it works at a use-site of rules_nixpkgs and fits well into a given project, sure, why not.

Another point that occurred to me this morning - I think we need to define a Provider which adds nix-specific data for a rule

@Jonpez2 Keep in mind that rules_nixpkgs implements repository rules, and these cannot return providers. Providers only exist at the level of regular rules. That said, if this is only about transmitting metadata to dedicated rules, aspects, or other tools through some additional mechanism, then yes, that can be a viable route: rules_nixpkgs could auto-generate such metadata targets. Take a look at the good work done by @AleksanderGondek and @r2r-dev on https://github.com/tweag/nix_gazelle_extension - it generates dedicated metadata targets to transmit Nix metadata into Gazelle. I think it's somewhat different from what you're pointing at here, but it may still be a good point of reference.

Note that changes to the remote execution APIs are a bit more complex to get through, since there's several implementations of it. See https://github.com/bazelbuild/remote-apis#api-community.

@uri-canva Absolutely, I understand. The observation that makes me consider this route is that rules_nixpkgs is not alone with the problem of having to transmit information about system dependencies or other kinds of ambient environment. Projects that use distribution package managers could also benefit from the ability to collect system dependencies across targets and ship this metadata to the remote side. Indeed, extending the protocol (if needed) to better support rules_nixpkgs would have much higher chances of success if it could also benefit other, non-Nix-related use-cases.

aherrmann commented 1 year ago

@Jonpez2 Thanks for the mailing list thread. I'll take a look. I saw Alex Eagle mentioned the community day session there. I've been meaning to share my notes here. I'll try to do so now:

BazelCon Community Day, which was held one day before BazelCon, included unconference style sessions, and we had one session on remote execution and system dependencies and Nix in particular.

Many voices suggested finding a way to track the Nix store paths with Bazel explicitly, i.e. somehow store Nix store paths under Bazel's output base and mark them as regular dependencies, such that the remote execution system would automatically push them to the remote side. The problem here is that nixpkgs assumes stable absolute paths and Bazel's action execution environment does not provide these. So this would require some form of path remapping, to map Nix store paths fetched into Bazel's output base to some stable absolute path, e.g. some form of file system abstraction.

A promising suggestion came from @illicitonion. Namely, mark Nix imported targets with special platform properties that define which Nix derivation is required to run the action. Then, add a feature to Bazel to accumulate platform properties across transitive dependencies such that the transitive closure of required Nix derivations is communicated to the remote side for each action. Finally, extend the remote executor to parse these platform properties and ensure that the required Nix derivations are instantiated before running the action.
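To make that suggestion concrete (the property key is hypothetical; exec_properties itself is an existing Bazel attribute whose contents are forwarded to the remote side as platform properties):

cc_binary(
  name = "tool",
  srcs = ["tool.c"],
  exec_properties = {
    "nix-derivations": "/nix/store/<hash>-gcc-12.drv",
  },
)

The missing piece, as described above, is accumulating such properties across transitive dependencies and teaching the remote executor to realize the listed derivations before running each action.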

The underlying observation is that there are really two problems here:

  1. What Nix derivations does a given action depend on?
  2. How do we make sure that the corresponding Nix store paths are instantiated on the remote side?

The marking and transitive accumulation of the platform properties achieves point 1. Conveniently, platform properties are already shipped to the remote side, so this doesn't require an extension of the RBE protocol.

Point 2 then has to be addressed on the remote executor side by reading these platform properties and instantiating the corresponding store paths.

A concern that was brought up is that we may want to cut the Nix dependencies at some point, or distinguish runtime and build time deps.

Jonpez2 commented 1 year ago

@aherrmann would you be open to having a call on this? Is there a way for me to contact you directly to set one up please?

Thank you!

olebedev commented 1 year ago

For example if we define toolchains with binaries that are nix-shell wrappers

@uri-canva, please note that this won't work for every Nix package, but only for packages with executables. Nix provides other things besides executables; for example, we build docker images in Nix and ship them to Bazel along with nix packages, and form a final docker image based on the Nix one.

Allow me to share a couple of thoughts on the usability of rules_nixpkgs in general. As far as I am aware, Bazel is primarily used for mono repositories rather than just monolithic C++/Java builds, i.e. for poly-language setups where a Nix code base can also be present (we have a large Nix code base in our mono repo). In this case, using rules_nixpkgs becomes problematic because it's not designed to be used this way (building Nix packages as first-class Bazel build graph citizens; in fact, they become external repositories in Bazel). The design of the rules assumes that only a nixpkgs pin/commit is fetched via builtins.fetchTarball and that a small set of files can be attached to a package declared with the rules via the nix_file_deps attribute. It works OK, but we're missing Bazel's dependency-checking capabilities, and we need to list all these files manually, which brings code generators into the maintenance process and destroys granularity (this could be fixed by evaluating every Nix package and inferring its dependencies on Nix files only). It also overloads the analysis phase, because all the Nix packages are eventually declared in the WORKSPACE file.

That is, the larger a Nix codebase within a Bazel-maintained repository is, the more problematic its maintenance becomes, because it is not part of the first-party Bazel dependency graph but rather part of the external Bazel repositories that rules_nixpkgs creates out of the Nix expressions. For example:

nixpkgs_package(
  name = "libXxf86vm",
  attribute_path = "xorg.libXxf86vm",
  nix_file = "//nix:nixpkgs.nix",
  nix_file_deps = [
    # a huge list of files that the Nix expression of the `xorg.libXxf86vm` package depends on
  ],
  repository = "@nixpkgs",
)

In light of the above, have you considered using Nix within a genrule for BRE instead of the rule set? For example, building a docker image in Nix:

genrule(
  name = "hello-world",
  srcs = ["default.nix", "//pkgs:default.nix"],
  outs = ["image"],
  cmd = """
nix-build $(location default.nix) -o "$@"
""",
  tags=["requires-network"],
)

This would work just fine with Bazel RE with a single caveat: we need to make sure that actions that depend on the //:hello-world target's output are executed only on the nodes where the //:hello-world target has been run. Given the Nix binary cache, we can easily couple such target + action tuples together. I am not familiar with Bazel's internals, but AFAIK it may be possible to tell the Bazel build scheduler to handle tuples like this; we just need to make sure we create these tuples in terms of Bazel. What do you think about this approach?

Also, there is a great talk about making rules_nixpkgs work for BRE: https://skillsmatter.com/skillscasts/17673-remote-execution-with-rules-nixpkgs#video. From your perspective, is there something that looks like a blocker to applying this approach? To me, it looks like quite a lot of infrastructure work needs to be done, but the overall outcome would be impressive.

@aherrmann, @Jonpez2, @uri-canva, please let me know what you think.

k1nkreet commented 1 year ago

@olebedev I'm experimenting with this issue and trying two ideas, both of which have a lot in common with what you are proposing. I will try to describe what I am trying and how it connects with your ideas, and maybe we'll be able to figure out something working from that.

So the first idea I'm playing with is to make the Nix closure of a package a Bazel target, like you are suggesting with the docker image. I was thinking about building a tarball as a target which would be carried by Bazel as a usual dependency. There are several problems with it:

  1. It would bring a lot of performance penalties and waste a lot of space.
  2. The targets depending on it would need to know how to use this tarball, or there would need to be something in the middle unpacking it and generating usable targets to depend on.

The second idea is to insert a dummy target, like a genrule running nix-build, into the dependency chain, so that every target depending on nixpkgs_package outputs would depend on this target, and to tag it as no-cache. In this case the target will be run every time by the remote executor, creating items in the Nix store if they do not exist, or being a no-op otherwise.
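A minimal sketch of that dummy target (names illustrative; it assumes Nix is installed on the executors and, per the above, that all builders share a Nix store):

genrule(
  name = "realize_hello",
  srcs = ["default.nix"],
  outs = ["hello.stamp"],
  # Re-runs on whichever worker picks it up; Nix's own caching makes
  # repeated realizations of the same derivation cheap.
  cmd = "nix-build $(location default.nix) -A hello --no-out-link && touch $@",
  tags = [
    "no-cache",    # never serve this action from the remote cache
    "no-sandbox",  # nix-build needs access to the host /nix/store
    "requires-network",
  ],
)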

Regarding the approach you've described, I have a couple of questions: are you proposing to use this docker image as an output of nixpkgs_package instead of the filegroups, for example? How would its dependents know how to utilize it? And when you propose to ensure this genrule is run by the remote executors, why would you need to produce the docker image at all?

olebedev commented 1 year ago

are you proposing using this docker image as an output of nixpkgs_package instead the filegroups for example.

@k1nkreet, not really, sorry. I should have been clearer about the proposal: the example with the docker image is an example of building a Nix package using a genrule, not about using that docker image to solve the original problem. It could be basically any Nix package; a more appropriate example would be my earlier one, implemented as a genrule. Something like this:

# NOTE: This is a pseudo-code snippet to demonstrate the idea
genrule(
  name = "libxxf86vm",
  srcs = ["default.nix", "//pkgs:default.nix"], # properly declared dependencies, using glob, filegroups etc
  outs = ["nixpkgs-libxxf86vm"],
  cmd = """
nix-build $(location default.nix) -o "$@" -A "xorg.libXxf86vm"
""",
  tags=["requires-network"],
)

I was thinking about building a tarball as a target which would be carried by Bazel as usual dependency.

Yeah, we discussed it today with my colleagues and came to the exact same conclusion - tarballs or docker images per Nix package/closure would bring a huge data-transfer overhead and destroy the idea of incrementality/granularity per Nix package.

But the general idea of having a hook that is executed on a particular remote executor doesn't sound invalid to me; it just needs elaboration.

For example, instead of having a tarball per Nix package/closure and making sure we have extracted it properly, we would need some sort of hook that is executed right before the main action command, right? If so, I think it would be much more efficient to pass the set of Bazel inputs for that particular Nix package over to the executor and invoke something based on the nix-build CLI instead of invoking tar xf .... That way we remove the performance penalties you mentioned - what do you think?

In this case this target will be run every time by remote executor creating items in the nix-store if they are not exists or being a no-op otherwise.

Are you sure that would work as expected? Will the Bazel scheduler pass through all the inputs that the Nix package depends on so it can be built correctly? Meaning, would Bazel take care of passing the additional input set for the prerequisite target to be executed before the main execution? From my understanding of how BRE works, it won't work that way. I would be happy to be wrong - this would be a really elegant solution for the whole problem if it does.

Let me know what you think.

aherrmann commented 1 year ago

I raised this issue in today’s Remote Execution API Working Group monthly meeting. Here is the outcome of that discussion:

uri-canva commented 1 year ago

The nixpkgs_package import could explicitly set the system argument to nixpkgs, and the generated toolchain could set the correct exec constraints.

@aherrmann I did that in my static toolchain prototype by passing nixopts = ["--argstr", "system", "x86_64-linux"] and similar to nixpkgs_package and it worked well. I will open separate issues if I see any instances of the host equals exec assumption.

The trouble is that Nix provided dependencies can be used in actions that don't have a natural associated toolchain. E.g. a genrule or a custom action.

@aherrmann Bazel is going pretty hard on toolchains, so I think we can depend on them in the general case. Yes, genrules and other very specific actions might require special case support, like configuring the right --shell_executable when launching bazel, or ensuring that the remote execution environment and the local execution environment have the exact same shell available. I think it's reasonable to solve that separately from the overall remote execution support, as I'd put it closer to making bazel itself available, and having the remote execution environment work properly with whatever bazel you have. Just as an example, if your client environment and remote environment use the same linux distro and you install bazel on both in the same way it should just work.

Two things that make this problematic:

@aherrmann Agree. We have thought / prototyped down this path more and it introduces a lot of pain points, especially if you want to avoid leaking nix specifics across the remote execution interface.

I feel like forcing users to turn every Nix imported package into a self-contained, relocatable package is probably too restrictive.

@aherrmann Yes, I was hoping there could be a more general way of making this work, like using pkgsStatic from nixpkgs, but I found it didn't quite work well in practice. I'm now looking at other ways of providing relocatable packages.

uri-canva commented 1 year ago

A feature that could remap input paths to arbitrary paths in the remote execution environment would be very welcome.

@aherrmann Could the --sandbox_add_mount_pair flag you mentioned be used for this?
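For reference, that flag maps a host path into the action sandbox; a sketch of how it might be applied here:

# .bazelrc
build --sandbox_add_mount_pair=/nix/store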

uri-canva commented 1 year ago

Ah right, --sandbox_add_mount_pair wouldn't let you merge multiple derivations into a single /nix/store view, which would be helpful to support the level of granularity we'd want, especially considering rules can have multiple toolchains.

aherrmann commented 1 year ago

The nixpkgs_package import could explicitly set the system argument to nixpkgs, and the generated toolchain could set the correct exec constraints.

@aherrmann I did that in my static toolchain prototype by passing nixopts = ["--argstr", "system", "x86_64-linux"] and similar to nixpkgs_package and it worked well. I will open separate issues if I see any instances of the host equals exec assumption.

@uri-canva Thanks for testing! I'm glad to hear it works!

The trouble is that Nix provided dependencies can be used in actions that don't have a natural associated toolchain. E.g. a genrule or a custom action.

@aherrmann Bazel is going pretty hard on toolchains, so I think we can depend on them in the general case.

I fear I'll have to disagree on this. Yes, there is a strong push for platforms & toolchains, but mostly for rules where the complexity is warranted, e.g. language rules like LANG_library, LANG_binary, etc. However, for simpler or ad-hoc tasks a genrule or similar is a common choice. If you look at Alex Eagle's talk at Bazel Community Day, you'll even hear it recommended to use run_binary for simple cases.

Yes, genrules and other very specific actions might require special case support, like configuring the right --shell_executable when launching bazel, or ensuring that the remote execution environment and the local execution environment have the exact same shell available.

That is a global approach, though; it can work for the standard shell environment, but it's not appropriate for more special-purpose tools that are only required by a few rules, because making them part of the global environment would invalidate all targets on change.

Where toolchains are appropriate it makes sense to use them. But, I think we'll always have to deal with rules or inputs that don't have a corresponding toolchain. E.g. tools for ad-hoc code generation steps, or tools to support test-cases, etc.

Two things that make this problematic:

@aherrmann Agree. We have thought / prototyped down this path more and it introduces a lot of pain points, especially if you want to avoid leaking nix specifics across the remote execution interface.

I feel like forcing users to turn every Nix imported package into a self-contained, relocatable package is probably too restrictive.

@aherrmann Yes, I was hoping there could be a more general way of making this work, like using pkgsStatic from nixpkgs, but I found it didn't quite work well in practice. I'm now looking at other ways of providing relocatable packages.

Thank you for looking into these and reporting back on your experiences! Please keep us in the loop.

sluongng commented 1 year ago

There was some support for https://github.com/tweag/rules_nixpkgs/issues/180#issuecomment-1329081409 and for having the remote executor ensure that they are present before executing the action.

I think realistically, this would be the cheapest and easiest-to-implement solution.

Most RBE vendors today have already built some support for container images for special actions. I can see how similar support could be provided for Nix.


Long-term, I think the Remote API is not the right place to embed Nix logic. As folks from the meeting highlighted, the API currently supports a custom root tree for the action, which is a very coarse but sufficient level of support encompassing Nix use cases.

I think it would be nice if Bazel itself could provide some sort of customization of actions. Perhaps some sort of hooks to prepare/clean up the execution sandbox prior to each action run, which could be transparently turned into a wrapper running commands before/after each action.

uri-canva commented 1 year ago

That is a global approach, though; it can work for the standard shell environment, but it's not appropriate for more special-purpose tools that are only required by a few rules, because making them part of the global environment would invalidate all targets on change.

Where toolchains are appropriate it makes sense to use them. But, I think we'll always have to deal with rules or inputs that don't have a corresponding toolchain. E.g. tools for ad-hoc code generation steps, or tools to support test-cases, etc.

@aherrmann Yeah you're right. Let me summarise some related points to make sure I understand some assumptions we're working with.

There are 3 ways of getting binaries into your bazel build:

  1. Adding them to your PATH. This could be by installing them system wide or in a nix shell, and by letting bazel read the path or by passing it explicitly with --action_env. Bazel will not track these as inputs, so in the context of remote execution they're expected to be in the path of the remote execution environment, and keeping the binaries compatible between the client and remote environment is done completely outside of bazel. Commonly used by rule sets to work out of the box without additional configuration, but many rule sets have configuration available to avoid this by referring to a prebuilt toolchain to download, or even default to downloading a prebuilt toolchain. A special case of this is bazel itself, as it expects to find some binaries on the path or in specific directories assumed to contain them or links to them regardless of the path.
  2. Referencing the binary file itself in bazel. For example by using file labels to reference some script checked into the source, or using new_local_repository to reference files anywhere on the filesystem. Bazel will track these as inputs, but only the files themselves, not any dynamic libraries or other resources they might reference. Making sure those dynamic libraries and resources are compatible between client and remote environment is done outside bazel. Used mostly when binaries are executed from repository rule contexts, since repository rules run before binary targets are available.
  3. Creating binary targets for them. These can be defined for binaries built from source, but they might also be prebuilt, either vendored into the source or downloaded from repository rules. Either way the binary targets should be configured with all the required dependencies in their runfiles (to a certain degree, most of the time a complete macOS installation is assumed when under darwin, and a relatively complete FHS distribution is assumed when under linux, and I assume similar expectations are in place when running under windows). This is the most common way of defining toolchain binaries. Because bazel is aware of all the runtime dependencies it can ensure the client and execution environment for the binaries are compatible.

In the context of rules_nixpkgs, I believe we're only interested in 3. While it's true that 1 and 2 are used widely, especially with genrules and ad hoc rules, managing those binaries falls outside of bazel, and thus outside of rules_nixpkgs. While it's true that you can use nix for 1 and 2, I don't think rules_nixpkgs should have any particular support for that: if you want rules_nixpkgs to handle your nix dependencies in bazel, you have to declare appropriate nixpkgs_package rules.

At the moment the targets those nixpkgs_package rules generate are not correct, as they don't declare all their runtime dependencies; the generated targets just glob the files. Even if they did declare all their dependencies, and bazel tracked them as inputs, they still wouldn't be correct from a bazel perspective, because the absolute paths baked into the files would lead to them referencing files outside of the bazel sandbox.

Since the targets aren't defined correctly from bazel's point of view, they do not work with remote execution.

uri-canva commented 1 year ago

Given the assumptions in my previous comment it should now be clear why I opted for static linking for my prototype: by creating a statically linked binary from nix, I could very easily define a target for it that was correct from bazel's point of view:

`nix_python.bzl`:

nixpkgs_package(
  name = "python3_linux_x86_64",
  attribute_path = "python39Portable",
  build_file_content = """
package(default_visibility = ["//visibility:public"])

load("@rules_python//python:defs.bzl", "py_runtime", "py_runtime_pair")

py_runtime(
  name = "py3_runtime",
  files = glob(["**/*"], exclude = ["**/* *.*"]),
  interpreter = "bin/python3.9",
  python_version = "PY3",
)

py_runtime_pair(
  name = "runtime_pair",
  py2_runtime = None,
  py3_runtime = ":py3_runtime",
)

toolchain(
  name = "toolchain",
  exec_compatible_with = [
    "@platforms//os:linux",
    "@platforms//cpu:x86_64",
  ],
  target_compatible_with = [
    "@platforms//os:linux",
    "@platforms//cpu:x86_64",
  ],
  toolchain = ":runtime_pair",
  toolchain_type = "@bazel_tools//tools/python:toolchain_type",
)
""",
  nix_file = "//tools/nix/pkgs:default.nix",
  nix_file_deps = nix_files,
  nixopts = ["--argstr", "system", "x86_64-linux"],
  repository = "@nixpkgs",
)

`python39portable.nix` (`runInFHSUserEnv` is from https://discourse.nixos.org/t/derivation-that-builds-standard-linux-binaries/21557/6):

{ lib
, stdenv
, fetchurl
, runInFHSUserEnv
, python
}:

let
  drv = stdenv.mkDerivation rec {
    name = python.name;
    src = python.src;
    builder = ./builder.sh;
    CFLAGS = "-idirafter /usr/include";
    LDFLAGS = "-L/usr/lib -B/usr/lib -dynamic-linker /lib64/ld-linux-x86-64.so.2";
  };
in runInFHSUserEnv drv {
  extraOutputsToInstall = [ "include" ];
  targetPkgs = pkgs: (with pkgs; [
    pkgs.autoconf-archive
    pkgs.binutils-unwrapped
    pkgs.bzip2.dev
    pkgs.expat.dev
    pkgs.gcc-unwrapped
    pkgs.gdbm
    pkgs.glibc.dev
    pkgs.gnumake
    pkgs.libffi.dev
    pkgs.ncurses.dev
    pkgs.openssl_1_1.dev
    pkgs.patchelf
    pkgs.pkg-config
    pkgs.python3
    pkgs.readline.dev
    pkgs.sqlite.dev
    pkgs.xz.dev
    pkgs.zlib.dev
  ]);
}

`builder.sh`:

tar xf $src
cd Python-*
./configure --enable-optimizations --prefix=""
make
export DESTDIR=$out
make install
patchelf --set-interpreter /lib64/ld-linux-x86-64.so.2 $out/bin/python3.9

The key idea here is not necessarily the static linking; it's to have the target be correct and define all the inputs. As mentioned above, I will try to make this work without having to write your own derivations and builder scripts for each binary - though, as you can see, given you can use nixpkgs for all the dependencies, it isn't a lot of work.

In principle I like the idea of defining correct targets very much, as it's very idiomatic from the bazel point of view, which helps people who use bazel but might not be familiar with nix work with it and debug it. It also reuses the existing code paths, which makes it fully supported now, but also helps with picking up any future improvements and maintenance.

uri-canva commented 1 year ago

to a certain degree, most of the time a complete macOS installation is assumed when under darwin, and a relatively complete FHS distribution is assumed when under linux, and I assume similar expectations are in place when running under windows

Note that this is only necessary if you care about having a runtime interface at that level. Ultimately you need to define something about the environment your software runs in, even if it's something as barebones as the machine architecture or the kernel. Our interface is something similar to https://github.com/GoogleContainerTools/distroless/tree/42ec9e98b4eb48fac18ccde3251f757bae2c41e4/base, and I think most remote execution environments will work with FHS linux out of the box, so I think it's worth focusing on that first, but others might have a NixOS based runtime environment, if that is common amongst users of rules_nixpkgs we'll have to look at supporting that too.

It's also possible to make the interface just the container runtime, and inline a container in the build, but that gets a bit tricky because you'd have to verify the container is available in the registry on every build and push it if it's not, or similar stuff.

Jonpez2 commented 1 year ago

Re static binaries - one thing we should keep in mind is generation of container images from normal bazel rules which depend on nix derivations - we would, I assume, want to use nix-native code to do all the appropriate symlinking, env vars, and whatever else is required? So I think an approach that cuts out nix metadata would preclude this?

A motivating example might be a library which wraps up invocations of kubectl with the gke-gcloud-auth-plugin all correctly setup. Then we take that as a dependency and invoke a rules_docker (or maybe better a rules_nix_containers!) to build a container image. I think we could make that work only if we retain nix metadata, right? Maybe the same question applies to a java_binary that takes that lib as an input too.

Thank you for all of the work on this, it’s so exciting to see this move forward. It will be such a leap forward for trivial bazel hermeticity.

olebedev commented 1 year ago

but that gets a bit tricky because you'd have to verify the container is available in the registry on every build and push it if it's not, or similar stuff.

@uri-canva, this looks similar to the existing problem of verifying whether Nix artifacts have been populated into /nix/store and performing some actions if they haven't.

It's also possible to make the interface just the container runtime, and inline a container in the build

Sorry for the potentially silly question - I am not super familiar with Bazel's internals or Bazel's remote execution API, so please bear with me. @uri-canva, if we go down that path and build containers inline, do we really need the roundtrip of pushing these containers to a registry and then pulling them back in on each remote executor machine? Meaning, would it be more appropriate to remove the registry part and let the BRE API work out how these inline-built images are delivered to the remote execution machines?

uri-canva commented 1 year ago

@olebedev From what I can tell (https://github.com/bazelbuild/remote-apis/blob/3a21deee813d0b98aaeef9737c720e509e10dc8b/build/bazel/remote/execution/v2/remote_execution.proto#L690) platform properties are just strings, they're not values with associated digests that can be uploaded as blobs or anything like that. Compare that with the input root that is a digest pointing to a directory: https://github.com/bazelbuild/remote-apis/blob/3a21deee813d0b98aaeef9737c720e509e10dc8b/build/bazel/remote/execution/v2/remote_execution.proto#L470.

aherrmann commented 1 year ago

There are 3 ways of getting binaries into your bazel build: [...] In the context of rules_nixpkgs, I believe we're only interested in 3.

@uri-canva I'd distinguish another way in this case: "Hard-coding an absolute path to an external binary". That is what rules_nixpkgs effectively does, either through a symlink (bin/...) that is then tracked as a File in Bazel, or through a string, e.g. in the cc toolchain configuration. (Similarly, on Windows, one may create shims.)

At the moment the targets those nixpkgs_package rules generate are not correct, as they don't declare all their runtime dependencies, the generated targets just glob the files.

Yes, in the sense that they don't expose all needed runtime dependencies as artifacts to Bazel.

Unfortunately, there are technical reasons why it's not always feasible to expose all needed files as artifacts to Bazel. E.g. tracking a full Python toolchain, even just as symlinks into the Nix store, will quickly use up too much disk. For large sets of inputs there is also considerable overhead in input hashing, see here.

because the absolute paths baked into the files would lead to them referencing the files outside of the bazel sandbox.

Correct.

Since the targets aren't defined correctly from bazel's point of view, they do not work with remote execution.

Yes, put this way, this also holds for all other kinds of global dependencies, e.g. /usr/bin/bash or /usr/lib64/ld-linux-x86-64.so.2 - the difference of course being that these are commonly installed on the remote executors as well.

In principle I like the idea of defining correct targets very much, as it's very idiomatic from the bazel point of view, which helps people who use bazel but might not be familiar with nix work with it and debug it.

In principle I agree. As mentioned above, unfortunately there are technical limitations in Bazel that make this not always feasible. For the case of "correct" meaning all runtime dependencies explicitly tracked as files by Bazel.

k1nkreet commented 1 year ago

I want to share the experience I got while experimenting with the hacky little idea I described above, in case someone finds it useful or wants to experiment with it. Briefly, the idea was to create an ordinary rule which runs nix-build and declares outputs that can be used as inputs by ordinary Bazel rules - e.g. cc_library for library objects and headers, or toolchains for binary outputs. This nix-build action should be marked as no-cache and no-sandbox and should be instantiated for every separate group of outputs which could be used as separate inputs for other rules. This approach inserts the nix-build execution into the dependency chain, securing the existence of store paths, and relies on the Nix cache, which allows us to build the same derivation multiple times without a huge penalty. It also implies all remote builders share the same Nix store.

For example, if I want to build the zlib C library, it ends up as two nix_build rule instantiations: one to produce shared objects and one to produce headers. Those outputs can be used as the srcs and hdrs of a cc_library respectively, as in the sketch below. It turned out to work with one adjustment: these rules have to specify all their outputs beforehand, so I decided to use the nixpkgs_package repository rule for this. It can run nix-build as it does now, collect the outputs, and produce proper nix_build rules.
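To illustrate the shape of that split with plain genrules standing in for the generated nix_build rules (a sketch only; the Nix file, attribute, and paths are assumed):

genrule(
  name = "zlib_solib",
  srcs = ["//nix:default.nix"],
  outs = ["libz.so"],
  # nix-build prints the realized store path; copy the library out of it.
  cmd = "cp $$(nix-build $(location //nix:default.nix) -A zlib --no-out-link)/lib/libz.so $@",
  tags = ["no-cache", "no-sandbox", "requires-network"],
)

cc_import(
  name = "zlib",
  shared_library = ":zlib_solib",
)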

Nevertheless, there are several problems which forced me to stop this experiment:

sluongng commented 1 year ago

This approach inserts the nix-build execution into the dependency chain, securing the existence of store paths, and relies on the Nix cache, which allows us to build the same derivation multiple times without a huge penalty. It also implies all remote builders share the same Nix store.

This means the store would be uploaded from your local laptop to the Remote Cache CAS, and then fetched by remote executor workers as subsequent action inputs, right? Does that mean there is a potential for the cache to differ between different users?

uri-canva commented 1 year ago

I'd distinguish another way in this case: "Hard-coding an absolute path to an external binary".

@aherrmann Yeah you're right, that behaves a lot like 2, and it's especially clear if you use shims: the binary is the shim, and bazel doesn't know anything about the shim's runtime dependencies.

k1nkreet commented 1 year ago

This means the store would be uploaded from your local laptop to Remote Cache CAS, and then fetched by remote executor workers as subsequent action inputs right? Does that mean there is a potential where the cache may differ between different users?

No: since this rule is marked as no-cache, it will never upload anything into the Remote CAS. Remote workers will be forced to re-run it, which will secure the existence of nix store paths on the remote side.

olebedev commented 1 year ago

No: since this rule is marked as no-cache, it will never upload anything into the Remote CAS. Remote workers will be forced to re-run it, which will secure the existence of nix store paths on the remote side.

@k1nkreet, wouldn't it be executed only on the host (the build host machine), as you assign the no-remote tag to the target?

k1nkreet commented 1 year ago

@olebedev Oh sorry, this was a typo, I was going to write no-sandbox. Thanks for noticing!

uri-canva commented 1 year ago

Re static binaries - one thing we should keep in mind is generation of container images from normal bazel rules which depend on nix derivations - we would, I assume, want to use nix-native code to do all the appropriate symlinking, env vars, and whatever else is required? So I think an approach that cuts out nix metadata would preclude this?

@Jonpez2 for something like that it might make more sense to build the container image with nix, and use buildBazelPackage to build the bazel target that you want to include in the container image, that way you can pass the nix dependencies through to the image more easily. As you have noted any approach that doesn't forward nix metadata through bazel and out the other side wouldn't work.

uri-canva commented 1 year ago

A little update on my exploration of approach 3 above: defining toolchains in a way that bazel can handle everything without any knowledge of nix.

After the static approach I tried a rewriting approach: rewriting all the references to the nix store to be relative paths from the execution root instead (https://github.com/uri-canva/rules_nixpkgs/tree/rewriting). It worked a lot better than I had expected, but even before completing the prototype I hit two major issues:

  1. The issue with excessive inputs @aherrmann mentioned above. In the static prototype, since the interpreter was a single file, it wasn't as noticeable, but with the rewriting the inputs took a long time to read in and hash.
  2. The static prototype used runInFHSUserEnv, which resulted in a toolchain that targeted linux FHS. The rewriting resulted in a toolchain that targeted the bazel execroot, so the outputs could only run within bazel.

I hadn't realised it initially, but nix and the bazel execroot / runfiles are both features of how the dependencies are packaged that need to be taken into consideration for the target platform, not just the exec platform. The most common bazel remote execution setup doesn't just assume the exec platform to be linux FHS with glibc, but the target platform as well. The target platform might seem easy to change if you target containers, but things get a bit more complicated if you want to use prebuilt artifacts from common language package manager repositories like pypi wheels and maven jars.

This is quite hard to do with nixpkgs because most packages related to building in it assume you're going to be building within nixpkgs, or at the very least targeting a nix system (see https://github.com/NixOS/nixpkgs/issues/185742).

I've posted a new issue in nixpkgs to see if redistributable packages are of interest: https://github.com/NixOS/nixpkgs/issues/214832. This could be a potential solution for the exec platform not being a nix platform, or at least not a nix platform that is provisioned in the same way as the host platform.

Given the latest developments, I think I'm a bit stuck on a solution that would define a bazel-idiomatic toolchain using nixpkgs.