thoughtpolice / buck2-nix

Do not taunt happy fun ball

fix(types): Use new buck2 types #21

Closed mayl closed 6 months ago

mayl commented 7 months ago

Summary

Buck2 seems to have moved away from stringified types and now needs types as defined here.

Test Plan

buck build ... fails with a type error on previous commits; it succeeds on this commit.

mayl commented 7 months ago

Just a couple notes.

1. I've read the disclaimer at the top of your README, and I'll have no hurt feelings if you don't want this PR.
2. I'm completely new to buck2, so this change amounts to following buck build error messages and applying what seemed like the appropriate type. I wouldn't take it for granted that this is correct, but I thought I'd share it since I can now run buck build ... successfully.
3. I noticed that the renode targets (and by extension the hifive ones) don't seem to run for me (or I don't understand how to run them).

Thanks for sharing this project!

thoughtpolice commented 6 months ago

I'm not actively maintaining this repo right now; I actually have a (currently private) fork of it that's been expanded a bit, but has fewer explicit Nix dependencies. I'm glad you found this useful, so I'll go ahead and merge this to unblock the repository and CI system.

I can't remember precisely what the deal with the Renode target is. I think the problem is that testing infrastructure for Renode wants to write into the renode libexec dir to write some temporary files, but that location exists in the /nix/store, so it fails instantly. I couldn't quite figure out how to fix this OTTOMH, making the Renode build a bit useless, but I tried a lot IIRC. :(

mayl commented 6 months ago

I'll be interested to see your adjusted structure if you are ever in a position to share it.

One thing I noticed is the cxx rules as written here do not build the example cpp project from the buck2 getting started tutorial (they #include <iostream>). Adding a toolchain to use the clang++ binary out of clang-stable fixed that very simple case. Curious to hear if you had suggestions for more general fixes from your experience in your private fork? Are you still building a new prelude from scratch?

That Renode diagnosis sounds correct, or at least close (I see it fails early with something like "no such file exists"). I was highlighting it in case I had inadvertently broken those targets in my mucking around with the types, but sounds like that's preexisting.

thoughtpolice commented 6 months ago

Yes, it is still a fully home-grown prelude, and it is fairly heavily expanded. One of the things it has is an actual workable set of (fundamental) CXX rules that also work through remote execution, which is the only way I test. However, I'm not done with any other languages yet, and I'm currently trying to expand them to include some advanced features like Distributed ThinLTO, Propeller PGO, etc. So things like iostream aren't a problem, nor are things like having .c and .cxx files next to each other and picking the right compiler. I actually have a bunch of various projects building cleanly with it that I ported to BUILD files (mimalloc, liburing, librseq, Trealla Prolog, Radamsa, and a few other things).

The reason for my own prelude is mostly because the Meta prelude is extremely large and expansive, and I can't realistically audit it very well or track and fix bugs. It also tries to maintain compatibility with buck1 rules, which heavily constrains the APIs (and the kind of BUILD files you write if you're trying to maintain parity between the two). This doesn't mean the upstream Prelude is bad — it's probably better for most people in fact, even if it's got some warts. For example, it's actually quite good for Rust code in my experience, when combined with Reindeer. Rust is next on my list of languages to support, but I have far less experience with the toolchain, so it's even more work.

The primary goal is a Prelude that:

So I have had to fundamentally remove many of the Nix-isms that are in this repository and make sure many of the tools work reliably everywhere, too. It takes a fundamentally different approach to some things: for example, all of the "tooling wrappers" that might be in Python (e.g. rustc-wrapper) in buck2-prelude are instead written using Babashka, and things like that.

I do plan on releasing this Prelude, along with the accompanying project it's supporting, but I'm just cranking away at it in private right now, since I've had to rework it a bunch. I've probably rebased/rewritten the whole commit history about two times now. I also plan on making a "side project" that is just an (automatically exported), detached copy of the Prelude code that can be used by third-party projects. Though again, I suspect the Meta prelude is probably better for most people in the near future, even if it's a bit intimidating.

thoughtpolice commented 6 months ago

And to be clear, I do support and use Nix with this new Prelude on Linux. But buck2 no longer drives Nix or runs nix commands itself; instead, Nix is simply expected to provide the toolchains that Buck2 invokes. As time goes on, I expect the usage of Nix for this to shrink, since it comes with some complications around Remote Execution and symlink support. But Nix will probably always be a really good way to do things like provide buck2 binaries, extra associated developer tooling, etc.

mayl commented 6 months ago

Thanks for the rundown, I'll definitely keep an eye out for your new prelude!

I'm not sure I fully understand, but it sounds like the nix support you are describing is roughly the equivalent of running buck2 in a devshell pre-populated with the toolchains (or a subset that is sufficient to let buck2 bootstrap subsequent toolchains). Is that right? Part of my interest is definitely using nix to provide pinned toolchains, but sounds like maybe you moved away from that for cross platform/remote execution reasons? If nix turned into an impediment, that's a little disappointing to hear since it seems like in theory a nix expression + cache would've been a great fit for both those goals...

Also, was one of your goals ever to allow producing buck builds from inside a nix sandbox? If so, do you have any thoughts on how that might work? I guess in theory, if buck is reproducible, an FOD should work...

thoughtpolice commented 6 months ago

I'm not sure I fully understand, but it sounds like the nix support you are describing is roughly the equivalent of running buck2 in a devshell pre-populated with the toolchains (or a subset that is sufficient to let buck2 bootstrap subsequent toolchains). Is that right?

Right, that's about correct.

Part of my interest is definitely using nix to provide pinned toolchains, but sounds like maybe you moved away from that for cross platform/remote execution reasons?

So this repository basically tried two experiments:

1. using Nix to provide the toolchains, shared between the local devShell and the remote execution environment; and
2. having Buck2 rules invoke nix commands themselves as build actions.

The first is a good idea that's workable. The second, currently, is not, except under specific circumstances. These statements hinge on the assumption that you want Remote Execution, which is a huge, huge boon.

For the first thing: all you need to do is produce a container image that has a consistent set of tools between your devShell and your remote execution system. Configure your RE system to use that container to execute commands. So you can already do all this with Nix today, in fact, using an array of solutions like dockerTools.buildLayeredImage; just abstract out your buildInputs in the devShell, share them, and build a container with the proper setup, deploy it, etc. Your rules can then rely not only on these tools being available but exact nix paths being available in /nix/store. So that's convenient, and hermetic.
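Concretely, the shared-tools idea might look something like the following minimal sketch; the attribute names and the tool list here are illustrative, not taken from this repo.

```nix
# Sketch only: one shared list of tools feeds both the local devShell and
# the container image used by the remote execution workers, so both sides
# see identical /nix/store paths.
{ pkgs }:
let
  buildTools = [ pkgs.clang pkgs.coreutils pkgs.gnumake ];
in {
  devShell = pkgs.mkShell { packages = buildTools; };

  # Layered image for the RBE workers; push this to your registry as
  # e.g. devcontainer:v1 and point the RBE platform properties at it.
  reImage = pkgs.dockerTools.buildLayeredImage {
    name = "devcontainer";
    tag = "v1";
    contents = buildTools;
  };
}
```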

You can do lock-step upgrades this way too, but it's kind of roundabout, because the CI system now needs to be updated as well. Say you want to run a flake update to update your tools, and your existing container is devcontainer:v1. You'd have to make a preparatory PR to bump the Flake inputs, produce a Docker image from that called devcontainer:v2, and deploy it to a registry and to the RBE system, giving the container a name there like devcontainer-2. Then add another change to the PR pointing the buck2 platforms at the new RBE properties, i.e. those properties now point to devcontainer-2. These need to be in the same commit, so that when your developers git pull and get a new flake.lock, they also get a matching setup where the RBE system routes to the v2 container, not the v1 container. So it's a weird lockstep build process. Complicated, but not unworkable.

The other issue there is that Nixpkgs packages tend to be very "rich" and featureful, so the containers are easily bloated to 2GiB+ sizes in a heartbeat. Annoying but not unworkable and has solutions available, since you can override your tools to make them slimmer.

This first approach is totally viable, today, assuming you know your way around Nix and can smooth off some of the sharp edges. You can definitely ship toolchains into the RBE system as part of a container, and let local developers iterate with a devshell.

The biggest problem is the second case, i.e. rules that invoke nix commands. Buck2 doesn't really support symlinks right now, but symlinks are one of the best ways to get ahold of the tools you build with expressions like nix build nixpkgs#foo --out-link buck-out/...: you can then just look at the bin/ directory inside the symlink to get binaries or whatever. That's how the renode stuff works, as you can tell. But that's not a good idea, because those actions can only ever be local_only, since they modify the ambient system in order to function.

OK, so you could put Nix inside the containers on the RBE side, so that nix commands can be invoked on the RBE container, but that's also really complicated, for a few reasons:

  1. It does not have the Nix sandbox enabled, and
  2. The RBE cache and the /nix/store are totally disconnected from each other, and because of that
  3. The container has to have network access.

Imagine you have multiple containers waiting on the RBE side to accept jobs. Two of them may get actions to invoke that are something like nix build foo. They cannot share the work of evaluating and materializing foo, because these containers do not have a shared /nix/store. That means each new, fresh container has to pay the cost of every nix command invoked on it, dramatically bloating the containers and making a lot of things worse. Imagine if each of these nix build commands pulled down 500MiB of deps; you'd have to pay that 500MiB on every new container, just for that one command. This gets bad quickly. It also doesn't work if your containers are designed to be fully immutable and stateless. And it pretty severely nerfs one of Buck2's greatest strengths over Nix, which is its far more fine-grained caching and build graph.

My only conclusion from going down this road is that the best avenue is to have an RBE system that has a unified Bazel/Buck2 and Nix cache. That is, the cache has knowledge of and can store both /nix/store artifacts and it can store RBE artifacts from systems like Buck2, seamlessly. You could then for example have your RBE system start every container with a "virtualized" /nix/store, maybe with something like FUSE, to be shared and talk to the CAS system. So, a container makes an effort to look for /nix/store/abcdef-foo and then the FUSE layer can intercept this, and download the contents from the storage system. This would allow you to remove the need for networking in the container, and improve usability dramatically because each container doesn't have to redo the work.

I think that's a viable system, by the way. But it's also a huge amount of work and basically akin to building a completely custom RBE system. So it's not something I've worked on.

Related: Dotslash, a recent (and very neat) open-source project from Meta, has this exact same problem, as they have described here. In short, Dotslash is a tool that manages its own content-addressed filesystem cache via downloads, independently of Buck2's cache. They don't want their RBE system to have networking. The solution? Their internal fork of Dotslash actually has the ability to store its artifacts in the RBE cache, not the filesystem.

Also, was one of your goals ever to allow producing buck builds from inside a nix sandbox? If so, do you have any thoughts on how that might work? I guess in theory, if buck is reproducible, an FOD should work...

No. I think the biggest problem, really, is download_file and other related functionality. Buck2 really wants to download the artifacts it needs, just like Nix does, for similar reasons.

I think you'd probably have to do something like implement "dir cache", a feature from Buck1, in Buck2, where the build outputs are all put in a shared directory. Then you could use buck2 cquery in an impure derivation to set up the build, finding and running the impure download_file functions (and other stuff), since network access is allowed in an impure derivation. These outputs get put in the dir cache, which is the output of the impure derivation. Then, use that as the input to an FOD that ensures it's "purified". Finally, you do the real build, pointing Buck2 at the FOD as the dir cache, so it doesn't re-download anything and can work with networking off. I use Impure + FODs in a separate project here to download machine learning model data in an impure stage, then purify it with an FOD, which is fed to another derivation for its data directory.
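A rough Nix sketch of that impure-stage-then-FOD pattern follows; the names and URL are hypothetical, impure derivations are an experimental Nix feature, and the FOD hash is a placeholder you'd pin after a first build.

```nix
{ pkgs }:
let
  # Stage 1: an impure derivation, which is allowed network access.
  # Impure derivations may only be consumed by fixed-output derivations
  # or other impure derivations, which is exactly what we want here.
  fetched = pkgs.stdenv.mkDerivation {
    name = "model-data-impure";
    __impure = true;  # requires `experimental-features = impure-derivations`
    nativeBuildInputs = [ pkgs.curl pkgs.cacert ];
    buildCommand = ''
      mkdir -p $out
      curl -L https://example.com/model.bin -o $out/model.bin  # hypothetical URL
    '';
  };
in
# Stage 2: a fixed-output derivation "purifies" the impure output by
# pinning its content hash; ordinary derivations can depend on this.
pkgs.stdenv.mkDerivation {
  name = "model-data";
  outputHashMode = "recursive";
  outputHashAlgo = "sha256";
  outputHash = pkgs.lib.fakeHash;  # placeholder: replace with the real hash
  buildCommand = ''
    cp -r ${fetched} $out
  '';
}
```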

I hope all of this makes sense and answers your questions.

mayl commented 6 months ago

I've had a couple of long replies with somewhat ill-formed questions written out as I've been pondering your points above, but I think I'm probably not familiar enough with the intricacies of RBE and Buck to fully grok all the issues here, so I'll hold off on most of those for now... Mostly I just want to say that I appreciate your thoughts and your sharing your experience; there are obviously a lot of possible ways to merge Nix and Buck, but also a lot of potential dead ends that you seem to have some good insight into...

Below is a rough sketch of something I'm just starting to think through based on your input above. Curious what your reaction is, if any.

1. Enumerate all the toolchains in the nix flake, and have an output that can be built which spits out all the toolchain names and store paths in JSON or something.
2. One buck rule runs nix to build this JSON file, which results in all the paths also being built (using the --store option so it doesn't need to modify the system). If you want to do this without full network access, you could set up a remote builder that the worker sends this job to over SSH. Connect all workers and the remote builder to the same private binary cache, and ensure all these nix outputs are cached there.
3. Subsequent Buck rules that use a Nix-provided toolchain depend on the JSON file, and produce the toolchains locally with nix-store --realise <path> --store <local build store>. The output of this rule is the local store path (in buck-out or whatever is conventional), and the input dependency is the JSON file, so this should(?) be pretty fast and fine-grained for buck. The only network access for this would be to the private nix binary cache.
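For what it's worth, the manifest in step 1 might be as small as the following sketch; the toolchain set is hypothetical.

```nix
# Hypothetical flake output: a JSON manifest mapping toolchain names to
# /nix/store paths. Interpolating the packages into the JSON makes them
# runtime dependencies of the manifest, so building the manifest also
# realises every toolchain (and lets them be pushed to the binary cache).
packages.x86_64-linux.toolchain-manifest = pkgs.writeText "toolchains.json"
  (builtins.toJSON {
    cxx = "${pkgs.clang}";
    python = "${pkgs.python3}";
  });
```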

Disadvantages I can see seem to be:

1. You don't get to shield users from nix completely; when adding toolchains you need to edit the flake.
2. The JSON flake build thing is kind of a chokepoint: basically, after a change to flake.nix or flake.lock, all the toolchains have to get materialized before any other work can take place (not sure how bad this is in practice).
3. You need to set up the binary cache infrastructure, but that could be useful for local development too...

I use Impure + FODs in a separate project here to download machine learning model data in an impure stage, then purify it with an FOD, which is fed to another derivation for its data directory.

I know this is kind of ancillary to the main buck2/nix thread of this discussion, but I'm not sure I'm following why this two-stage build is necessary. FODs can have network access and still be pure, because they must produce a known output hash (at least, that's how I've used them in the past...).
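For comparison, the plain-FOD version I had in mind is just something like this sketch (hypothetical URL; the pinned hash is what keeps the build pure despite the network access):

```nix
# fetchurl is itself a fixed-output derivation: the build may touch the
# network, and purity comes from the pinned content hash.
pkgs.fetchurl {
  url = "https://example.com/model.bin";  # hypothetical URL
  hash = "sha256-...";                    # placeholder: pin the real hash here
}
```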