tweag / rules_nixpkgs

Rules for importing Nixpkgs packages into Bazel.
Apache License 2.0

Add a `nixpkgs_packageset` rule #49

Open thufschmitt opened 5 years ago

thufschmitt commented 5 years ago

Since https://github.com/tweag/rules_haskell/pull/442, the evaluation of the workspace of a rules_haskell-enabled repository takes an insane amount of time because it requires nix-build-ing every transitive haskell dependency, which easily means more than 200 nix-build calls. We could speed things up a lot by first nix-instantiate-ing a big json file mapping each haskell package to its .drv, and then realising these pre-instantiated derivations as needed. This would avoid the cost of evaluating nixpkgs 200 times (which is by far the bottleneck when everything is already built).

This trick can probably be used for more than just the haskellPackages case, so I propose we add a nixpkgs_packages_set rule to rules_nixpkgs which would be used like:

nixpkgs_packages_set(
  name = "haskellPackages",
  repository = "@nixpkgs",
  base_attribute_path = "haskellPackages",
)

This would generate @haskellPackages-base, @haskellPackages-streaming, @haskellPackages-foobar, … (and possibly one @haskellPackages with aliases in it for all of these).

Internally this rule would

  1. nix-instantiate a json file of the form

    {
      "base": "/nix/store/…-base.drv",
      "streaming": "/nix/store/…-streaming.drv",
      "foobar": "/nix/store/…-foobar.drv",
      …
    }
  2. Load this into a starlark dict one way or another

  3. For each element of this dict, generate a call to a rule which would nix-store --realize the corresponding derivation
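
Steps 2 and 3 could be sketched roughly as follows. This is a plain-Python illustration (Starlark is close to this subset), not the proposed implementation: the function name `realise_commands` is hypothetical, the `<set>-<package>` naming is taken from the example above, and the store paths are the elided placeholders from the JSON.

```python
import json

def realise_commands(set_name, drv_json):
    """Map each package to a hypothetical repository name and the
    nix-store command that would realise its pre-instantiated .drv."""
    drv_map = json.loads(drv_json)
    commands = {}
    for package in sorted(drv_map):
        repo_name = "%s-%s" % (set_name, package)
        commands[repo_name] = ["nix-store", "--realise", drv_map[package]]
    return commands

# Placeholder store paths, elided exactly as in the JSON above.
example_json = """
{
  "base": "/nix/store/…-base.drv",
  "streaming": "/nix/store/…-streaming.drv"
}
"""

for repo, command in realise_commands("haskellPackages", example_json).items():
    print("@" + repo, "->", " ".join(command))
```

The point of the sketch is that nixpkgs is evaluated once to produce the JSON; each generated repository then only needs its own `.drv` path.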

All this would require extending the scope of nixpkgs a bit (or doing things in an ad-hoc way, not sure what's the best choice), to add

Thoughts?

mboes commented 5 years ago

I think this is a good idea. Though to be useful the json file you mention should be checked into the source, right? So that even the initial checkout isn't too slow? @zimbatm pointed out to me yesterday that this would be similar to Yarn lock files. And it so happens that @aehlig proposed to do something similar here and here.

thufschmitt commented 5 years ago

I didn't think of checking it in. I think it would be a benefit even without doing that.

Some really quick tests I've done indicate that for haskellPackages, this approach starts being faster than nix-build-ing each one individually with 30 packages and is almost 6x faster with 200 packages (on my machine, nix-build handles roughly 100 packages/min while nix-instantiate takes ~15s and the calls to nix-store --realise are negligible).

Checking that into the repo would make it almost instantaneous, at the cost of having to handle a generated file.

zimbatm commented 5 years ago

@regnat how many calls to nix-instantiate are being made? Theoretically you could get all your attributes in one call by concatenating the -A attrpath on a single nix-instantiate.

Pardon my bazel:

nixpkgs_instantiate(
  name = "myPackages",
  repository = "@nixpkgs",
  attributes = [
    "haskellPackages.ghc",
    "hello"
  ]
)

Then I imagine that this rule would have multiple outputs, one for each attribute.

This approach should be faster even on a fresh install with anything > 2 packages.

Profpatsch commented 5 years ago

> Theoretically you could get all your attributes in one call by concatenating the -A attrpath on a single nix-instantiate.

Can we assume the output is stable and is going to stay so over nix releases?

$ nix-instantiate -A hello -A binutils -A ghc
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/258wysx7s44xf90i2ww5h54h8745blym-hello-2.10.drv
/nix/store/s4n633q0lmqm70f22k3chp8kkn4nsql9-binutils-wrapper-2.30.drv
/nix/store/12bkksv14ns4p1xga5vw7wkvpj9kmzvn-ghc-8.4.4.drv

$ nix-instantiate -A ghc -A hello -A binutils
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/12bkksv14ns4p1xga5vw7wkvpj9kmzvn-ghc-8.4.4.drv
/nix/store/258wysx7s44xf90i2ww5h54h8745blym-hello-2.10.drv
/nix/store/s4n633q0lmqm70f22k3chp8kkn4nsql9-binutils-wrapper-2.30.drv
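
If the output order does follow the order of the -A flags, as the two runs above suggest, a caller could pair attributes with .drv paths by position. The following is a minimal Python sketch under that assumption; `pair_attributes_with_drvs` is a hypothetical helper, and it only looks at lines that start with `/nix/store/`, so the `--add-root` warning is ignored if it ends up in the captured text.

```python
def pair_attributes_with_drvs(attributes, stdout):
    """Pair each requested attribute with the store path printed for it,
    assuming one path per -A flag, printed in flag order."""
    drv_paths = [line.strip() for line in stdout.splitlines()
                 if line.strip().startswith("/nix/store/")]
    if len(drv_paths) != len(attributes):
        raise ValueError("expected %d store paths, got %d"
                         % (len(attributes), len(drv_paths)))
    return dict(zip(attributes, drv_paths))

# Output copied from the second run above (-A ghc -A hello -A binutils).
stdout = """\
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/12bkksv14ns4p1xga5vw7wkvpj9kmzvn-ghc-8.4.4.drv
/nix/store/258wysx7s44xf90i2ww5h54h8745blym-hello-2.10.drv
/nix/store/s4n633q0lmqm70f22k3chp8kkn4nsql9-binutils-wrapper-2.30.drv
"""

mapping = pair_attributes_with_drvs(["ghc", "hello", "binutils"], stdout)
print(mapping["hello"])
```

The length check is there because a positional mapping fails silently otherwise: a missing or extra line would shift every later attribute onto the wrong derivation.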

I’d go a step further and make this possible:

foo.nix

with import ./nixpkgs {};
{ 
  # attrset of packages
  pythonPackages = { inherit (pythonPackages) a b c d e; };
  # one derivation
  bundledGhc = haskellPackages.ghcWithPackages (h: with h; [ lens aeson ]);
}

WORKSPACE

nixpkgs_repository_cache(
  name = "foo_cache",
  repository = ":foo.nix",
  cached_attributes = [
    # can reference the attribute set here
    "pythonPackages",
    "bundledGhc",
  ],
)

nixpkgs_package(
  name = "python-a",
  # every .drv from the attribute set is accessible from the cache
  attribute_path = "pythonPackages.a",
)

So users can reference whole attribute sets (recursively?) and it will cache all subderivations. We can also follow hydra and only recurse into attribute sets marked with recurseIntoAttrs. The code generating the list of all attributes can be written in nix and output with nix-instantiate --eval. If you want I can whip it up.
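
To make the recursion rule concrete, here is a small Python model of it (the real generator would be written in Nix, as said above). Nested dicts stand in for attribute sets; a derivation is modelled by `type = "derivation"`, and a set marked with recurseIntoAttrs by `recurseForDerivations = true`, which is the attribute that function sets in nixpkgs. The function name and the sample data are made up for illustration.

```python
def collect_derivations(attrs, prefix=""):
    """Walk a nested dict standing in for a Nix attribute set and return
    the dotted attribute paths of every derivation reached."""
    paths = []
    for name in sorted(attrs):
        value = attrs[name]
        if not isinstance(value, dict):
            continue  # plain values such as the marker flag itself
        path = prefix + name
        if value.get("type") == "derivation":
            paths.append(path)
        elif value.get("recurseForDerivations", False):
            # Only descend into sets that were explicitly marked.
            paths.extend(collect_derivations(value, path + "."))
    return paths

# Hypothetical stand-in for the evaluated foo.nix.
packages = {
    "bundledGhc": {"type": "derivation"},
    "pythonPackages": {
        "recurseForDerivations": True,
        "a": {"type": "derivation"},
        "b": {"type": "derivation"},
    },
    "unmarked": {"c": {"type": "derivation"}},
}

print(collect_derivations(packages))
```

In this model `unmarked.c` is skipped because its parent set carries no marker, which is the behaviour borrowed from hydra.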

zimbatm commented 5 years ago

Sounds great!

I think the nixpkgs_package would have to take a cache name as input to establish the link(?)

You probably know already, nix-instantiate has a --json output that might come in handy.

--- EDIT ---

Oops forgot to answer:

> Can we assume the output is stable and is going to stay so over nix releases?

Since nix doesn't give any other output mapping capability than the ordering I would be the first to complain if that started to break.

thufschmitt commented 5 years ago

> how many calls to nix-instantiate are being made?

Only one: I run nix-instantiate --json --eval --strict --read-write-mode haskellPackagesToJson.nix, where haskellPackagesToJson.nix contains

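with import <nixpkgs> {};

let
  # Return the element's .drv path, or null when it has no drvPath
  # or when evaluating it throws.
  evaluateElement = x:
    let result = builtins.tryEval (x.drvPath or null); in
    if result.success then result.value else null;
in
# Map every attribute of haskellPackages to its .drv path (or null).
builtins.mapAttrs (_: evaluateElement) haskellPackages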
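
The expression above yields null for every attribute that has no drvPath or fails to evaluate, so whatever consumes the JSON has to drop those entries before generating anything. A minimal Python sketch of that filtering step follows; `usable_drvs` and the `broken-pkg` entry are hypothetical, and the store path keeps the elided form used earlier in the thread.

```python
import json

def usable_drvs(eval_json):
    """Keep only the packages whose drvPath evaluated to a store path."""
    return {name: drv
            for name, drv in json.loads(eval_json).items()
            if drv is not None}

# "broken-pkg" stands for a package whose evaluation failed.
example = '{"streaming": "/nix/store/…-streaming.drv", "broken-pkg": null}'

print(sorted(usable_drvs(example)))
```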

@Profpatsch If I understand correctly, you suggest saving the .drvs for the whole of nixpkgs at once? I've considered that, but since just doing so for the haskellPackages set already takes 15s, I fear that's gonna be too slow to be usable in practice.

mboes commented 5 years ago

Conceivably, Hydra could generate that for all of Nixpkgs though. Then we don't even need to check in anything. @zimbatm isn't that what you have set up in some private Hydra instance?

thufschmitt commented 5 years ago

Unless I'm mistaken, the time taken by nix-instantiate is the time needed to evaluate the nix expression, which isn't something that Hydra can cache.

Profpatsch commented 5 years ago

> If I understand correctly you suggest saving the .drvs for the whole of nixpkgs at once

Nope, that would stumble over packages marked as broken (run nix-instantiate '<nixpkgs>' to see an example), and it would also take way too long, as you said. I suggest users project the packages they need via a nix expression and then cache that.

zimbatm commented 5 years ago

@Profpatsch's approach seems the best as it's independent of any infrastructure requirements. For example, if users want to provide their own overrides, they wouldn't be able to fetch things from the public Hydra anyway.

thufschmitt commented 5 years ago

I have a POC implementation in #50, feel free to take a look and criticize it