Use of `evalPackages` in lib/cabal-project-parser.nix problematic

nomeata commented 3 years ago

Not sure if there is a cure…

I am building a project that wants to use source-repository-package in its cabal.project, and I am setting sha256 as one should.

This causes lib/cabal-project-parser.nix to take this path:

      in (if sha256 != null
        then pkgs.evalPackages.fetchgit {
            url = repo.location;
            rev = repo.tag;
            inherit sha256;
          }

and this means that nix-instantiate’ing the OSX build of the project leads to different derivations on Darwin (the builder) and Linux (where I do the final deployment), which is a mild problem for me.

What would break if you use

        then pkgs.fetchgit {

instead?

michaelpj commented 3 years ago

In general we are hoist on the horns of a dilemma here:

- If we use the build architecture for eval-time work, then instantiation will not redo work when initiated on different architectures, but you may be unable to instantiate the derivations for an architecture other than the current build architecture (e.g. you can't eval your Darwin packages on Linux). This is quite painful.
- If we use the eval architecture for all eval-time work, then we risk redoing some of that eval-time work when it's initiated on different architectures, but at least you can evaluate everything.

In practice, not being able to evaluate the full package set is really annoying, so that's why we do it this way.
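
A minimal Nix sketch of the two options, for illustration only. It assumes `<nixpkgs>` is resolvable through NIX_PATH and that evaluation is impure (for `builtins.currentSystem`); the attribute names `targetSide` and `evalSide` are made up here and are not taken from haskell.nix.

```nix
let
  # Package set for the platform the project is being built for.
  pkgs = import <nixpkgs> { system = "x86_64-darwin"; };

  # Package set for whichever machine is running the evaluation.
  evalPackages = import <nixpkgs> { system = builtins.currentSystem; };
in
{
  # Tools taken from `pkgs` are always Darwin derivations, so any eval-time
  # step that has to run them needs a Darwin machine (or a remote builder).
  targetSide = pkgs.git.system;

  # Tools taken from `evalPackages` run on the evaluating machine, so
  # evaluation works everywhere, but the derivations differ per evaluator.
  evalSide = evalPackages.git.system;
}
```

Evaluated with `nix-instantiate --eval --strict` on a Linux machine, the first attribute should stay `x86_64-darwin` while the second follows the local system.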

Of course, the result should be fixed-output, so I hope you don't actually get different things on OSX?

nomeata commented 3 years ago

My assumption is that what would break is evaluating a derivation of a project build with a different system (say, osx)

Previously, it would evaluate (and if you have a remote builder configured, it would even work: you’d run fetchGit locally and push the source to the remote builder), but produce a different derivation than if run on the osx machine directly.

With this change, you can only evaluate this on a linux machine if you have a cache that has the output (or a remote builder). If you have a remote builder, it works (and the remote builder runs fetchGit). And you get the same derivation as running it on osx directly.

nomeata commented 3 years ago

> Of course, the result should be fixed-output, so I hope you don't actually get different things on OSX?

But I do! To confirm, check out this commit (https://github.com/entropia/tip-toi-reveng/commit/3533125d0fbda6fcbe74144971a911039bbf70d0), maybe run cachix use tttool, and compare

nix-instantiate -A osx-exe-bundle

on linux and osx. You can also see the output of that command at https://github.com/entropia/tip-toi-reveng/actions/runs/353552210. Compare the linux and the osx job there; both have a step that runs nix-instantiate. Darwin reports /nix/store/akr7dsnjgl069c0xdqxd9y9lfdsglx2d-tttool-bundle.drv and linux reports /nix/store/xrgnv9n6hjz3pzpd4ls0rgv0z1b1hyrz-tttool-bundle.drv

After using pkgs.fetchgit, by patching haskell.nix in https://github.com/entropia/tip-toi-reveng/commit/e22c96411c7ea13546bdbe25a8da25e3a2810b30 (no other change, as you can see) I get /nix/store/akr7dsnjgl069c0xdqxd9y9lfdsglx2d-tttool-bundle.drv also on linux.

nomeata commented 3 years ago

This is the beginning of a nix-diff:

- /nix/store/2agsm8c1wn2c3czlyjqljl18as6kr8fr-tttool-bundle.drv:{out}
+ /nix/store/3zzz32bprw14c8sncxdkqyg5qxzbqmsy-tttool-bundle.drv:{out}
• The input named `tttool-osx-bundler.sh` differs
  - /nix/store/sd61ipxaw8d1jpzmd8b60227p79q7kcd-tttool-osx-bundler.sh.drv:{out}
  + /nix/store/h6x16ncy9097q2gixx24lviv0mnjc292-tttool-osx-bundler.sh.drv:{out}
  • The input named `tttool-exe-tttool-1.9` differs
    - /nix/store/vwr5yl492m5mys3yh3fakipidxvmv1ys-tttool-exe-tttool-1.9.drv:{out}
    + /nix/store/g9rwwkzl366x3nbawsz3a3mmqcg7y5hv-tttool-exe-tttool-1.9.drv:{out}
    • The input named `HPDF-lib-HPDF-1.4.10` differs
      - /nix/store/18nm1km5wmi2033g52v000v0npmysnkk-HPDF-lib-HPDF-1.4.10.drv:{out}
      + /nix/store/zqr44dr2g9k1kbjjxl7d7k8pa7vbw82q-HPDF-lib-HPDF-1.4.10.drv:{out}
      • The input named `HPDF-a43e6dd` differs
        - /nix/store/pxasppajly90fiizr1p9nl0pm1fl4anx-HPDF-a43e6dd.drv:{out}
        + /nix/store/k0aynjkhc60kgmi8ndkcqwlgg0ddwk0z-HPDF-a43e6dd.drv:{out}
        • The platforms do not match
            - x86_64-darwin
            + x86_64-linux

so the evaluation platform sneaks in somehow.
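
One way to see how the platform can sneak in, as a sketch built on plain `builtins.derivation` rather than on haskell.nix or `fetchgit`: a fixed-output derivation's output path is determined by its name and hash, but its `.drv` file still records `system`. The name `mkSrc`, the builder command and the all-zero hash are placeholders for illustration; nothing here is meant to be built.

```nix
let
  # Hypothetical fixed-output derivation; only `system` varies.
  mkSrc = system: derivation {
    name = "example-src";
    inherit system;
    builder = "/bin/sh";
    args = [ "-c" "echo placeholder > $out" ];
    outputHashMode = "flat";
    outputHashAlgo = "sha256";
    # Placeholder hash (64 hex zeros); never checked because nothing is built.
    outputHash = "0000000000000000000000000000000000000000000000000000000000000000";
  };

  onLinux = mkSrc "x86_64-linux";
  onDarwin = mkSrc "x86_64-darwin";
in
{
  # The store path of the output depends only on the name and the hash.
  sameOutPath = onLinux.outPath == onDarwin.outPath;

  # The .drv file embeds the platform, so its store path differs.
  sameDrvPath = onLinux.drvPath == onDarwin.drvPath;
}
```

If that reading is right, evaluating this should report equal output paths but different `.drv` paths, and any derivation that lists such a `.drv` as an input gets a different `.drv` path as well, which would be consistent with the nix-diff above.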

michaelpj commented 3 years ago

> My assumption is that what would break is evaluating a derivation of a project build with a different system (say, osx)

Yes, exactly! But it's typical to have a project that has, say, a release.nix with jobs for both linux and osx. So what we would lose is the ability to instantiate release.nix on any system, which is very useful to check for evaluation bugs. Nix projects without IFD don't have this problem, of course.

> But I do! To confirm, check out this commit...

:scream_cat:

Okay, well I guess the first thing is to try and figure out why that is...

nomeata commented 3 years ago

> So what we would lose is the ability to instantiate release.nix on any system, which is very useful to check for evaluation bugs.

I agree, I guess I am trying to do precisely that (in the last step, nix-build -A release-zip)! But if the derivations are then different, doesn't this defeat the purpose a bit? Or are you saying it’s useful to evaluate that even if the resulting derivations differ?

michaelpj commented 3 years ago

I guess we believe that they're "only incidentally" different, although I don't think we have good evidence of that :grimacing:

It's very useful to be able to at least evaluate all the Nix code, even if the resulting derivations are slightly off. You want to check that all the stuff gated by isDarwin still works! Not to mention fixing it when it breaks and checking if it's fixed - you don't want to have to push and wait for CI every time. It doesn't break that often, but often enough that not even being able to evaluate it on your machine is a huge pain.

(The same argument applies to stuff building on Darwin, but I guess I don't even try in that instance and just ask a colleague with an OSX machine :laughing: )

nomeata commented 3 years ago

Incidentally different is still enough to not be able to pull the artifact built by darwin from the cache on a linux machine… (Maybe too obscure a use case).

I wonder if there is a way to have a fixed-path derivation that isolates downstream derivations from the details of upstream derivations… Then you could evaluate it on any machine, and the fixed output means it doesn't matter which machine it is on.

michaelpj commented 3 years ago

I actually thought that was true of fixed-output derivations already. But I guess we're doing at least some non-fixed-output eval-time work, at which point maybe the system creeps in.

This is maybe another argument for pinning plan-sha256: then the final eval-time product really is a fixed-output derivation.

nomeata commented 3 years ago

But I am pinning all plans! (more :scream_cat:)…

michaelpj commented 3 years ago

I would have thought that if you were pinning the plan, then you'd get an immediate cache hit. You said you were getting a cache hit anyway, could it be that? Or maybe you've got checkMaterialization = true?

So much black magic :sweat_smile:

nomeata commented 3 years ago

I am running with checkMaterialization = true in the darwin run, but without checkMaterialization = true in the release run. Does checkMaterialization = true affect the resulting derivation? (Doesn't seem like it, though)

Oh, I just noticed that I can actually check the materialization of my osx package plan even on linux (I guess because this doesn’t actually need a real ghc, merely cabal, presumably taken from evalPackages). Neat!

nomeata commented 3 years ago

Too bad that

      in (if sha256 != null
        then builtins.path {
          name = "foo";
          path = builtins.fetchGit {
            url = repo.location;
            ref = repo.tag;
          };
          sha256 = sha256;
        }

doesn’t work to isolate the use of this path from how it is created…

Once https://github.com/NixOS/nix/commit/f74243846512ffabf082985bca395890c97643e0 reaches nix it seems that we could write

      in (if sha256 != null
        then builtins.fetchGit {
          url = repo.location;
          ref = repo.tag;
          sha256 = sha256;
        }

and all would be well.

michaelpj commented 3 years ago

checkMaterialization = true makes it redo the generation of the Nix files, so it can check if the sha is correct; i.e. it's not a fixed-output derivation in that case.
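
For reference, a sketch of how these settings are typically passed to a haskell.nix project. It assumes `pkgs` already has the haskell.nix overlay applied; the compiler name, the hash and the `./materialized` path are placeholders, not values from the project discussed here.

```nix
{ pkgs, checkMaterialization ? false }:

pkgs.haskell-nix.project {
  src = ./.;
  compiler-nix-name = "ghc8102"; # placeholder compiler version

  # Pinning the plan hash lets the generated plan be a fixed-output derivation.
  plan-sha256 = "0000000000000000000000000000000000000000000000000000"; # placeholder

  # Pre-generated Nix files for the plan, checked into the repository.
  materialized = ./materialized; # placeholder path

  # When true, the plan is regenerated and compared against the pinned values,
  # so the eval-time work is no longer purely fixed-output.
  inherit checkMaterialization;
}
```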

nomeata commented 3 years ago

I have a theory: The result of pkgs.evalPackages.fetchgit is used twice:

1. to calculate the plan
2. to actually build the library

For 1., I agree that pkgs.evalPackages.fetchgit is desirable: It means that every system can do the evaluation, and because the output is just to calculate the plan, which is itself used by nix during evaluation, it does not affect the final derivation.

But for 2., it does affect the final derivation, so there pkgs.fetchgit should be used.
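
The split could look roughly like the following. This is a sketch of the idea only, not the code from haskell.nix or from any PR; `forPlan` and `forBuild` are hypothetical names, while `repo.location`, `repo.tag` and `sha256` follow the snippet quoted at the top of this issue.

```nix
# Hypothetical helper: fetch the same pinned repository from both package sets.
{ pkgs, evalPackages }:

repo: sha256:
let
  args = {
    url = repo.location;
    rev = repo.tag;
    inherit sha256;
  };
in
{
  # 1. Only feeds the plan calculation during evaluation, so it uses the
  #    evaluating machine's fetchgit and can run on any system.
  forPlan = evalPackages.fetchgit args;

  # 2. Becomes an input of the library's derivation, so it uses the target
  #    platform's fetchgit and does not depend on who evaluates it.
  forBuild = pkgs.fetchgit args;
}
```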

It reminds me of when I was materializing the output of cabal2nix (i.e. conventional nixpkgs haskell packaging) while using something like gitSource.nix to get the source: I had to make sure the materialized .nix file would actually use the host’s git…

Now I tried to see if I could implement that and wanted to inspect the materialized file, but it doesn't exist:

~/projekte/tiptoi/tip-toi-reveng $ nix-instantiate  -A linux-exe --arg checkMaterialization true
trace: Using index-state: 2020-11-08T00:00:00Z
trace: To materialize the output entirely, pass a writable path as the `materialized` argument and pass that path to /nix/store/5x696pqwahvn7m6pjgca936gd3sbva9n-generateMaterialized
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/nm7npn8z7mi5dmfghms1sqi3vnzr14p6-tttool-exe-tttool-1.9.drv
~/projekte/tiptoi/tip-toi-reveng $ LANG=C ls /nix/store/5x696pqwahvn7m6pjgca936gd3sbva9n-generateMaterialized
ls: cannot access '/nix/store/5x696pqwahvn7m6pjgca936gd3sbva9n-generateMaterialized': No such file or directory

(I guess this is #456)

nomeata commented 3 years ago

This is what I mean, cast into a PR: https://github.com/input-output-hk/haskell.nix/pull/918 (minimal changes to explain what I want to achieve here; if desirable, it may want some cleanup. Also, it can be optimized in the case where evalPackages = packages to avoid duplicate work)

michaelpj commented 3 years ago

Hah. I think doing this more thoroughly is what I wanted to do in https://github.com/input-output-hk/haskell.nix/issues/814. I'm a little bit afraid of making the whole eval/build packages thing more complicated than it already is, but I guess it's hard to simplify without some quite invasive refactoring.

nomeata commented 3 years ago

Well, if you are unsure whether #814 is just pedantry or actually solves users’ problems, take this as an indication of the latter ;-)

nomeata commented 3 years ago

Darn, either I broke my fix in the latest refactoring, or something else is amiss… will dig into that maybe tomorrow or the weekend

michaelpj commented 3 years ago

That's annoying. I wonder if we can do something clever with disallowedReferences to ensure we don't get this sort of problem again?

nomeata commented 3 years ago

False alarm: The problem I was observing yesterday only occurs on PRs from other repos, which don’t have the permissions to upload to the nix cache, and thus break this idea of transferring build artifacts between jobs via the cachix cache.

But all good here.

Use of `evalPackages` in lib/cabal-project-parser.nix problematic #917