haskell-nix / hnix-store

Haskell implementation of the Nix store
Apache License 2.0

nix-store path calculation #217

Open ghostbuster91 opened 10 months ago

ghostbuster91 commented 10 months ago

Hi,

First, a little bit of context. I am trying to programmatically generate Nix derivations using Scala. It turned out that I need to calculate the nix-store path in order to put the drv file into the nix-store (which is a requirement for realizing it).

Because of that I started to implement a minimalistic version of hnix-store in Scala, so that I can calculate the nix-store output path. I tried reading The Purely Functional Software Deployment Model and the code in this repository; I don't know much Haskell, but it was enough for me to get started. Then I also found https://web.archive.org/web/20221001050043/https://comono.id/posts/2020-03-20-how-nix-instantiation-works/ which was an invaluable help.

I am at the point where I can calculate the nix-store path correctly for some real derivations like "/nix/store/dsn6vl7x1hbn1akgpxync19gpx2dzy8w-bootstrap-tools" or the more complex /nix/store/32lr8w57frc1ij5wzc3hb9ks8vzs2ms1-libffi-3.4.4.drv.

However, for some reason I cannot correctly calculate the nix-store path for "/nix/store/h8z4rypl78kwais0yim76czxjnd55dsm-python3-minimal-3.10.12"

There must be something different about this package/its inputDrvs but I fail to spot anything.

I wonder if you know of any better resources about the algorithm used to calculate nix-store paths. I will list the steps that I do, in the hope that someone will be able to spot a mistake (fixed-hash derivations are left out for brevity):

  1. replace each inputDrv with its descriptor hash and sort that list lexicographically
  2. mask outputs (all outputs are set to "", environment variables that refer to outputs are also set to "")
  3. calculate hash of such modified serialized derivation
  4. concatenate the hash with metadata and hash it again: sha256("output:out:sha256:${sha256(d)}:/nix/store:${d.env("name")}")
  5. truncate to 160 bits and convert to base32 nix-variant
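Steps 4 and 5 above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the Nix or hnix-store implementation; the function names (`compress_hash`, `nix_base32`, `output_path`) are made up for illustration. Two details are easy to miss: Nix shortens the SHA-256 digest to 160 bits by XOR-folding all 32 bytes into 20 rather than cutting it off, and for outputs other than `out` the output name is appended to the name used in the fingerprint.

```python
import hashlib

# Nix's base32 alphabet omits the letters e, o, t and u.
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def compress_hash(digest: bytes, size: int = 20) -> bytes:
    """Fold a digest down to `size` bytes by XOR-ing byte i into slot i % size."""
    out = bytearray(size)
    for i, b in enumerate(digest):
        out[i % size] ^= b
    return bytes(out)


def nix_base32(data: bytes) -> str:
    """Encode bytes in the Nix base32 variant (5-bit groups, emitted last group first)."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)          # byte index and bit offset of this group
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)  # pull in bits from the next byte
        chars.append(NIX_BASE32_ALPHABET[c & 0x1F])
    return "".join(chars)


def output_path(drv_hash_hex: str, output_name: str, name: str,
                store_dir: str = "/nix/store") -> str:
    """Build the store path of one output from the masked derivation's hash (hex)."""
    # Assumption: non-"out" outputs get "-<output>" appended to the name.
    path_name = name if output_name == "out" else f"{name}-{output_name}"
    fingerprint = f"output:{output_name}:sha256:{drv_hash_hex}:{store_dir}:{path_name}"
    digest = hashlib.sha256(fingerprint.encode()).digest()
    return f"{store_dir}/{nix_base32(compress_hash(digest))}-{path_name}"
```

For a derivation with several outputs, `output_path` would be called once per output with the same `drv_hash_hex`, for example `output_path(h, "dev", "libffi-3.4.4")` and `output_path(h, "out", "libffi-3.4.4")`.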

descriptor hashes are calculated as follows:

  1. if this is a fixed hash derivation then calculate sha256("fixed:out:${d.hashAlgo}:${d.hash}:${d.path.get}")
  2. otherwise replace each inputDrv with its descriptor hash and sort that list lexicographically
  3. calculate hash of such modified serialized derivation - sha256(derivation)
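The descriptor-hash recursion above could look roughly like this in Python. It is a sketch under assumptions: the `Drv` and `Output` types, `to_aterm`, and the `load_drv` callback (which is expected to return the parsed derivation for a `.drv` path) are hypothetical names for illustration, and the serializer only covers the plain `Derive(...)` text format. Note that in this sketch each input entry keeps its list of requested output names (such as `["dev", "out"]`) next to the replaced hash, so that list also ends up in the hashed text.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Output:
    path: str = ""
    hash_algo: str = ""  # e.g. "sha256" or "r:sha256"; empty unless fixed-output
    hash: str = ""


@dataclass
class Drv:
    outputs: dict      # output name -> Output
    input_drvs: dict   # .drv path -> list of output names used
    input_srcs: list
    platform: str
    builder: str
    args: list
    env: dict


def quote(s: str) -> str:
    """Quote a string the way the Derive(...) text format expects."""
    for raw, esc in (("\\", "\\\\"), ('"', '\\"'), ("\n", "\\n"),
                     ("\r", "\\r"), ("\t", "\\t")):
        s = s.replace(raw, esc)
    return '"' + s + '"'


def qlist(items) -> str:
    return "[" + ",".join(quote(i) for i in items) + "]"


def to_aterm(drv: Drv, input_drvs: dict) -> str:
    """Serialize drv, using input_drvs in place of drv.input_drvs."""
    outs = ",".join(
        f"({quote(n)},{quote(o.path)},{quote(o.hash_algo)},{quote(o.hash)})"
        for n, o in sorted(drv.outputs.items()))
    ins = ",".join(f"({quote(k)},{qlist(sorted(v))})"
                   for k, v in sorted(input_drvs.items()))
    env = ",".join(f"({quote(k)},{quote(v)})"
                   for k, v in sorted(drv.env.items()))
    return (f"Derive([{outs}],[{ins}],{qlist(sorted(drv.input_srcs))},"
            f"{quote(drv.platform)},{quote(drv.builder)},"
            f"{qlist(drv.args)},[{env}])")


def descriptor_hash(drv: Drv, load_drv) -> str:
    """Hex SHA-256 of drv with each input .drv path replaced by its own descriptor hash."""
    out = drv.outputs.get("out")
    if len(drv.outputs) == 1 and out is not None and out.hash_algo:
        # Fixed-output derivation: the hash depends only on the declared output.
        text = f"fixed:out:{out.hash_algo}:{out.hash}:{out.path}"
        return hashlib.sha256(text.encode()).hexdigest()
    replaced = {descriptor_hash(load_drv(path), load_drv): names
                for path, names in drv.input_drvs.items()}
    return hashlib.sha256(to_aterm(drv, replaced).encode()).hexdigest()
```

A real implementation would probably memoize `descriptor_hash` per `.drv` path, since the same inputs are reached many times in a large dependency graph.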

If anything I think that I might be handling multiple outputs incorrectly. If there is a derivation A that defines several outputs:

   "outputs": {
      "dev": {
        "path": "/nix/store/8qg5ralh4c1m2pas6lbi572qykxxsxdn-libffi-3.4.4-dev"
      },
      "info": {
        "path": "/nix/store/v5j3cysbnah4m265wlm57gjmln18qq7a-libffi-3.4.4-info"
      },
      "man": {
        "path": "/nix/store/x92j3f8v85h216avky5rdi5xizx12j6h-libffi-3.4.4-man"
      },
      "out": {
        "path": "/nix/store/ksz7in14b8si5f107w3ay3ph79f67i68-libffi-3.4.4"
      }
    },

and then we depend on such derivation in B:

inputDrvs=[
    "/nix/store/32lr8w57frc1ij5wzc3hb9ks8vzs2ms1-libffi-3.4.4.drv" -> ("dev", "out"),
...
]

I don't change my logic for calculating B's descriptor in terms of A: I do it the same way as if there were only a single output, out, defined in A and used in B.
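For reference, the masking step applied to a multi-output derivation might look like the sketch below. It assumes the derivation is held as a plain dict shaped like the JSON excerpt above, with an `env` map next to `outputs`; `mask_outputs` and `output_path_names` are hypothetical helper names. The point it illustrates is that every output (here `dev`, `info`, `man`, `out`) is blanked, along with the environment variable of the same name, and that each output gets its own name in the path fingerprint.

```python
import copy


def mask_outputs(drv: dict) -> dict:
    """Return a copy of drv with every output path, and its env variable, set to ""."""
    masked = copy.deepcopy(drv)
    for name, out in masked["outputs"].items():
        out["path"] = ""
        masked["env"][name] = ""  # the env variable named after the output
    return masked


def output_path_names(drv: dict) -> dict:
    """Map each output to the name used in its path: "out" keeps the bare name."""
    base = drv["env"]["name"]
    return {n: base if n == "out" else f"{base}-{n}" for n in drv["outputs"]}
```

With the libffi example, `output_path_names` would give `libffi-3.4.4-dev` for `dev` and `libffi-3.4.4` for `out`, matching the suffixes of the store paths listed above.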

Thanks in advance :bowing_man:

sorki commented 10 months ago

I don't see anything obvious immediately but it's been a while since I read the paper. You might want to check https://github.com/haskell-nix/hnix/blob/master/src/Nix/Effects/Derivation.hs as well

sorki commented 10 months ago

Hi! I was in a hurry, so sorry for the terse response.

Have you managed to figure it out? It's pretty cool that you can use our code as a reference even as a non-Haskeller!

Btw, what's your motivation for implementing this?

ghostbuster91 commented 10 months ago

Hi, no worries :)

Have you managed to figure it out?

Unfortunately, I wasn't able to make any progress. I bet that this is quite a subtle difference which makes it even harder to spot in the code.

It's pretty cool that you can use our code as a reference even as a non-Haskeller!

well, to some degree modulo my haskell skills :laughing:

I plan to run your code against the same package and check both the final and intermediate results, but not knowing Haskell doesn't help my motivation these days.

Btw, what's your motivation for implementing this?

So, I want to create a build tool for Scala that will piggyback on Nix as much as possible. One of my requirements is to use zinc, which is an incremental compiler for Scala. Then, based on the user input, I will create on the fly a Nix derivation that feeds the output of the incremental compiler's previous run into the current build process. So basically I will carry the zinc state across multiple build invocations. To do this I need to be able to create a Nix derivation with provided inputs, as I don't want to construct Nix expressions programmatically. Also, the build configuration won't be written in the Nix language but in something else.

(compilerOutput, zincState1) = compileScala(sources, NoState)
(compilerOutput, zincState2) = compileScala(sources, zincState1)

I hope it makes sense :sweat_smile:

Ericson2314 commented 10 months ago

CC @flokli

Ericson2314 commented 10 months ago

@ghostbuster91 Two other things:

  1. It is not unreasonable to create a mode for nix derivation add where you don't need to pre-calculate the path.
  2. If you are willing to use content-addressing derivations (i.e. don't need Hydra) you can already skip this; those don't have precomputed output paths because the output paths are unknown until they are built.

ghostbuster91 commented 10 months ago

Re. content-addressing - yeah, I heard about it and it seems that it should work. Not sure yet how much of a problem the lack of Hydra will be. I will need to check this.

However, since I already got quite far, I wanted to finish implementing that approach. I didn't find many resources on the topic, hence I figured I would write a blog post documenting how this process works under the hood. So this has kind of become a goal of its own :)

Re. nix derivation add - sorry I didn't get this, could you elaborate?

Ericson2314 commented 10 months ago

@ghostbuster91 nix derivation add is a new command that is basically nix derivation show in reverse. It uses JSON for convenience; it should probably compute store paths too for convenience.

flokli commented 10 months ago

As I got cc'ed: in case reading another implementation might help, during the development of Tvix we reverse-engineered the output path calculation and produced some general-purpose (Rust) code in nix-compat that does the output path calculation, mostly in the calculate_output_paths and derivation_or_fod_hash functions.

Consumers of this code are a bunch of testcases, as well as builtins.derivationStrict.

Maybe some of that code helps you to understand where things happen differently?

ghostbuster91 commented 10 months ago

@Ericson2314 thanks, I didn't know about this. I will check it out :+1:

@flokli

In case reading another implementation might help

It definitely will. I grew up on imperative code, so reading Rust should be easier. Thanks for the links :bowing_man: