input-output-hk / haskell.nix

Alternative Haskell Infrastructure for Nixpkgs
https://input-output-hk.github.io/haskell.nix
Apache License 2.0

Deduplicating same package #1567

Open damhiya opened 2 years ago

damhiya commented 2 years ago

It seems haskell.nix rebuilds the same package when I modify unrelated options.

For example, I have duplicated Hackage state files in my /nix/store:

$ ls /nix/store | grep hackage.haskell.org-at-2022-07
1fmkbsrmw697lq4h0qjicg02mm5q7ccc-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
2fq4w3nsgq6xicsppk2bpbqjca3663yy-hackage-repo-hackage.haskell.org-at-2022-07-23T000000Z.drv
40yzxaayk2wmzkijs51dwl21kwzzk122-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
5qpmrnqyzk3m0xn36z5lxnlxdaj5xprz-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
6c11n2yaj2hb6k4wl4np4ny777zv22rj-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
8c4v6y4576rlxn8vj85h2hzhd3drnra8-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
976jd09v4fpc8913xry34sjwkmiirv54-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z
a0zx960mggrlfgcpdxxqix7z5h42h8c0-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z
bb6j0vqlm138qpi32jf1xf5is8mi9k8k-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z
dz08xigsq1gj5x7fg06i81lba5rih2pq-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
gdgq8p58wz4w89dc8hb3jl6s7lps40nc-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
gm97yzb2wcsxls8b2r8q4pwdzhdvcybj-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z.drv
jnznkd0dbymhwclh6i0kwgdk4qmd2jdh-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z
kzqp87xab8084c41ay30xj3wwm317cr1-hackage-repo-hackage.haskell.org-at-2022-07-23T000000Z.drv
lp8spkzrr3x2yfn31lspsciz4gz30hia-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z
pgan07py6hlf44n3ixzvk9v1zj25xlgm-hackage-repo-hackage.haskell.org-at-2022-07-23T000000Z
x25ay8700v5b4x2sm10zzxmbxadmb6sj-dot-cabal-hackage.haskell.org-at-2022-07-23T000000Z

I don't understand why state files for the same date can have different hash values. These files are pretty big, so downloading them several times is really annoying.

nix-tools is another example.

$ ls /nix/store | grep nix-tools$
3p1vj4h07hvqy4dwn16hnibjc9izsajg-nix-tools
77lmq9rbxax04b4szi56q8f0qcw1qndf-nix-tools
bi638hbjfzavnwjhiys04jw69hb88ag7-nix-tools
hzbyr7qfkfx3jkq1k10q8ik1fxzna3bw-source-root-lib-nix-tools
z0qhrxrbcjr2c9vid7njcsr6yapy5n01-nix-tools
zvq28c6xmml3rdjizfl0pzli1girvqan-nix-tools

I observed Nix rebuilding nix-tools when I copied and pasted an existing Haskell project setup and modified some options (project name, Haskell library list, ...) without changing the compiler or haskell.nix versions.

Is it possible to make Nix not duplicate such packages?

hamishmack commented 2 years ago

I used nix-diff to compare two of these and the difference between them is the GHC version used to build cabal-install and nix-tools (haskell.nix defaults to using the project's GHC version to build these tools).
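To see why two dot-cabal-hackage.haskell.org outputs for the same index date can get different store hashes, here is a toy model of input addressing: the hash is computed from a description of every build input, not from the output bytes. This is a simplified sketch, not Nix's actual derivation-hashing algorithm; the toy_store_hash function and the ghc-8.10.7 / ghc-9.2.4 labels are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Toy model of input-addressed store paths. This is NOT Nix's real
# derivation-hashing algorithm: it only shows that the hash covers every
# input, including the GHC used to build the helper tools.
toy_store_hash() {
  local name=$1 index_state=$2 tool_ghc=$3
  printf 'name=%s\nindex-state=%s\ntool-ghc=%s\n' "$name" "$index_state" "$tool_ghc" \
    | sha256sum | cut -c1-32
}

name=dot-cabal-hackage.haskell.org
date=2022-07-23T000000Z

# Hypothetical tool-compiler labels; any two distinct strings behave the same.
h1=$(toy_store_hash "$name" "$date" ghc-8.10.7)
h2=$(toy_store_hash "$name" "$date" ghc-9.2.4)
h3=$(toy_store_hash "$name" "$date" ghc-8.10.7)

echo "$h1-$name-at-$date"
echo "$h2-$name-at-$date"

# Same index date, different tool GHC: different paths.
# Identical inputs: identical path, so the output could be shared.
[ "$h1" != "$h2" ] && [ "$h1" = "$h3" ] && echo "toy model behaves as expected"
```

In this model, pinning tool-ghc to a single value is what would collapse the duplicates into one path per date.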

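One solution might be to have a fixed GHC version and use only that for building these. That has two drawbacks:

  • If the user is not using a nix binary cache and the GHC version they want is not the one we choose, they will have to wait for it to build.
  • On macOS, cabal-install and nix-tools include dependencies on the GHC derivation they are built with.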

Every dot-cabal-hackage.haskell.org is 800MB.

In the short term, running nix store optimise from time to time might help free up some disk space. It won't help with download time, and it won't save any disk space on filesystems that already deduplicate automatically.
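For context on the optimise suggestion: nix store optimise replaces byte-identical files in the store with hard links to a single copy, which Nix keeps under /nix/store/.links keyed by content hash. The sketch below imitates that idea in a scratch directory only; the directory names and the dedupe function are hypothetical, and it never touches the real store. It also shows the limit mentioned above: the bytes must already be on disk (i.e. downloaded) before they can be linked.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Toy sketch of content-based deduplication with hard links, in the spirit of
# `nix store optimise`. Runs in a scratch directory, never in /nix/store.
store=$(mktemp -d)
mkdir -p "$store/aaa-dot-cabal" "$store/bbb-dot-cabal" "$store/.links"

# Two hypothetical "store paths" containing a byte-identical index file.
head -c 100000 /dev/urandom > "$store/aaa-dot-cabal/01-index.tar"
cp "$store/aaa-dot-cabal/01-index.tar" "$store/bbb-dot-cabal/01-index.tar"

dedupe() {
  local f hash link
  # Walk every regular file except the .links directory itself.
  find "$store" -path "$store/.links" -prune -o -type f -print | while read -r f; do
    hash=$(sha256sum "$f" | cut -d' ' -f1)
    link="$store/.links/$hash"
    if [ -e "$link" ]; then
      ln -f "$link" "$f"    # duplicate content: replace the file with a hard link
    else
      ln "$f" "$link"       # first occurrence becomes the canonical copy
    fi
  done
}
dedupe

# Both files now share one inode, so the data is stored only once.
[ "$store/aaa-dot-cabal/01-index.tar" -ef "$store/bbb-dot-cabal/01-index.tar" ] && echo "deduplicated"
```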

damhiya commented 2 years ago

> I used nix-diff to compare two of these and the difference between them is the GHC version used to build cabal-install and nix-tools (haskell.nix defaults to using the project's GHC version to build these tools).
>
> One solution might be to have a fixed GHC version and use only that for building these. That has two drawbacks:
>
>   • If the user is not using a nix binary cache and the GHC version they want is not the one we choose they will have to wait for it to build.
>   • On macOS cabal-install and nix-tools include dependencies on the GHC derivation they are built with.
>
> Every dot-cabal-hackage.haskell.org is 800MB.
>
> In the short term nix store optimise from time to time might help free up some disk space. It won't help the download time and won't help disk space on filesystems with automatic deduplication.

I thought downloading the specific index file would be sufficient for building dot-cabal-hackage.haskell.org, since the hackage.nix repo tracks a fixed Hackage index for every date. Why do we need cabal-install or nix-tools to build dot-cabal-hackage.haskell.org?

hamishmack commented 2 years ago

nix-tools is used to truncate the index: https://github.com/input-output-hk/haskell.nix/blob/62a4287cac70bd06827844f9470da4eab01719f1/overlays/haskell.nix#L198

cabal-install is used to build a suitable .cabal directory with cabal new-update: https://github.com/input-output-hk/haskell.nix/blob/62a4287cac70bd06827844f9470da4eab01719f1/overlays/haskell.nix#L283
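The truncation step can be pictured with a toy model. The Hackage index is append-only, so truncating it to an index-state amounts to keeping the prefix of entries up to that time. In this sketch each entry is a text line "<unix-time> <path>" rather than a tar member, and the pkg-a ... pkg-d paths and the truncate_index function are made up for illustration. Note that the result depends only on the index contents and the cutoff, not on which GHC built the tool; Nix still hashes the tool as a build input, which is where the duplicates come from.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Toy model of truncating an append-only package index to an index-state.
# Assumption: each entry is a text line "<unix-time> <path>" (the real
# 01-index.tar is a tar archive) and entries appear in append (time) order.
index=$(mktemp)
cat > "$index" <<'EOF'
1656633600 pkg-a/1.0/pkg-a.cabal
1658188800 pkg-b/1.0/pkg-b.cabal
1658534400 pkg-c/1.0/pkg-c.cabal
1658880000 pkg-d/1.0/pkg-d.cabal
EOF

# Print the prefix of the log up to and including the cutoff,
# stopping at the first entry that is newer than the cutoff.
truncate_index() {
  local cutoff=$1 file=$2
  awk -v cutoff="$cutoff" '$1 > cutoff { exit } { print }' "$file"
}

cutoff=1658534400   # 2022-07-23T00:00:00Z as a Unix timestamp
truncate_index "$cutoff" "$index"
# Prints the pkg-a, pkg-b and pkg-c lines; pkg-d is newer than the cutoff.
```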

We could delay these steps and run them on demand. If we changed the derivations into a buildDotCabalHome script, then the derivations that need the .cabal home directory could run something like:

HOME=$(mktemp -d)
${buildDotCabalHome ...}
cabal configure ...

That might just trade the storage issue for a performance one though.

damhiya commented 2 years ago

What if we store the index files as pure files somewhere (e.g. hackage.nix) and make a Nix derivation that refers to those files? I think there would be no duplication due to different toolchain versions in that case. Since hydra.iohk.io already caches Hackage index derivations, this seems like a reasonable approach to me. Also, if we split the Hackage index into fragments and reassemble them when building a Hackage index derivation, there might be further savings.
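The fragment idea can be sketched as content-addressed chunking: split each index snapshot into chunks, store each chunk once under its hash, and keep a per-snapshot manifest to reassemble it. Because the index only grows by appending, an older snapshot shares its leading chunks with newer ones. The fixed 4096-byte chunk size, the random stand-in data, and the store_snapshot and reassemble helpers are all assumptions for illustration; a real design might prefer content-defined chunking or tar-entry boundaries.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Toy sketch of storing index snapshots as content-addressed fragments.
# Assumptions: fixed-size chunks, and a newer snapshot only appends bytes.
work=$(mktemp -d)
chunks="$work/chunks"; mkdir -p "$chunks"
chunk_size=4096

# store_snapshot FILE MANIFEST: split FILE, store each chunk under its hash,
# and record the chunk order in MANIFEST.
store_snapshot() {
  local file=$1 manifest=$2 tmp part hash
  tmp=$(mktemp -d)
  split -b "$chunk_size" -a 4 "$file" "$tmp/part."
  : > "$manifest"
  for part in "$tmp"/part.*; do
    hash=$(sha256sum "$part" | cut -d' ' -f1)
    [ -e "$chunks/$hash" ] || mv "$part" "$chunks/$hash"   # store each chunk once
    echo "$hash" >> "$manifest"
  done
  rm -rf "$tmp"
}

# reassemble MANIFEST OUT: concatenate the chunks in manifest order.
reassemble() {
  local manifest=$1 out=$2 hash
  : > "$out"
  while read -r hash; do cat "$chunks/$hash" >> "$out"; done < "$manifest"
}

# Two hypothetical snapshots: the newer one appends data to the older one.
head -c 20000 /dev/urandom > "$work/index-old.tar"
{ cat "$work/index-old.tar"; head -c 8000 /dev/urandom; } > "$work/index-new.tar"

store_snapshot "$work/index-old.tar" "$work/old.manifest"
store_snapshot "$work/index-new.tar" "$work/new.manifest"
reassemble "$work/new.manifest" "$work/index-new.rebuilt"

cmp "$work/index-new.tar" "$work/index-new.rebuilt" && echo "reassembled OK"
echo "chunks referenced: $(cat "$work"/*.manifest | wc -l), chunks stored: $(ls "$chunks" | wc -l)"
```

Here the four full leading chunks of the older snapshot are stored once and referenced by both manifests; only the older snapshot's partial tail chunk and the newer snapshot's appended chunks are stored separately.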