Build stackage2nix in NixOS sandbox

4e6 commented 6 years ago

Unable to build nix/stackage2nix on NixOS with nix.useSandbox enabled.

> nix.useSandbox: If set, Nix will perform builds in a sandboxed environment that it will set up automatically for each build. This prevents impurities in builds by disallowing access to dependencies outside of the Nix store. This isn't enabled by default for performance. It doesn't affect derivation hashes, so changing this option will not trigger a rebuild of packages.

4e6 commented 6 years ago

related #40

4e6 commented 6 years ago

Currently, I see no way to sandbox the stackage2nix wrapper. See the issues below.

The stackage2nix wrapper requires the following dependencies to be fetched (see nix/lib.nix):

[x] fpco/lts-haskell
[x] fpco/stackage-nightly
[ ] commercialhaskell/all-cabal-hashes
[ ] hackage-db at https://hackage.haskell.org/01-index.tar.gz

To satisfy the sandbox requirements, all of these dependencies must be prefetched before the build with the standard nix-prefetch-scripts.

Stackage config files

Only the files are needed, so fetchgit can be used to fetch the fpco/lts-haskell and fpco/stackage-nightly dependencies. The only issue with this approach is that updates become less convenient: it would require updating the revision and hash for both repos instead of bumping a single cacheVersion parameter.

all-cabal-hashes

To build an exact copy of the Stackage package set, stackage2nix searches all-cabal-hashes for package definitions by the hash defined in the Stackage config (a single version of a package may have several revisions). To do so, all-cabal-hashes must be fetched with its git metadata. Due to NixOS/nixpkgs #8567 there is no reliable way to do this with fetchgit. A solution might be to fetch a zip archive of a particular version. AFAIK, GitHub can create such links, but only for archives containing the project files, without metadata.

hackage-db

The issue with hackage-db is that its URL is not versioned, so there is no particular version to pin in a fetchurl call. I'm assuming that hackage-db could be recreated from the all-cabal-hashes repo, but I'm not sure how. Another solution would be to fetch a versioned db from some other place.

kirelagin commented 6 years ago

all-cabal-hashes stuff in callHackage: https://github.com/NixOS/nixpkgs/blob/43a62b66d0175b10fd3cc6f1fabdec9d205c171c/pkgs/development/haskell-modules/make-package-set.nix#L126

4e6 commented 6 years ago

Regarding the non-determinism of all-cabal-hashes.

I've found this old comment on the original issue thread. The idea is to unpack the git objects and store them uncompressed: https://github.com/bendlas/nixpkgs/commit/4b9c24a5d33407f88457d7e125ca78cbefa30afa We should be able to do this unpacking in a postUnpack build step.

Downsides:

it will increase the size of the git repository

Upsides:

deterministic fetchgit
(should be checked) we can access those objects through the libgit interface (no changes are needed for stackage2nix itself)

zimbatm commented 6 years ago

Do you really care about the git history or is it because the tool wants to query the current reference of the checkout?

For the latter, it could make sense to re-build a fake .git database with only the following files:

.git/HEAD -> ref: refs/heads/master
.git/refs/heads/master -> e843a2271a972b8cb6401e67f25d22c8f6fa68cb
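
A minimal sketch of that idea, in Python, writing only the two files listed above. The function name `write_minimal_git_dir` is hypothetical, and this is an illustration under assumptions rather than a tested recipe: such a directory contains no objects, so it can only answer "which commit is checked out", and real git tooling will likely expect more (for example an `objects` directory).

```python
from pathlib import Path


def write_minimal_git_dir(checkout: Path, commit_sha: str, branch: str = "master") -> Path:
    """Create a skeletal .git directory that only records the current reference.

    Hypothetical helper: it writes HEAD and one branch ref, nothing else.
    """
    git_dir = checkout / ".git"
    ref_path = git_dir / "refs" / "heads" / branch
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    # HEAD is a symbolic ref that points at the branch.
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")
    # The branch ref stores the commit id the checkout corresponds to.
    ref_path.write_text(commit_sha + "\n")
    return git_dir
```

For example, `write_minimal_git_dir(Path("all-cabal-hashes"), "e843a2271a972b8cb6401e67f25d22c8f6fa68cb")` would produce the layout shown above inside a directory named `all-cabal-hashes`.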

binarin commented 6 years ago

@zimbatm It's the mapping from sha1 to a file content that is needed.
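
For reference, a short sketch of how such a sha1 relates to file content, assuming the hash in question is the standard git blob object id: git hashes the header `blob <size>\0` followed by the raw bytes. The function name `git_blob_sha1` is made up for this illustration.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Return the git object id of a blob with the given content."""
    # Git prepends "blob <length in bytes>\0" before hashing the content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known id.
    print(git_blob_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id depends only on the content, looking up a file by this hash needs either the git object database or a precomputed index over candidate files.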

zimbatm commented 6 years ago

So the tool is not looking at the checked-out content but querying the git database directly instead?

If you go down the fetchgit + unpacked blobs route, maybe you can make it smaller by using a shallow copy of the database.

Given the level of effort involved, it could make sense to patch upstream as well.

binarin commented 6 years ago

@zimbatm The full history is still needed, as we need all blobs reachable from the required commit.

I've discussed this with @4e6, and I think I'll just make a small tool that creates a canonical representation of a git .pack file. So if everything (branches, tags) is properly pruned before that, the result will be a working git checkout that is also reproducible. I'll experiment with this approach here. If it works out, I'll try to do the same in fetchgit itself.

4e6 commented 6 years ago

I tried the approach referenced in my previous comment, unpacking the git objects: https://github.com/bendlas/nixpkgs/commit/4b9c24a5d33407f88457d7e125ca78cbefa30afa

This increased the all-cabal-hashes checkout size from 1.6 GB to 16 GB, which is not acceptable.

yorickvP commented 5 years ago

Maybe we can use the github zip archive? It should allow fast random reads.

binarin commented 5 years ago

> Maybe we can use the github zip archive? It should allow fast random reads.

Filenames are used only as a fallback; the primary addressing method is by GitSHA1, so a full .git repo is needed.

yorickvP commented 5 years ago

As I understand it, the bare git repo is only used because it is more compact than doing a repo checkout. However, there is no good way to get an up-to-date one within a nix sandbox. I had to revert 86f11b89 while working on updating nixpkgs-stackage. Getting the latest .zip is trivial (builtins.fetchurl), way faster (20s vs 1m20s for git clone) and way smaller (189MB vs 366MB). Zip allows random access for decompression, so it should be fast to grab individual files out of it.
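
To illustrate the random-access point: a zip archive carries a central directory, so one member can be decompressed without unpacking the rest. A minimal sketch with Python's standard `zipfile` module; the helper name `read_member` and the member path in the demo are hypothetical and do not reflect the real archive layout.

```python
import io
import zipfile


def read_member(archive, member_name: str) -> bytes:
    """Read one file out of a zip archive without extracting the others."""
    with zipfile.ZipFile(archive) as zf:
        # ZipFile locates the member via the central directory, then
        # decompresses only that entry.
        return zf.read(member_name)


if __name__ == "__main__":
    # Build a tiny in-memory archive standing in for the downloaded .zip.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("repo/foo/1.0/foo.cabal", "name: foo\n")
        zf.writestr("repo/bar/2.0/bar.cabal", "name: bar\n")
    print(read_member(buf, "repo/bar/2.0/bar.cabal"))
```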

binarin commented 5 years ago

@yorickvP To make the latest .zip usable, you need to calculate the GitSHA1 of every file inside it and cache this info somewhere. It's doable, except for hackage revisions (just grep by x-revision) - only the latest revision will be available, without any way to fetch older ones. And that is what is being solved by having a .git folder.

The proper solution is to create some canonical representation of a .git repo which will be reproducible. Maybe that will require writing a custom git .pack file generator.
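
A rough sketch of that indexing step, assuming GitSHA1 is the standard git blob id (sha1 over `blob <size>\0` plus the content). The names `git_blob_sha1` and `index_zip_by_git_sha1` are invented for this example. It also shows the limitation described above: the index can only contain hashes of the file versions that are present in the archive.

```python
import hashlib
import zipfile
from typing import Dict


def git_blob_sha1(content: bytes) -> str:
    """Return the git blob object id for the given content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def index_zip_by_git_sha1(archive) -> Dict[str, str]:
    """Map the git blob id of every file in the archive to its member name."""
    index: Dict[str, str] = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Files with identical content share one id; the last name wins,
            # which is harmless when the lookup is only for the content.
            index[git_blob_sha1(zf.read(info))] = info.filename
    return index
```

A lookup for an older revision's hash would simply miss in this index, since that content is not in the archive.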

yorickvP commented 5 years ago

Does stack even expose the used cabal file revision? The intractability of the problem does not seem worth any of the potential savings of using older cabal files sometimes, assuming cabal files are rarely updated and do not break anything.

binarin commented 5 years ago

@yorickvP Yes, it's exposed - e.g. search for GitSHA1 in https://raw.githubusercontent.com/commercialhaskell/lts-haskell/master/lts-12.16.yaml

If non-breaking updates are OK, why have you enabled sandboxing? =)
