Open 4e6 opened 6 years ago
related #40
Currently, I see no way of sandboxing the stackage2nix wrapper. See the issues below.
The stackage2nix wrapper requires the following dependencies to be fetched, see nix/lib.nix
To satisfy the sandbox requirements, all these dependencies should be prefetched before the build by the standard nix-prefetch-scripts.
Only the files are needed, so fetchgit can be used to fetch the fpco/lts-haskell and fpco/stackage-nightly dependencies. The only issue with this approach is less convenient updates, because it would require updating the revision and hash for both repos instead of bumping a single cacheVersion parameter.
To build an exact copy of the stackage package set, stackage2nix searches for project definitions in all-cabal-hashes by a hash defined in the stackage config (a single version of a package may have different revisions). In order to do so, all-cabal-hashes should be fetched with git metadata. Due to NixOS/nixpkgs#8567 there is no reliable way to do this with fetchgit.
The solution might be to fetch a zip archive of a particular version. AFAIK, GitHub is able to create such links, but only for archives containing project files, without metadata.
An issue with hackage-db is that the URL doesn't have a particular version to put in the fetchurl script. I'm assuming that hackage-db could be recreated from the all-cabal-hashes repo, but I'm not sure how. Another solution would be to fetch a versioned db from some other place.
all-cabal-hashes stuff in callHackage: https://github.com/NixOS/nixpkgs/blob/43a62b66d0175b10fd3cc6f1fabdec9d205c171c/pkgs/development/haskell-modules/make-package-set.nix#L126
Regarding the non-determinism of all-cabal-hashes.
I've found this old comment on the original issue thread. The idea is to unpack the git objects and store them uncompressed https://github.com/bendlas/nixpkgs/commit/4b9c24a5d33407f88457d7e125ca78cbefa30afa
We should be able to do this unpacking as a postUnpack build step.
Downsides:
Upsides:
fetchgit
libgit interface (no changes are needed for stackage2nix itself)
Do you really care about the git history, or is it because the tool wants to query the current reference of the checkout?
For the latter, it could make sense to re-build a fake .git database with only the following files:
.git/HEAD -> ref: refs/heads/master
.git/refs/heads/master -> e843a2271a972b8cb6401e67f25d22c8f6fa68cb
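A minimal sketch of that suggestion, assuming the consuming tool only reads these two files directly. The helper name write_fake_git_dir is hypothetical, and whether git or libgit would accept such a directory as a repository is not verified here.

```python
from pathlib import Path


def write_fake_git_dir(checkout: Path, commit: str, branch: str = "master") -> Path:
    """Create a stub .git directory containing only HEAD and one branch ref.

    Hypothetical helper: no objects are written, so this can only answer
    "which commit is checked out", not "what is the content of blob X".
    """
    git_dir = checkout / ".git"
    ref_path = git_dir / "refs" / "heads" / branch
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    # HEAD is a symbolic reference to the branch.
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{branch}\n")
    # The branch ref file holds the commit hash.
    ref_path.write_text(commit + "\n")
    return git_dir
```

Because no object database is created, this does not help a tool that looks up blobs by hash.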
@zimbatm It's the mapping from sha1 to a file content that is needed.
So the tool is not looking at the checked-out content but querying the git database directly instead?
If you go down the fetchgit + unpacked blobs route, maybe you can make it smaller by using a shallow copy of the database.
Given the level of effort involved, it could make sense to patch upstream as well.
@zimbatm The full history is still needed, as we need all blobs reachable from the required commit.
I've discussed this with @4e6, and I think I'll just make a small tool that will create a canonical representation of a git .pack file. So if everything (branches, tags) is properly pruned before that, the result will be a working git checkout that is also reproducible. I'll experiment with this approach here. If it works out, I'll try to do the same in fetchgit itself.
I tried the approach referenced in my previous comment with the unpacking of git objects https://github.com/bendlas/nixpkgs/commit/4b9c24a5d33407f88457d7e125ca78cbefa30afa
This increased the all-cabal-hashes checkout size from 1.6 GB to 16 GB, which is not acceptable.
Maybe we can use the github zip archive? It should allow fast random reads.
Filenames are used only as a fallback; the primary addressing method is by GitSHA1. So a full .git repo is needed.
As I understand it, the bare git repo is only used because it is more compact than doing a repo checkout. However, there is no good way to get an up-to-date one within a nix sandbox. I had to revert 86f11b89 while working on updating nixpkgs-stackage.
Getting the latest .zip is trivial (builtins.fetchurl), way faster (20s vs 1m20s for git clone) and way smaller (189MB vs 366MB). Zip allows random access for decompression, so it should be fast to grab files out of.
@yorickvP To make the latest .zip usable, you need to calculate the GitSHA1 of every file inside of it and cache this info somewhere. It's doable, except for hackage revisions (just grep for x-revision): only the latest revision will be available, without any way to fetch older ones. And that is what is being solved by having a .git folder.
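The caching step could be sketched roughly as follows, assuming GitSHA1 means git's standard blob hash (SHA-1 over the header "blob <size>\0" followed by the file bytes). The function names are hypothetical and this is not how stackage2nix does it.

```python
import hashlib
import zipfile


def git_blob_sha1(content: bytes) -> str:
    """Return the hash git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def index_zip_by_git_sha1(archive) -> dict:
    """Map the git blob hash of each file in a zip archive to its path.

    `archive` is a path or a seekable file object. Only the file versions
    present in the archive get indexed, so older hackage revisions of a
    .cabal file will be missing from the result.
    """
    index = {}
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            index[git_blob_sha1(zf.read(info))] = info.filename
    return index
```

A lookup that misses in this index is exactly the older-revision case described above.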
The proper solution is to create some canonical representation of a .git repo which will be reproducible. Maybe that will require writing a custom git .pack file generator.
Does stack even expose the used cabal file revision? The intractability of the problem does not seem worth any of the potential savings of using older cabal files sometimes, assuming cabal files are rarely updated and do not break anything.
@yorickvP Yes, it's exposed - e.g. search for GitSHA1 in https://raw.githubusercontent.com/commercialhaskell/lts-haskell/master/lts-12.16.yaml
If non-breaking updates are OK, why have you enabled sandboxing? =)
Unable to build nix/stackage2nix on NixOS with nix.useSandbox enabled.