input-output-hk / devx

The Developer Experience Shell - This repo contains a nix develop shell for haskell. Its primary purpose is to help get a development shell for haskell quickly and across multiple operating systems (and architectures).
Apache License 2.0

[DevX bootstrap] Benchmark the speed download of the `devx` closure #22

Closed yvan-sraka closed 10 months ago

yvan-sraka commented 1 year ago

When using GHA to turn iohk/devx into a shell to run, e.g. cabal update, cabal build, we take a lot of time downloading stuff.

A lot of this time comes down to nix sequentially downloading a lot of data. We should build the store and export/import it instead to speed this up.

nix path-info --closure-size --human-readable $(nix print-dev-env --json .#ghc8107-static-minimal | jq -r .variables.out.value)

… will give us something like 2.5G. That's a lot.

We can also enter the shell using

$ nix print-dev-env .#ghc8107-static-minimal > env.sh
$ bash --rcfile env.sh

(e.g. instead of nix develop).

We could pre-build the closure (e.g. from result), and store that as a zstd compressed archive:

nix-store --export $(nix-store -qR result) | zstd -z8T8 > out.zstd

And then re-import this as the first step in GHAs after setting up nix.
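As a rough sketch of the export/import round trip described above, the two pipelines can be assembled in Python. The helper names `export_pipeline` and `import_pipeline` are hypothetical, the zstd options are written out as separate flags (`-z -8 -T8`), and the functions only build the shell strings; nothing here calls nix.

```python
import shlex


def export_pipeline(root: str, archive: str, level: int = 8, threads: int = 8) -> str:
    """Build the shell pipeline that exports the closure of `root` to a zstd archive.

    Hypothetical helper: it only returns the command string, it does not run it.
    """
    # `nix-store -qR` lists every store path in the closure of `root`.
    closure = f"$(nix-store -qR {shlex.quote(root)})"
    return (
        f"nix-store --export {closure} "
        f"| zstd -z -{level} -T{threads} > {shlex.quote(archive)}"
    )


def import_pipeline(archive: str) -> str:
    """Build the shell pipeline that re-imports an archive made by `export_pipeline`."""
    # `zstd -d -c` decompresses to stdout, which `nix-store --import` reads.
    return f"zstd -d -c {shlex.quote(archive)} | nix-store --import"


if __name__ == "__main__":
    print(export_pipeline("result", "out.zstd"))
    print(import_pipeline("out.zstd"))
```

In a GHA job, the second string would be what runs right after nix is set up; whether that alone stops nix from downloading more when entering the shell is exactly the open question raised below.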

See for example this GHA: https://github.com/angerman/x/blob/c559ae0429bb69829a9c9cae8c21ab777461aaf2/.github/workflows/main.yml#L23-L66, which doesn't work properly yet (nix still ends up downloading stuff when trying to enter the shell; maybe this can be eliminated with the env.sh idea from above).

yvan-sraka commented 1 year ago

I wanted to measure how much time such a hack actually saves, so I wrote a quick Python script:

import os
import timeit

DEV_SHELLS = [
    "ghc8107",
    "ghc902",
    "ghc925",
    "ghc8107-minimal",
    "ghc902-minimal",
    "ghc925-minimal",
    "ghc8107-static-minimal",
    "ghc902-static-minimal",
    "ghc925-static-minimal",
]

T = {}
flake = "input-output-hk/devx"  # vs. yvan-sraka/static-closure
for devShell in DEV_SHELLS:
    # Empty the store so that the first entry measures a cold bootstrap.
    os.system("nix-collect-garbage -d")
    cmd = (
        f'nix develop "github:{flake}#{devShell}" '
        "--no-write-lock-file --refresh --command true"
    )

    def x(number):
        # Total wall-clock time (seconds) of entering the shell `number` times.
        return round(timeit.timeit(lambda: os.system(cmd), number=number), 2)

    T[devShell] = {"bootstrap": x(1), "reload": x(10)}
print(T)

I currently list exhaustively the working versions of the input-output-hk/devx devShells; see issues #23 and #24. My machine (a 24" M1 iMac) uses the zw3rk.com cache (but I disabled the remote builder), as I wanted to match what I imagine to be a user's “default” settings.

n.b. I picked Python somewhat blindly, because there may be a future where I want to compute some basic statistics with numpy or plot graphs with matplotlib!

The hack lives in this flake.nix shellHook.

I'll post the benchmark result I got in this thread :)

angerman commented 1 year ago

This is good, looking forward to the benchmarks!

yvan-sraka commented 1 year ago

The awaited benchmarks, which ran last night on my machine (values are in seconds):

Without the speed download hack (meaning the actual nix develop github:input-output-hk/devx#$key flake):

ghc8107:
  bootstrap: 2155.54
  reload: 4.57
ghc902: broken
ghc925:
  bootstrap: 2005.21
  reload: 4.43
ghc8107-minimal:
  bootstrap: 1820.64
  reload: 3.59
ghc902-minimal:
  bootstrap: 1788.87
  reload: 3.33
ghc925-minimal:
  bootstrap: 1826.22
  reload: 3.76
ghc8107-static-minimal:
  bootstrap: 1774.84
  reload: 3.46
ghc902-static-minimal:
  bootstrap: 1780.89
  reload: 4.75
ghc925-static-minimal:
  bootstrap: 1871.04
  reload: 3.4

With the speed download hack of version 70e3884 (that could be summed up as: curl https://s3.zw3rk.com/devx/$arch.$key.zstd | zstd -d | nix-store --import and then the env trick. It does cache the download, but unconditionally does the nix-store import for each reload…):

ghc8107: broken
ghc902: broken
ghc925: broken
ghc8107-minimal:
  bootstrap: 95.71
  reload: 37.74
ghc902-minimal:
  bootstrap: 100.58
  reload: 37.6
ghc925-minimal:
  bootstrap: 101.05
  reload: 33.25
ghc8107-static-minimal:
  bootstrap: 82.32
  reload: 30.61
ghc902-static-minimal:
  bootstrap: 91.1
  reload: 31.1
ghc925-static-minimal:
  bootstrap: 88.54
  reload: 27.21

First, there are a few settings that are “broken”, and I should investigate why … Then, as you can see, it's a big improvement in bootstrap speed (I have quite a slow internet connection, so that surely helps to inflate the numbers) …

… but there is more work to do, as @angerman made me realize: the wrapper derivation should not have to rely on minio-client! And the reload time here is bad (the measure is 10x re-entering the shell): I should fix that, so it behaves at least like the “without the speed download hack” nix develop, and I believe I can even shave those numbers a bit. :)
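To put the two result lists side by side, here is a small sketch that computes the bootstrap speed-up and the reload slow-down for the flavors that worked in both runs. The numbers are copied from the lists above; the names `BASELINE`, `HACK`, and `compare` are only illustrative, and the per-entry reload time assumes the reload figure is the total for 10 entries, as stated above.

```python
# Timings in seconds, copied from the two result lists above.
BASELINE = {
    "ghc8107-minimal": {"bootstrap": 1820.64, "reload": 3.59},
    "ghc902-minimal": {"bootstrap": 1788.87, "reload": 3.33},
    "ghc925-minimal": {"bootstrap": 1826.22, "reload": 3.76},
    "ghc8107-static-minimal": {"bootstrap": 1774.84, "reload": 3.46},
    "ghc902-static-minimal": {"bootstrap": 1780.89, "reload": 4.75},
    "ghc925-static-minimal": {"bootstrap": 1871.04, "reload": 3.4},
}
HACK = {
    "ghc8107-minimal": {"bootstrap": 95.71, "reload": 37.74},
    "ghc902-minimal": {"bootstrap": 100.58, "reload": 37.6},
    "ghc925-minimal": {"bootstrap": 101.05, "reload": 33.25},
    "ghc8107-static-minimal": {"bootstrap": 82.32, "reload": 30.61},
    "ghc902-static-minimal": {"bootstrap": 91.1, "reload": 31.1},
    "ghc925-static-minimal": {"bootstrap": 88.54, "reload": 27.21},
}
RELOAD_RUNS = 10  # "reload" is the total time of 10 shell entries


def compare(baseline, hack, runs=RELOAD_RUNS):
    """Per flavor: bootstrap speed-up, reload slow-down, per-entry reload with the hack."""
    rows = {}
    for shell in sorted(baseline.keys() & hack.keys()):
        rows[shell] = {
            "bootstrap_speedup": baseline[shell]["bootstrap"] / hack[shell]["bootstrap"],
            "reload_slowdown": hack[shell]["reload"] / baseline[shell]["reload"],
            "hack_reload_per_entry": hack[shell]["reload"] / runs,
        }
    return rows


if __name__ == "__main__":
    for shell, row in compare(BASELINE, HACK).items():
        print(
            f"{shell}: bootstrap x{row['bootstrap_speedup']:.1f} faster, "
            f"reload x{row['reload_slowdown']:.1f} slower "
            f"({row['hack_reload_per_entry']:.2f}s per entry)"
        )
```

This is plain arithmetic on a single run per flavor, so it says nothing about variance between runs.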

angerman commented 1 year ago

@yvan-sraka can you please update the comment above with the following remarks:

If we had a canary derivation (e.g. the root of the imported closure), we could validate the existence of the closure in the store, by checking for the existence of that file; and skip the import?
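A minimal sketch of that canary check in Python, assuming the canary is simply a path whose presence on disk stands in for the whole closure. The function name `import_if_missing` and the callback-based shape are hypothetical, not what the flake does; the real check would live in the shellHook.

```python
import os
from typing import Callable


def import_if_missing(canary: str, do_import: Callable[[], None]) -> bool:
    """Run `do_import` only when the canary path is absent.

    Returns True if the import ran, False if it was skipped.
    """
    if os.path.exists(canary):
        # The root of the closure is already there: skip the costly import.
        return False
    do_import()
    return True
```

Here `canary` would be the store path of the closure root and `do_import` would run the `zstd -d | nix-store --import` step. A path existing on disk is only a heuristic for the closure being complete and valid, so this trades some safety for reload speed.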

yvan-sraka commented 1 year ago

> @yvan-sraka can you please update the comment above with the following remarks:

Edited :)

> If we had a canary derivation (e.g. the root of the imported closure), we could validate the existence of the closure in the store, by checking for the existence of that file; and skip the import?

Yes! That's precisely what I have in mind and have implemented in a new flake version, which should also have fixed the broken builds. I should indeed re-run the benchmark against this new flake version, which is currently --impure.

yvan-sraka commented 1 year ago

On aarch64-darwin, the flavors ghc8107 and ghc902 are failing because of:

@nix { "action": "setPhase", "phase": "unpackPhase" }
unpacking sources
unpacking source archive /nix/store/9pqv84n4fxaadafjx32wi4c7d044xb0z-hlint-3.5-src
source root is hlint-3.5-src
@nix { "action": "setPhase", "phase": "patchPhase" }
patching sources
@nix { "action": "setPhase", "phase": "updateAutotoolsGnuConfigScriptsPhase" }
updateAutotoolsGnuConfigScriptsPhase
@nix { "action": "setPhase", "phase": "configurePhase" }
configuring
Configure flags:
--prefix=/nix/store/rvb3z44kwnwni719lndy9qz2dp84qxmw-hlint-exe-hlint-3.5 exe:hlint --package-db=clear --package-db=/nix/store/bvjs7g3g5i10h7pl360kpsbbh39m9y2s-hlint-exe-hlint-3.5-config/lib/ghc-9.0.2/package.conf.d --flags=ghc-lib --flags=gpl --flags=-hsyaml --flags=threaded --exact-configuration --dependency=hlint=hlint-3.5-DVvFAeGfGhl4cGfHK851Zv --dependency=array=array-0.5.4.0 --dependency=base=base-4.15.1.0 --dependency=deepseq=deepseq-1.4.5.0 --dependency=ghc-bignum=ghc-bignum-1.1 --dependency=ghc-boot-th=ghc-boot-th-9.0.2 --dependency=ghc-prim=ghc-prim-0.7.0 --dependency=integer-gmp=integer-gmp-1.1 --dependency=pretty=pretty-1.1.3.6 --dependency=rts=rts --dependency=template-haskell=template-haskell-2.17.0.0 --with-ghc=ghc --with-ghc-pkg=ghc-pkg --with-hsc2hs=hsc2hs --with-gcc=cc --with-ld=ld --with-ar=ar --with-strip=strip --disable-executable-stripping --disable-library-stripping --disable-library-profiling --disable-profiling --enable-static --enable-shared --disable-coverage --enable-library-for-ghci --datadir=/nix/store/44xd104h90xxkjbvg6sdriq471mpzir2-hlint-exe-hlint-3.5-data/share/ghc-9.0.2 --ghc-option=-fPIC --gcc-option=-fPIC 
Configuring executable 'hlint' for hlint-3.5..
@nix { "action": "setPhase", "phase": "buildPhase" }
building
Preprocessing executable 'hlint' for hlint-3.5..
Building executable 'hlint' for hlint-3.5..
[1 of 1] Compiling Main             ( src/Main.hs, dist/build/hlint/hlint-tmp/Main.o )
'apple-a12' is not a recognized processor for this target (ignoring processor)
'apple-a12' is not a recognized processor for this target (ignoring processor)
'apple-a12' is not a recognized processor for this target (ignoring processor)
'apple-a12' is not a recognized processor for this target (ignoring processor)
'apple-a12' is not a recognized processor for this target (ignoring processor)
'apple-a12' is not a recognized processor for this target (ignoring processor)
Linking dist/build/hlint/hlint ...
/nix/store/48py6zrawzim9ghrnkqwm36jl4j1l23x-clang-wrapper-11.1.0/bin/ld: line 256: 26817 Segmentation fault: 11  /nix/store/5wvlj00dr22ivh210b18ccv1i60h6c1q-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
`clang' failed in phase `Linker'. (Exit code: 139)

yvan-sraka commented 1 year ago

I will re-run the benchmarks in a GitHub Actions context to have some consistency, since my personal internet connection is currently not reliable enough to avoid skewing the results …

angerman commented 1 year ago

I don't think speed is the primary issue, as long as it's consistent. E.g. if you always get the same speed reliably that's going to provide good numbers. And most users won't be having 1G or 10G lines, but some XXX Mbit most likely.

If we find out that for fast lines, it's even worse though, that would also be good to know.

yvan-sraka commented 1 year ago

> I don't think speed is the primary issue, as long as it's consistent. E.g. if you always get the same speed reliably that's going to provide good numbers. And most users won't be having 1G or 10G lines, but some XXX Mbit most likely.
>
> If we find out that for fast lines, it's even worse though, that would also be good to know.

Yes! My current connection issues are effectively more about consistency than speed :)

yvan-sraka commented 11 months ago

I've run new benchmarks of @hamishmack's fetch-docker.sh, in order to integrate them into the engineering blog post, with this new script:

#! /usr/bin/env python
import os

def Dockerfile(shell, fast):
    if fast:
        cmd = f'./fetch-docker.sh input-output-hk/devx x86_64-linux.{shell}-env | zstd -d | nix-store --import'
    else:
        cmd = f'nix develop "github:input-output-hk/devx#{shell}" --command true'
    return f"""FROM nixos/nix
RUN nix-channel --update
RUN echo "experimental-features = nix-command flakes" >> /etc/nix/nix.conf
RUN echo "accept-flake-config = true" >> /etc/nix/nix.conf
RUN ln -s $(which bash) /bin/bash
RUN nix profile install "nixpkgs#jq" "nixpkgs#zstd"
RUN curl -L https://raw.githubusercontent.com/input-output-hk/actions/latest/devx/support/fetch-docker.sh -o fetch-docker.sh
RUN chmod +x fetch-docker.sh
RUN time {cmd}"""

for shell in ["ghc8107-iog", "ghc962-iog", "ghc8107-static-minimal", "ghc962-static-minimal"]:
    for fast in [True, False]:
        with open('Dockerfile', 'w') as file:
            file.write(Dockerfile(shell, fast))
        os.system(f'docker build . | tee {shell}-{"fast" if fast else "slow"}.log')

… and I retrieve the results with:

#! /usr/bin/env bash
for file in *.log; do
    grep '^real' "$file" | awk -v file="$file" '{print file " -> " $2}'
done

… I ran it on my legacy ThinkPad X230 over a French countryside internet connection, to show how it changes loading times in contexts where internet bandwidth and computing power are precious resources, as in a heavy CI:

ghc8107-iog-fast.log -> 8m28.296s
ghc8107-iog-slow.log -> 11m28.658s
ghc8107-static-minimal-fast.log -> 2m35.712s
ghc8107-static-minimal-slow.log -> 5m22.060s
ghc962-iog-fast.log -> 7m26.853s
ghc962-iog-slow.log -> 11m9.047s
ghc962-static-minimal-fast.log -> 1m35.069s
ghc962-static-minimal-slow.log -> 4m48.812s

… should I run these benchmarks with other devx closure flavors or in other execution environments?

angerman commented 10 months ago

So it's about 50% of the total time. That's good! Thanks for running the benchmarks!
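To make the “about 50%” estimate concrete, the fast/slow ratio per flavor can be computed from the `real` times listed above. This is only a sketch: the `seconds` parser assumes the `XmY.YYYs` format printed by `time`, and the names `TIMES` and `fast_over_slow` are illustrative.

```python
import re

# (fast, slow) wall-clock times, copied from the log summary above.
TIMES = {
    "ghc8107-iog": ("8m28.296s", "11m28.658s"),
    "ghc8107-static-minimal": ("2m35.712s", "5m22.060s"),
    "ghc962-iog": ("7m26.853s", "11m9.047s"),
    "ghc962-static-minimal": ("1m35.069s", "4m48.812s"),
}


def seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '8m28.296s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def fast_over_slow(times):
    """Fraction of the slow (nix develop) time that the fast (fetch-docker.sh) run takes."""
    return {shell: seconds(fast) / seconds(slow) for shell, (fast, slow) in times.items()}


if __name__ == "__main__":
    for shell, ratio in fast_over_slow(TIMES).items():
        print(f"{shell}: fast run takes {ratio:.0%} of the slow run")
```

On these four flavors the ratio ranges from roughly a third (ghc962-static-minimal) to roughly three quarters (ghc8107-iog), so “about 50%” holds as an average rather than for each flavor.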