pex-tool / pex

A tool for generating .pex (Python EXecutable) files, lock files and venvs.
https://docs.pex-tool.org/
Apache License 2.0

Remove `Pip.spawn_install_wheel` & optimize. #2305

Closed jsirois closed 10 months ago

jsirois commented 10 months ago

Now both the build-time resolve code and the run-time layout code install wheels through pex.pep_427 using the same parallelization logic, via a new pair of functions: pex.jobs.imap_parallel and pex.jobs.map_parallel.

Previously, both used pex.jobs.execute_parallel, which incurs a fork/exec per processed item along with the overhead of re-importing all the Pex code needed to do a pex.pep_427 wheel install. That cost makes sense when calling Pip, which shares no code with Pex, but it is wasted effort when calling pure Pex code. Early experiments with parallelizing pex.pep_427 wheel installs using a thread pool showed pex.jobs.execute_parallel to perform consistently better, but I never experimented with multiprocessing process-based pools. These perform better than both and, in hindsight, for two obvious reasons:

  1. A process pool only incurs a fork once per pool slot. Job inputs are then fed by pipe, so no fork per input is required, as it is when using pex.jobs.execute_parallel. As a result, the import price is paid at most once per slot instead of once per job input.
  2. A process pool does not exec, at least on Linux; so all the imports done in the main process live on in the forked pool processes.
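
The two points above can be sketched with the standard library alone. This is a minimal illustration, not the actual pex.jobs.imap_parallel or pex.pep_427 code: `install_wheel` and `imap_parallel_sketch` are hypothetical names, and the sketch assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing
import os


def install_wheel(wheel_name):
    # Hypothetical stand-in for a pex.pep_427 wheel install. It returns the
    # worker's pid so that reuse of pool processes is visible in the results.
    return wheel_name, os.getpid()


def imap_parallel_sketch(wheel_names, max_jobs=2):
    # Assumption: the "fork" start method (POSIX only). Each pool slot is
    # forked once, with no exec, so modules already imported by the parent
    # process are available in every worker without being re-imported.
    context = multiprocessing.get_context("fork")
    with context.Pool(processes=max_jobs) as pool:
        # Inputs are sent to the already-running workers over pipes, so no
        # new process is started per input.
        return list(pool.imap_unordered(install_wheel, wheel_names))
```

Since each result carries the pid of the worker that handled it, eight inputs processed with `max_jobs=2` should report at most two distinct pids, one per pool slot.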

jsirois commented 10 months ago

Perf improvements

Build Time - for traditional installed wheel chroot PEXes

  1. Small:

    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'execute_parallel buildtime wheel chroot install' \
        -n 'imap_parallel buildtime wheel chroot install' \
        'pex --python python3.11 -D src -m main --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 -o cowsay.ep.pex' \
        'python3.11 -m pex -D src -m main --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 -o cowsay.mp.pex'
    Benchmark 1: execute_parallel buildtime wheel chroot install
      Time (mean ± σ):      3.217 s ±  0.018 s    [User: 3.087 s, System: 0.453 s]
      Range (min … max):    3.183 s …  3.238 s    10 runs
    
    Benchmark 2: imap_parallel buildtime wheel chroot install
      Time (mean ± σ):      1.866 s ±  0.010 s    [User: 1.653 s, System: 0.221 s]
      Range (min … max):    1.848 s …  1.877 s    10 runs
    
    Summary
      imap_parallel buildtime wheel chroot install ran
        1.72 ± 0.01 times faster than execute_parallel buildtime wheel chroot install

  2. Medium (compare imap_parallel / execute_parallel pairs):

    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'raw .whl build' \
        -n 'execute_parallel buildtime wheel chroot install' \
        -n 'imap_parallel buildtime wheel chroot install' \
        -n 'execute_parallel buildtime wheel chroot install --no-compress' \
        -n 'imap_parallel buildtime wheel chroot install --no-compress' \
        'python3.9 -m pex -c pants --no-pypi -f find-links --no-pre-install-wheels pantsbuild.pants==2.17.1 -o pants.whls.mp.pex' \
        'pex --python python3.9 -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 -o pants.ep.pex' \
        'python3.9 -m pex -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 -o pants.mp.pex' \
        'pex --python python3.9 -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 --no-compress -o pants.ep.nc.pex' \
        'python3.9 -m pex -c pants --no-pypi -f find-links pantsbuild.pants==2.17.1 --no-compress -o pants.mp.nc.pex'
    Benchmark 1: raw .whl build
      Time (mean ± σ):      1.680 s ±  0.011 s    [User: 1.408 s, System: 0.214 s]
      Range (min … max):    1.662 s …  1.693 s    10 runs
    
    Benchmark 2: execute_parallel buildtime wheel chroot install
      Time (mean ± σ):      9.265 s ±  0.048 s    [User: 12.653 s, System: 0.948 s]
      Range (min … max):    9.168 s …  9.339 s    10 runs
    
    Benchmark 3: imap_parallel buildtime wheel chroot install
      Time (mean ± σ):      7.117 s ±  0.032 s    [User: 6.967 s, System: 0.551 s]
      Range (min … max):    7.077 s …  7.183 s    10 runs
    
    Benchmark 4: execute_parallel buildtime wheel chroot install --no-compress
      Time (mean ± σ):      5.135 s ±  0.064 s    [User: 8.411 s, System: 1.015 s]
      Range (min … max):    5.071 s …  5.305 s    10 runs
    
    Benchmark 5: imap_parallel buildtime wheel chroot install --no-compress
      Time (mean ± σ):      3.067 s ±  0.017 s    [User: 2.816 s, System: 0.629 s]
      Range (min … max):    3.042 s …  3.097 s    10 runs
    
    Summary
      raw .whl build ran
        1.82 ± 0.02 times faster than imap_parallel buildtime wheel chroot install --no-compress
        3.06 ± 0.04 times faster than execute_parallel buildtime wheel chroot install --no-compress
        4.24 ± 0.03 times faster than imap_parallel buildtime wheel chroot install
        5.51 ± 0.05 times faster than execute_parallel buildtime wheel chroot install
    
    $ du -sh pants*.pex | sort -n
    52M     pants.whls.mp.pex
    53M     pants.ep.pex
    53M     pants.mp.pex
    239M    pants.ep.nc.pex
    239M    pants.mp.nc.pex
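
The hyperfine summary above ranks every run against the raw .whl build, so the pairwise imap_parallel / execute_parallel ratios have to be read off the means. A small sketch of that arithmetic, using the mean wall-clock times reported above (the `speedup` helper and the dictionary keys are illustrative names only):

```python
# Mean wall-clock times in seconds, copied from the hyperfine output above.
MEDIUM_BUILD_MEANS = {
    "default (compressed)": {"execute_parallel": 9.265, "imap_parallel": 7.117},
    "--no-compress": {"execute_parallel": 5.135, "imap_parallel": 3.067},
}


def speedup(means):
    # Ratio of the execute_parallel mean to the imap_parallel mean; values
    # above 1 mean imap_parallel is faster.
    return means["execute_parallel"] / means["imap_parallel"]


for variant, means in MEDIUM_BUILD_MEANS.items():
    print(f"{variant}: imap_parallel is {speedup(means):.2f}x faster")
```

This works out to roughly 1.30x for the default compressed build and 1.67x with --no-compress.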

Runtime

  1. Small (imap_parallel is better, but parallelization is still a small loss):

    $ pex \
        --python python3.11 \
        -D src -m main \
        --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 \
        --no-pre-install-wheels -o cowsay.whls.ep.pex
    $ python3.11 -m pex \
        -D src -m main \
        --no-pypi -f find-links cowsay==5.0 ansicolors==1.1.8 \
        --no-pre-install-wheels -o cowsay.whls.mp.pex
    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'serial wheel chroot install' \
        -n 'execute_parallel runtime wheel chroot install' \
        -n 'imap_parallel runtime wheel chroot install' \
        './cowsay.whls.ep.pex' \
        'PEX_MAX_INSTALL_JOBS=0 ./cowsay.whls.ep.pex' \
        'PEX_MAX_INSTALL_JOBS=0 ./cowsay.whls.mp.pex'
    Benchmark 1: serial wheel chroot install
      Time (mean ± σ):     493.0 ms ±   3.4 ms    [User: 449.3 ms, System: 43.5 ms]
      Range (min … max):   488.7 ms … 498.1 ms    10 runs
    
    Benchmark 2: execute_parallel runtime wheel chroot install
      Time (mean ± σ):     574.4 ms ±   9.0 ms    [User: 589.3 ms, System: 69.7 ms]
      Range (min … max):   567.6 ms … 597.8 ms    10 runs
    
    Benchmark 3: imap_parallel runtime wheel chroot install
      Time (mean ± σ):     512.5 ms ±   3.0 ms    [User: 538.3 ms, System: 57.9 ms]
      Range (min … max):   508.9 ms … 518.0 ms    10 runs
    
    Summary
      serial wheel chroot install ran
        1.04 ± 0.01 times faster than imap_parallel runtime wheel chroot install
        1.17 ± 0.02 times faster than execute_parallel runtime wheel chroot install

  2. Medium:

    $ pex \
        --python python3.9 \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        -o pants.ep.pex
    $ python3.9 -m pex \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        -o pants.mp.pex
    $ pex \
        --python python3.9 \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        --no-pre-install-wheels -o pants.whls.ep.pex
    $ python3.9 -m pex \
        -c pants \
        --no-pypi -f find-links pantsbuild.pants==2.17.1 \
        --no-pre-install-wheels -o pants.whls.mp.pex
    $ hyperfine \
        -w2 \
        -p 'rm -rf ~/.pex' \
        -n 'serial wheel chroot install' \
        -n 'serial .whl file install' \
        -n 'execute_parallel runtime wheel chroot install' \
        -n 'imap_parallel runtime wheel chroot install' \
        -n 'execute_parallel runtime .whl install' \
        -n 'imap_parallel runtime .whl install' \
        './pants.ep.pex -V' \
        './pants.whls.mp.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.ep.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.mp.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.whls.ep.pex -V' \
        'PEX_MAX_INSTALL_JOBS=0 ./pants.whls.mp.pex -V'
    Benchmark 1: serial wheel chroot install
      Time (mean ± σ):      2.589 s ±  0.026 s    [User: 2.217 s, System: 0.241 s]
      Range (min … max):    2.551 s …  2.639 s    10 runs
    
    Benchmark 2: serial .whl file install
      Time (mean ± σ):      2.861 s ±  0.047 s    [User: 2.451 s, System: 0.274 s]
      Range (min … max):    2.804 s …  2.943 s    10 runs
    
    Benchmark 3: execute_parallel runtime wheel chroot install
      Time (mean ± σ):      2.814 s ±  0.029 s    [User: 5.343 s, System: 0.454 s]
      Range (min … max):    2.782 s …  2.888 s    10 runs
    
    Benchmark 4: imap_parallel runtime wheel chroot install
      Time (mean ± σ):      2.449 s ±  0.030 s    [User: 2.550 s, System: 0.274 s]
      Range (min … max):    2.408 s …  2.515 s    10 runs
    
    Benchmark 5: execute_parallel runtime .whl install
      Time (mean ± σ):      2.904 s ±  0.039 s    [User: 6.545 s, System: 0.618 s]
      Range (min … max):    2.860 s …  2.978 s    10 runs
    
    Benchmark 6: imap_parallel runtime .whl install
      Time (mean ± σ):      2.587 s ±  0.026 s    [User: 2.864 s, System: 0.344 s]
      Range (min … max):    2.555 s …  2.638 s    10 runs
    
    Summary
      imap_parallel runtime wheel chroot install ran
        1.06 ± 0.02 times faster than imap_parallel runtime .whl install
        1.06 ± 0.02 times faster than serial wheel chroot install
        1.15 ± 0.02 times faster than execute_parallel runtime wheel chroot install
        1.17 ± 0.02 times faster than serial .whl file install
        1.19 ± 0.02 times faster than execute_parallel runtime .whl install

benjyw commented 10 months ago

Looking now