Open zmanji opened 9 months ago
For the second line since installing wheels into a venv is just creating hardlinks ...
~Negative. When externalizing, PEX never uses symlinks or hard links back into the PEX cache, it always copies. ... or it should. Digging a bit deeper here, but also, you appear to be missing updates from the last few releases that offer --no-pre-install-wheels and --max-install-jobs / PEX_MAX_INSTALL_JOBS to help deal with ridiculous distributions like Torch.~
I got pretty much all of that wrong!
Ok, for the last line, "pex: Installing 22 wheels in venv at ./tenv: 3585.3ms", that is actually the time it takes to hardlink the site-packages tree. If I switch the symlink=False here to True, I go from ~3.5s myself to ~30ms:
https://github.com/pantsbuild/pex/blob/a58848cbee4eff915a978e340c608fd9b50bfbb0/pex/cli/commands/venv.py#L311-L317
A quick experiment though shows that the times here are ~10x larger than they should be; so I need to drill into this aspect.
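To make the two linking modes concrete, here is a toy sketch, not Pex's actual code. The helper names hardlink_tree and symlink_tree are made up for illustration, and symlinking the whole directory in one call is an assumption about what a symlink mode might do. Hard-linking has to touch every file, while a directory symlink is a single filesystem operation, which is one plausible reason for a large gap between the two.

```python
import os
from pathlib import Path


def hardlink_tree(src: Path, dst: Path) -> int:
    """Recreate the src tree under dst, hard-linking each file.

    Hypothetical helper: returns the number of files linked.
    src and dst must live on the same filesystem.
    """
    count = 0
    for root, _dirs, files in os.walk(src):
        target_dir = dst / Path(root).relative_to(src)
        target_dir.mkdir(parents=True, exist_ok=True)
        for name in files:
            # One link() call per file: cost grows with the file count.
            os.link(Path(root) / name, target_dir / name)
            count += 1
    return count


def symlink_tree(src: Path, dst: Path) -> None:
    """Expose src at dst with a single directory symlink (assumed mode)."""
    os.symlink(src, dst, target_is_directory=True)
```

Neither helper hashes anything, so timing them against a real site-packages tree only measures the linking cost itself.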
Next though is the "pex: Building 0 artifacts and installing 22: 3094.9ms" line. That appears to be all due to re-hashing as part of this optimization:
https://github.com/pantsbuild/pex/blob/7a2cee3126d05408252c8d0b103adaf13eff8fe6/pex/resolver.py#L433-L500
So I think both of these can likely be optimized, but I won't be back at a keyboard to dive in until ~January 4th.
Ok, so it turns out both of these cases of ~3s are all consumed in hashing distribution files. So the time taken at least makes sense now. This is hashing ~4GB of files x2. The trick is to see if the hashing can be avoided or amortized or parallelized.
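For the "parallelized" option, a minimal sketch of what hashing a tree with a thread pool could look like. This is not how Pex does it; sha256_file, hash_tree and the max_workers default are names and values assumed for illustration. Threads are a reasonable fit because hashlib releases the GIL while hashing large buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash one file in fixed-size chunks so large files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: Path, max_workers: int = 4) -> Dict[str, str]:
    """Map each file's POSIX-style path relative to root to its SHA-256 hex digest.

    Hypothetical helper: files are hashed concurrently on a thread pool.
    """
    files = sorted(p for p in root.rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each file with its digest.
        digests = list(pool.map(sha256_file, files))
    return {p.relative_to(root).as_posix(): d for p, d in zip(files, digests)}
```

Whether this helps in practice depends on whether the work is disk-bound or CPU-bound; avoiding the re-hash altogether is still the bigger win.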
Ok, #2315 addresses "Building 0 artifacts and installing 22: 3142.4ms" and takes it to ~10ms. There is still "Installing 22 wheels in venv at ./tenv: 3585.3ms" to improve. The issue there is the same, re-hashing all files in an installed wheel chroot, but instead of doing that to get a single chroot hash, it's being done to create a compliant RECORD for the venv (for interoperability; e.g. so you can run pip uninstall X in the venv) with a hashed / sized entry for each file in the wheel. I think I can just re-use the hash and size values from the original .whl file RECORD, but I need to think through the chain of custody / security implications a bit before charging ahead there. I'll leave this issue open for that work.
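For reference, a rough sketch of what a RECORD entry looks like and how the original values might be carried over instead of recomputed. This is not Pex's implementation; record_hash, parse_record and reuse_record are hypothetical names. The entry format (path, then sha256= followed by the unpadded urlsafe base64 digest, then the size in bytes) follows the packaging spec for recording installed projects. The sketch simply trusts the original RECORD, so it does nothing about the chain of custody question.

```python
import base64
import csv
import hashlib
import io
from typing import Dict, Iterable, Tuple


def record_hash(data: bytes) -> str:
    """Hash field of a RECORD entry: sha256= plus unpadded urlsafe base64."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def parse_record(text: str) -> Dict[str, Tuple[str, str]]:
    """Map each path in a RECORD file to its (hash, size) fields."""
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) == 3:
            path, file_hash, size = row
            entries[path] = (file_hash, size)
    return entries


def reuse_record(original: str, installed_paths: Iterable[str]) -> str:
    """Build a new RECORD, copying hash and size from the original where known.

    Paths missing from the original get empty hash and size fields here;
    that is a placeholder, a real tool would hash those files itself.
    """
    known = parse_record(original)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in installed_paths:
        file_hash, size = known.get(path, ("", ""))
        writer.writerow((path, file_hash, size))
    return out.getvalue()
```

The expensive part is record_hash over every file's bytes; reuse_record only parses and rewrites a small text file, which is why re-using the .whl values is attractive.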
This is a question about pex's performance when creating virtualenvs. The example below is extracted from a Dockerfile where I am trying to understand if I can speed up the slowest step.
I created a lockfile for torch by running:
Then later I create a venv by running:
This takes about 7 seconds on my machine even when all of the wheels have been downloaded and extracted.
The output shows:
I see two lines that appear to take some time:
and
Adding more -v flags doesn't shed more light on this. Is there any obvious reason why these steps would take multiple seconds?
For the first line I would expect that since the wheels are already installed to .pex/installed_wheels it should be a no-op.
For the second line since installing wheels into a venv is just creating hardlinks from the .pex/installed_wheels directory and updating the RECORD from the wheel I would expect it to be really fast.