Open huonw opened 1 month ago
I'm interested to try to implement this, and I imagine the implementation of export-subset is good inspiration. I've found pex.cli.commands.Lock._export_pip (and its callers) and pex.resolve.lockfile.subset but it's not... obvious to me how to turn the Subset/Resolved etc. types back into a pex lockfile. So, a hint there would be very helpful.
Why does the existing export-subset, fed back in to pex -r ... not suffice? The resulting requirements file has hashes, and the underlying pip download call checks these; so is as good as a lock afaict. I, in fact, use this here:
A side note: you often reference commands like PEX_SCRIPT=pex3 pex ... . This is extremely non-idiomatic / Pants specific. In almost all cases pex will be the pex console script, for which your command line does not work. It's only when pex is the Pex PEX, that this style of command line makes sense. You are no doubt aware of this, but the issues read strange to be sure. Perhaps I should start publishing the Pex PEX as pex.pex to help avoid this confusing overload.
Why does the existing export-subset, fed back in to pex -r ... not suffice? The resulting requirements file has hashes, and the underlying pip download call checks these; so is as good as a lock afaict. I, in fact, use this here:
Ah, if you think it works sufficiently well/captures everything it needs to, I'm happy to use that as official guidance.
I have a confirmation question: I note that export-subset currently has a warning suggesting a lockfile with multiple locks isn't supported. This would suggest to me it doesn't cover all circumstances, but... maybe those circumstances don't matter in practice?
I can see this potentially not mattering for the Pants/caching use case, but I imagine it may matter for the "reduce a lockfile for a bug report" case?
This is extremely non-idiomatic
Ah, sorry. I like the all-in-one pex a lot: I have plopped it into my $PATH and then use it to install random Python tools (unrelated to Pants). Much better than installing pex (or the other tools) into a global venv, and seems a bit silly to have two pexes on disk with different entry-points (and potentially different versions).
I acknowledge that it's a little confusing to have two slightly different meanings of the pex name.
Perhaps I should start publishing the Pex PEX as pex.pex to help avoid this confusing overload.
Just brainstorming an additional option:
- make the pex3 entrypoint able to do everything (e.g. add a new subcommand pex3 build $args that does the same thing as pex $args)
- pex3 pex with that console script

If this was supported, I'd personally just use pex3 for everything rather than switching between two styles, and I think issues mentioning pex3 would be unambiguous.
Ah, if you think it works sufficiently well/captures everything it needs to, I'm happy to use that as official guidance.
Yup, should work just fine and give all the same guarantees.
I have a confirmation question: I note that export-subset currently has a warning suggesting a lockfile with multiple locks isn't supported. This would suggest to me it doesn't cover all circumstances, but... maybe those circumstances don't matter in practice?
They don't matter in-practice for Pants, unless things have changed. Against my advice, Pants opted to use universal locks which include just 1 lock; so I embarked on a huge Pex code effort to support those. The case export-subset doesn't support is the style of lock that is easy to create and I recommended from the start, which is 1 lock file with multiple single-platform locks within. It's that case, where there are multiple locks inside the lock file to pick from, that the comment refers to.
See https://github.com/pantsbuild/pants/issues/12458 for some background on picking the popular way of doing things (Poetry-style lock files at the time -> Pex --style universal) vs. the right way.
Perhaps a better entry point / discussion: https://github.com/pantsbuild/pants/issues/12200 or https://github.com/pantsbuild/pants/issues/12568
Basically, there was a bit of a war waged around this and I lost and implemented the most complex, least secure thing - --style universal.
@huonw if you find issues with the export-subset -> pex -r ... setup, let me know. The path you were pushing for here suffers from an impedance mismatch that you'd need to grok to add the feature. Namely, export-subset exports a lock for exactly 1 interpreter. What you would seem to want in the Pants case would be subsetting a universal lock to another (smaller) universal lock. That is a much simpler operation than exporting a subset for a specific interpreter which requires selecting which artifact is needed per locked requirement. To subset a lock to another lock, you take Pip's resolve logic as given-good, and so merely need to walk the dep graph of the input top-level subset requirements in the existing lock, ignoring environment markers and requires-python. As such, I think the implementation would be completely disjoint from export-subset to start. To re-use would require breaking export-subset logic into 2 parts - 1st subset the lock, then pick artifacts.
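For illustration, the graph walk described above might look roughly like the sketch below. This is not Pex's implementation: the `locked_deps` mapping is a hypothetical, simplified stand-in for a lock (project name to dependency names, with environment markers and requires-python already discarded), `subset_lock` is an invented helper name, and the sample dependency data is a toy.

```python
from collections import deque


def subset_lock(locked_deps, roots):
    """Return the part of `locked_deps` reachable from `roots`.

    `locked_deps` is a hypothetical mapping of project name -> list of
    dependency project names. Markers and requires-python are assumed to
    have been dropped, so every recorded edge is followed.
    """
    keep = set()
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in keep:
            continue
        if name not in locked_deps:
            raise KeyError(f"{name} is not present in the lock")
        keep.add(name)
        queue.extend(locked_deps[name])
    # Preserve the ordering of the original lock in the reduced one.
    return {name: deps for name, deps in locked_deps.items() if name in keep}


# Toy data, not a real resolve.
example_lock = {
    "cowsay": [],
    "tensorflow": ["numpy", "absl-py"],
    "numpy": [],
    "absl-py": [],
}
reduced = subset_lock(example_lock, ["tensorflow"])
```

Because no artifact selection happens here, the result is still a lock for every interpreter the input covered; picking artifacts for one interpreter would be a separate second pass.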
Ah, sorry. I like the all-in-one pex a lot: I have plopped it into my $PATH and then use it to install random Python tools (unrelated to Pants).
I use a user-local venv for this myself. It's a bit nicer than the Pex PEX IMO since I can change its version easily:
python -mvenv ~/bin/pex.venv
~/bin/pex.venv/bin/pip install pex
ln -s ~/bin/pex.venv/bin/pex ~/bin/pex
ln -s ~/bin/pex.venv/bin/pex3 ~/bin/pex3
ln -s ~/bin/pex.venv/bin/pex-tools ~/bin/pex-tools
And later:
~/bin/pex.venv/bin/pip install -U pex
Or:
~/bin/pex.venv/bin/pip install -U pex==<debug some old version>
I think the Pex PEX is only more convenient for Pants itself, which tries not to know anything about Python, and largely succeeds by being able to download a Pex "binary".
make the pex3 entrypoint able to do everything (e.g. add a new subcommand pex3 build $args that does the same thing as pex $args)
Yup. That has been exactly the plan.
Okay, I've had a chance to experiment a bit. Some observations:
- Pip doesn't support --hash with VCS requirements, and thus a PEX lockfile that happily contains a VCS requirement and PyPI requirements cannot have a subset that contains both of those be safely installed via requirements.txt: need to drop the --hashes on the PyPI requirements.

That first one seems unresolvable with requirements.txt?
Ah, yeah. That's right. I went through some hoops to support locks of both VCS requirements and local project directories, neither of which Pip's --hash supports.
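As a rough illustration of that limitation, the sketch below scans an exported requirements file for the problematic mix: at least one --hash (which puts Pip into hash-checking mode for the whole file) together with a VCS requirement that cannot be hashed. The function name `hash_mode_conflicts`, the sample project names, the URL, and the all-zero hash are hypothetical placeholders, and the line matching is deliberately simplistic.

```python
def hash_mode_conflicts(requirements_text):
    """Return VCS requirement lines that cannot coexist with --hash lines."""
    lines = [line.strip() for line in requirements_text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    # Any --hash in the file switches Pip into hash-checking mode.
    any_hashes = any("--hash=" in line for line in lines)
    vcs_prefixes = ("git+", "hg+", "svn+", "bzr+")
    vcs_lines = [
        line
        for line in lines
        if line.startswith(vcs_prefixes)
        or any(f"@ {prefix}" in line for prefix in vcs_prefixes)
    ]
    return vcs_lines if any_hashes else []


# Placeholder content: the hash and the repository URL are not real.
sample = (
    "cowsay==6.1 --hash=sha256:" + "0" * 64 + "\n"
    "example @ git+https://example.com/example.git@main\n"
)
conflicts = hash_mode_conflicts(sample)
```

A non-empty result means the file can only be installed by dropping the hashes, which is the trade-off described above.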
Ok then. My comment above (https://github.com/pex-tool/pex/issues/2411#issuecomment-2128509335) applies then. Let me know if you need more guidance or cry uncle.
Thanks, the prompt to avoid following export-subset is very helpful. (Not sure when I'll have a chance to look at it, though.)
Ok, I assigned you to help me keep track not to touch this.
A lockfile pins a potentially huge universe of dependencies, and there are several use cases where efficiently cutting that down to only a smaller applicable set would be handy:
This might be able to be implemented as a new pex value (or similar) for pex3 lock export-subset --format=...
To make this more concrete, a workflow might be:
1. Create a lockfile for cowsay and tensorflow, e.g. PEX_SCRIPT=pex3 pex lock create cowsay tensorflow -o test.lock (contents at the end)
2. Build a pex that uses only tensorflow: pex tensorflow --lock test.lock -o test.pex, within some system that does process-based caching like Pants or Bazel. (Even with a warm PEX cache, this takes ~30s on my machine. I'm aware of the various settings that can improve this, but that's orthogonal to this feature request, I think.)
3. Change cowsay without changing tensorflow or any of its dependent libraries.
4. Rerun step 2.

Currently, naive (aka reliable) entire-file-based caching of the process execution in step 2 will mean step 4's rerun has to execute and cannot be served from cache.
If we had the requested feature, the process runners could instead do two steps to build this PEX:
1. pex3 lock export-subset --format=pex --lock=test.lock tensorflow -o reduced.lock (the --format=pip version takes about 0.5s on my machine)
2. pex tensorflow --lock reduced.lock -o test.pex
Under this scheme, when cowsay changes, the pex3 lock export-subset --format=pex invocation is invalidated (its input has changed), and has to rerun... but the reduced.lock output will be identical, and thus pex tensorflow --lock reduced.lock -o test.pex can be served from cache. The export-subset invocation is very fast in comparison to the full build, and thus this feature would unlock more efficient use of PEX.
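The caching argument can be sketched with a toy model. Nothing here is Pants, Bazel, or Pex code: `ProcessCache` is a hypothetical content-keyed cache, and `export_subset` and `build_pex` are stand-ins for the proposed pex3 lock export-subset --format=pex step and the slow pex build; the pinned versions are made up.

```python
import hashlib


class ProcessCache:
    """Toy process cache keyed on a step name plus the bytes of its input."""

    def __init__(self):
        self._results = {}
        self.executions = []  # names of the steps that actually ran

    def run(self, step, input_bytes, func):
        key = hashlib.sha256(step.encode() + b"\0" + input_bytes).hexdigest()
        if key not in self._results:
            self.executions.append(step)
            self._results[key] = func(input_bytes)
        return self._results[key]


def export_subset(lock_bytes):
    # Stand-in for the fast subsetting step: keep only the tensorflow pin.
    lines = lock_bytes.splitlines()
    return b"\n".join(line for line in lines if line.startswith(b"tensorflow"))


def build_pex(lock_bytes):
    # Stand-in for the slow PEX build.
    return b"PEX built from: " + lock_bytes


cache = ProcessCache()
original_lock = b"cowsay==6.0\ntensorflow==2.16.1"
updated_lock = b"cowsay==6.1\ntensorflow==2.16.1"  # only cowsay changed
for lock in (original_lock, updated_lock):
    reduced = cache.run("export-subset", lock, export_subset)
    cache.run("build-pex", reduced, build_pex)
```

After both rounds, `cache.executions` records the subsetting step twice but the build only once: the second round's reduced lock is byte-identical to the first, so the build is served from cache.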