Add support for PEP-723 script metadata to `--exe`.

jsirois commented 2 weeks ago

Pex now reads script metadata present in --exes by default to augment requirements and constrain allowed target Pythons. Pex can be instructed to ignore this script metadata though with --no-pep723 / --no-script-metadata.

This apparently makes due on a promise no active Pex maintainer ever made, as immortalized in PEP-723: https://peps.python.org/pep-0723/#tooling-buy-in

jsirois commented 2 weeks ago

Ok, 94b1e2c was a big addition:

:; git diff --stat 94b1e2c03a7fd9c49f479b2885a294e7ac00b519^!
 pex/pep_723.py        | 237 +++++++++++++++++++++++++++++++++++++++++++------
 tests/test_pep_723.py | 240 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 447 insertions(+), 30 deletions(-)

That's the price you pay when not normal users, but programmers (who presumably know how to debug), lose their debugging skills when the code is not theirs. I'm obviously grumpy about this state of the world, but I can play along.

benjyw commented 2 weeks ago

I am working my way through this while reading the PEP. Will respond hopefully later today, europe time.

jsirois commented 2 weeks ago

Is that the case you noticed?

Yes. I had written the test for the over-abundant script block case and it was failing to see >1 block.

The regex did already work for the greedy case; i.e. the case where the spec leads into an embedded c-sharp comment example with "Precedence for an ending line # /// is given when the next line is not a valid embedded content line as described above.". It just didn't work for multiple blocks.

jsirois commented 2 weeks ago

Thanks for reading through all this @benjyw.

Anyway, I share your grumpiness.

Multiple levels here.

Josh was out of line, never consulting before or after: I just stumbled upon his promise on Pex's behalf last week.
Putting alot of effort into error messages when the audience is the same group I'm in, and that audience should be able to read / debug better.
Python / PyPA specs that require using non-stdlib and not even PyPA packages in non-EOL'd Pythons.

ofek commented 2 weeks ago

Hey there, thank you for taking the time to implement this! I have two questions:

Could you please explain a bit further how the regular expression here breaks? https://packaging.python.org/en/latest/specifications/inline-script-metadata/#specification
Is there anything I could have done differently? I feel bad that you folks aren't very happy.

jsirois commented 2 weeks ago

For 1, given this example.py:

import re
from textwrap import dedent

REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

def collect_metadata_blocks(text):
    return list(re.finditer(REGEX, text))

EXAMPLE1 = dedent(
    """\
    # /// script
    # ///
    # /// future
    # ///
    """
)

EXAMPLE2 = dedent(
    """\
    # /// script
    # dependencies = ["ansicolors"]
    # ///
    #
    # /// future
    # who_knows = 42
    # ///
    """
)

example1 = collect_metadata_blocks(EXAMPLE1)
print(example1)
for index, match in enumerate(example1):
    print(index, match.group("type"), match.group("content"))

example2 = collect_metadata_blocks(EXAMPLE2)
print(example2)
for index, match in enumerate(example2):
    print(index, match.group("type"), match.group("content"))

You find:

:; python example.py
[<re.Match object; span=(0, 38), match='# /// script\n# ///\n# /// future \n# ///'>]
0 script # ///
# /// future

[<re.Match object; span=(0, 88), match='# /// script\n# dependencies = ["ansicolors"]\n# >]
0 script # dependencies = ["ansicolors"]
# ///
#
# /// future
# who_knows = 42

So the re is too greedy for practical use afaict. It forces you to write 2 script blocks in separate comments (a non "#", non "# "* line is required between them. The parser I hand-rolled always breaks a block that's seen a # /// when a new block starts since having a block start inside another block is called out as invalid by the spec. Note that the re happily accepts those here by its definition of a script block.

For 2, I don't think you were the problem here.

Josh misrepresented himself as being able to speak for Pex / never relayed he did this to an active Pex maintainer. Clearly not your problem. You would have had to have checked with all the projects that promised support to verify those speaking were in fact valid representatives for those projects. I think the default in the PyPA / Python communities is trust.
The TOML choice was effectively sealed in by PEP-518. The fact that Python / the PyPA adopt standards that ~require (I certainly don't want to write a toml parser just to implement PEPs that support at least Python >=3.8,<3.11) code outside either's control for non-deprecated Python versions seems ridiculous and dangerous to me on its face. Perhaps PyPA tried and failed to adopt toml or tomli / tomli-w and deemed splitting the community further by writing their own under https://github.com/PyPA/ was undesirable or that it was simply too much effort.

jsirois commented 2 weeks ago

Belatedly I just realized this is an almost useful thing now:

#!/usr/bin/env -S pex --exe
# /// script
# ...
# ///
...

So if you know your target environment has recent enough Pex installed on the path. instead of scp'ing a PEX file, you scp just the script. Kind of a party trick though since its very slow to start up that way.

pex-tool / pex

Add support for PEP-723 script metadata to `--exe`. #2436