pex-tool / pex

A tool for generating .pex (Python EXecutable) files, lock files and venvs.
https://docs.pex-tool.org
Apache License 2.0
2.49k stars 254 forks source link

Add support for PEP-723 script metadata to `--exe`. #2436

Closed jsirois closed 2 weeks ago

jsirois commented 2 weeks ago

Pex now reads script metadata present in --exes by default to augment requirements and constrain allowed target Pythons. Pex can be instructed to ignore this script metadata though with --no-pep723 / --no-script-metadata.

This apparently makes due on a promise no active Pex maintainer ever made, as immortalized in PEP-723: https://peps.python.org/pep-0723/#tooling-buy-in

jsirois commented 2 weeks ago

Ok, 94b1e2c was a big addition:

:; git diff --stat 94b1e2c03a7fd9c49f479b2885a294e7ac00b519^!
 pex/pep_723.py        | 237 +++++++++++++++++++++++++++++++++++++++++++------
 tests/test_pep_723.py | 240 +++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 447 insertions(+), 30 deletions(-)

That's the price you pay when not normal users, but programmers (who presumably know how to debug), lose their debugging skills when the code is not theirs. I'm obviously grumpy about this state of the world, but I can play along.

benjyw commented 2 weeks ago

I am working my way through this while reading the PEP. Will respond hopefully later today, europe time.

jsirois commented 2 weeks ago

Is that the case you noticed?

Yes. I had written the test for the over-abundant script block case and it was failing to see >1 block.

The regex did already work for the greedy case; i.e. the case where the spec leads into an embedded c-sharp comment example with "Precedence for an ending line # /// is given when the next line is not a valid embedded content line as described above.". It just didn't work for multiple blocks.

jsirois commented 2 weeks ago

Thanks for reading through all this @benjyw.

Anyway, I share your grumpiness.

Multiple levels here.

ofek commented 2 weeks ago

Hey there, thank you for taking the time to implement this! I have two questions:

  1. Could you please explain a bit further how the regular expression here breaks? https://packaging.python.org/en/latest/specifications/inline-script-metadata/#specification
  2. Is there anything I could have done differently? I feel bad that you folks aren't very happy.
jsirois commented 2 weeks ago

For 1, given this example.py:

import re
from textwrap import dedent

REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'

def collect_metadata_blocks(text):
    return list(re.finditer(REGEX, text))

EXAMPLE1 = dedent(
    """\
    # /// script
    # ///
    # /// future
    # ///
    """
)

EXAMPLE2 = dedent(
    """\
    # /// script
    # dependencies = ["ansicolors"]
    # ///
    #
    # /// future
    # who_knows = 42
    # ///
    """
)

example1 = collect_metadata_blocks(EXAMPLE1)
print(example1)
for index, match in enumerate(example1):
    print(index, match.group("type"), match.group("content"))

example2 = collect_metadata_blocks(EXAMPLE2)
print(example2)
for index, match in enumerate(example2):
    print(index, match.group("type"), match.group("content"))

You find:

:; python example.py
[<re.Match object; span=(0, 38), match='# /// script\n# ///\n# /// future \n# ///'>]
0 script # ///
# /// future

[<re.Match object; span=(0, 88), match='# /// script\n# dependencies = ["ansicolors"]\n# >]
0 script # dependencies = ["ansicolors"]
# ///
#
# /// future
# who_knows = 42

So the re is too greedy for practical use afaict. It forces you to write 2 script blocks in separate comments (a non "#", non "# "* line is required between them. The parser I hand-rolled always breaks a block that's seen a # /// when a new block starts since having a block start inside another block is called out as invalid by the spec. Note that the re happily accepts those here by its definition of a script block.

For 2, I don't think you were the problem here.

jsirois commented 2 weeks ago

Belatedly I just realized this is an almost useful thing now:

#!/usr/bin/env -S pex --exe
# /// script
# ...
# ///
...

So if you know your target environment has recent enough Pex installed on the path. instead of scp'ing a PEX file, you scp just the script. Kind of a party trick though since its very slow to start up that way.