Open asottile-sentry opened 2 years ago
Missing a trailing ')' in your re, I assume after the closing "}".
_INTERPRETER_RE = re.compile(fr'^(?:{"|".join(sorted(INTERPRETER_SHORT_NAMES.values()))})\d+')
If you are building an re of values from some other source, sorting by descending length will be more likely to give you a unique match, in the case where some short value is a prefix of some longer value.
...sorted(INTERPRETER_SHORT_NAMES.values(), key=len, reverse=True)...
Though probably not an issue in this small group of values (after looking them up in the code).
Unfortunately the best we could do based on my reading of the spec (obviously let me know if I missed something) is test whether the name is normalized as required by https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode . Otherwise the interpreter short names are not guaranteed to be the only valid interpreter names, just convenient ones (e.g. Pyston is not covered by that list). We can't make any assumptions as to what interpreter name is considered valid, and so 2.7.6
as a compressed tag of silly interpreter names is still valid.
I can totally see having a higher-level function in another library that requires expected interpreter tags, but as such a low-level library I don't think we can be that pragmatic about what is usually expected and have to stick to the spec in this instance.
PEP 425 adds more color: https://peps.python.org/pep-0425/#python-tag -- that it should be implementation name and then version (or version with an underscore) -- so I believe \w+[0-9][0-9_]*
would be a required pattern for a tag
@brettcannon should this be reconsidered given above?
I'm okay with re-opening it, but whether this is a feature request or a bug I'm still torn about. For instance, while PEP 425 says "The version is py_version_nodot," that's entirely CPython-specific as to what that means. As such, do we get to say that it must follow https://peps.python.org/pep-0440/#version-scheme ? What about trailing underscores like your regex allows?
As well, "Other Python implementations should use sys.implementation.name" puts no actual restriction on the format of the name, i.e. there's nothing stopping it from starting with a number. Since the Python interpreter tag is not designed to be parsed but to be an opaque string for code to compare against while still being human-readable makes all of this rather murky.
"Other Python implementations should use sys.implementation.name" puts no actual restriction on the format of the name
It actually does! From https://peps.python.org/pep-0421/:
name A lower-case identifier representing the implementation. Examples include ‘pypy’, ‘jython’, ‘ironpython’, and ‘cpython’.
Based on the discussion so far, I reckon requiring it to either be a f"{shortname}{numbers}"
for a shortname we recognise, or an identifier is a reasonable restriction.
I'm looking to migrate a caller from utilizing
distlib.wheel
to instead usepackaging.utils.parse_wheel_filename
and my tests encountered an edge case which isn't handled by thepackaging
versionhere's the distlib code: https://github.com/pypa/distlib/blob/91aa92e64ee5d5005901907eaa340a72f18d7212/distlib/wheel.py#L83
packaging's
parse_tag
doesn't validate the interpreter in any way -- whereas distlib required a particular structurethe test case in question: https://github.com/chriskuehl/dumb-pypi/blob/b3fc62c6519bdb84960d2b491b40ad065fb79a47/tests/main_test.py#L61
playlyfe-0.1.1-2.7.6-none-any.whl
should not parse as a valid wheel, howeverpackaging
allows it (and parses as three separate tag-triplets:2-none-any
,7-none-any
,6-none-any
)I propose adding a little bit of validation to
parse_tag
to check that the tags are supported interpreter versionsI believe this involves something like:
and then utilizing that regex to validate the names (and throwing an appropriate exception) -- I think a new
InvalidTag
(extendingValueError
) or somesuch exception would need to be created as well (similar to the ones inpackaging.utils
for the wheel / stdist filename)wanted to propose this first before diving in and implementing it to save time -- thoughts?