pypa / installer

A low-level library for installing from a Python wheel distribution.
https://installer.readthedocs.io/
MIT License
123 stars 51 forks source link

Add wheel file validation for entry_points.txt #198

Open crbeckle opened 10 months ago

crbeckle commented 10 months ago

This is a feature request to support additional Wheel file validation beyond the RECORD file, and specifically for the entry_points.txt file.

Rationale: I have seen cases where developers get confused about package names in their entry point scripts, which then leads to confusing errors during package install. For example, given an install via the pdm tool with the following dummy pyproject.toml snippet:

[project.scripts]
my-script =  "my-project.app:main"

Assuming the error here is that the my-project portion of the module description is actually a module named my_project, the following error occurs:

Traceback (most recent call last):
  ...
  File "/home/user/.local/share/pdm/venv/lib64/python3.9/site-packages/installer/_core.py", line 86, in install
    for name, module, attr, section in parse_entrypoints(entrypoints_text):
  File "/home/user/.local/share/pdm/venv/lib64/python3.9/site-packages/installer/utils.py", line 235, in parse_entrypoints
    assert match
AssertionError

The installer.utils.parse_entrypoints does perform validation in terms of a regex match of the entry script module value, but failure of a match results in a cryptic assertion error.

Proposal: After looking some through this code base (as well as the pdm code base before this one), I believe a reasonable and backwards compatible solution would be as follows:

  1. Add a new API function to installer.sources.WheelSource with a signature of validate_entry_points(self) -> None.
  2. Implement the function in installer.sources.WheelFile by extracting out the assertions that happen in installer.utils.parse_entrypoints and raising _WheelFileValidationError's instead of doing assertions (perhaps this could be pulled out to another utility function for reuse?). Of course, before performing these validations, a check would have to be made for the existence of the entry_points.txt file, and no validations would be performed if the file does not exist.
  3. Perform the validation in the parse_entrypoints utility function as it is now (reusing extracted validation code, ideally), but resulting in an assertion error to keep the current functionality.

This would maintain existing functionality of this library while adding the new validate_entry_points function for users of the library.

@frostming - if this proposal is accepted and implemented, I would recommend adding the new validation in pdm.installers.installers._install_wheel to avoid hitting the confusing assertion error.

frostming commented 10 months ago

Raising an exception when the regex matching fails is sufficient, we don't need a standalone function to do that. It's different from record validation because when it doesn't match the pattern, the entry point is invalid and it's better to abort.

crbeckle commented 10 months ago

Raising an exception when the regex matching fails is sufficient, we don't need a standalone function to do that. It's different from record validation because when it doesn't match the pattern, the entry point is invalid and it's better to abort.

As in raising an exception in the existing function that is more specific than an AssertionError? That works for me as long as it is more clear what the problem is to the user in the error message. I am imagining something similar to the message printed when a RECORD file is invalid...

"Invalid entry point in {wheel}. Please report to the maintainers of that package so they can fix their project setup. Details: \n{formatted_issues}\n"

... followed by aborting the install. That, of course would be up to users of this library to catch and print / re-throw (i.e. pdm), and the only change here would be the more specific error to be raised.

tuergeist commented 9 months ago

I ran into that error after renaming the package. However, assert match AssertionError is nothing a user can work with. I had to read through the code to get what's the problem here.