matthew-brett / delocate

Find and copy needed dynamic libraries into python wheels
BSD 2-Clause "Simplified" License
262 stars 59 forks source link

Validate and update macOS platform tag in wheel name #198

Closed Czaki closed 4 months ago

Czaki commented 4 months ago

This PR introduces scanning all binaries in the wheel and checking if the platform tag in the wheel name is correct in the context of embedded binaries.

By default, it updates the wheel name. If --require-target-macos-version is provided or MACOSX_DEPLOYMENT_TARGET is provided, then delocate fails if the wheel does not meet versions provided this way.

Czaki commented 4 months ago

I will add tests later, but if you could clarify how it will best look for you the test fail.

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (eaba43d) 96.45% compared to head (51f03fc) 96.80%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #198 +/- ## ========================================== + Coverage 96.45% 96.80% +0.35% ========================================== Files 15 15 Lines 1184 1285 +101 ========================================== + Hits 1142 1244 +102 + Misses 42 41 -1 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

HexDecimal commented 4 months ago

Minor issue with code coverage, simply re-running the tests solved it. Or were you asking for advice on writing tests?

Czaki commented 4 months ago

Or were you asking for advice on writing tests?

No. I do not want to start writings test before clarifying the interface.

Czaki commented 4 months ago

One additional question. If both checks should be optional or some should be default behavior with option to disable?

HexDecimal commented 4 months ago

One additional question. If both checks should be optional or some should be default behavior with option to disable?

If you made this not optional then that would simplify the code. There would be no need to verify the versions anymore if this always sets the version correctly. No need for the additional CLI flags either. I can't imagine use cases where anyone would want to keep the wrong version intentionally, but feel free to correct me.

I looked at issue #56 but it appears you're the one who made that issue. I don't have enough MacOS experience to add much additional insight. Maybe reduce the checks suggested here to only --require-target-macos-version <max_version> which errors if an upper bound is exceeded. I'll assume this PR closes #56.

To clarify what I'm suggesting:

If you want an error message format, how about this:

Library dependencies do not satisfy target MacOS version {require_target_macos_version}:
{lib_path} has a minimum target of {lib_macos_version}
...  # repeat for each failed library
Czaki commented 4 months ago

I have applied requested changes. I have added the option to activate require-target-macos-version by setting MACOSX_DEPLOYMENT_TARGET environment variable.

Should I propose changes in README about new error?

EDIT. If wheel name does not fit the proper pattern, should run fail (and I need to fix all tests) or it should skip name validation then?

HexDecimal commented 4 months ago

If wheel name does not fit the proper pattern, should run fail (and I need to fix all tests) or it should skip name validation then?

Wheel names must be valid after renaming, there are PEP's for what is a valid name. Wheel inputs should be valid enough for tools to work. I'll look into the test errors and the wheel spec to figure out why tests are currently failing.

Should I propose changes in README about new error?

The most important place is the CHANGELOG. It should note the new CLI flag and that wheels might be tagged with a different MacOS version than before. It should also note that new ENV variables will be read.

You can also explain these in the README, but the CHANGELOG is more important since there needs to be an easy to read record of when these changes have happened.

HexDecimal commented 4 months ago

It seems that current tests assume wheels can be given arbitrary names when used with Delocate, and your new code assumes wheels have a full valid name to work with to pull tags from the filename. I think your code renaming files has broken the assumption made by Delocate that wheels will always have the same name before and after delocation and that the actual name doesn't matter.

After some consideration I think Delocate is wrong to have this assumption, and I think it's safe to force wheels to have valid input names. This will require rewriting some tests. This might break the tool for users abusing wheel names, but regular users will be unaffected. Stricter inputs will need to be mentioned in the CHANGELOG.

Wheels have tag information included in the wheel metadata, this can be read to get the tags of a wheel with an invalid name. I assume it must be updated when the tags of a wheel are changed as well. The metadata location is based on a valid wheel name itself so it isn't an alternative to a valid name.

More info: https://packaging.python.org/en/latest/specifications/binary-distribution-format/

I think Delocate might be breaking several packaging rules, but I'd have to take a longer look to know for sure how bad it is.

If wheel name does not fit the proper pattern, should run fail (and I need to fix all tests) or it should skip name validation then?

With all this in mind, the current tests are broken and need to be fixed by giving temporary wheels valid names (or by putting the copied wheels into their own temporary directories so they that don't conflict with each other.)

Czaki commented 4 months ago

Ok. I have added tests and changed behavior to requested.

I do not have an idea how properly test test_check_plat_archs (failing currently).

As the parser is defined globaly in delocate.cmd.delocate_wheel it is not easy to run test_delocate_wheel_verify_name_universal2_verify_crash with MACOSX_DEPLOYMENT_TARGET environment variable. It could be fixed by moving the parser definition to a function or reload module using importlib.reload.

HexDecimal commented 4 months ago

test_delocate_wheel_verify_name_universal2_verify_crash with MACOSX_DEPLOYMENT_TARGET environment variable. It could be fixed by moving the parser definition to a function or reload module using importlib.reload.

pytest-console-scripts has a setting to handle this case:

@pytest.mark.script_launch_mode('subprocess')
def test_delocate_wheel_verify_name_universal2_verify_crash(...

This will cause script_runner.run to run in a new process, causing the import-time code to run again. Monkey patches will no longer work, so make a copy with os.environ.copy(), then set the tested variable, then pass the modified environment to script_runner.run(env=)

HexDecimal commented 4 months ago

I do not have an idea how properly test test_check_plat_archs

I suspect there's nothing wrong with test_check_plat_archs. It seems _calculate_minimum_wheel_name is having trouble with fakepkg1-1.0-cp37-abi3-macosx_10_9_universal2.whl. It looks like _get_macos_min_version doesn't return arm64, then the function crashes later when it checks for arm64 on the universal2 wheel.

It's possible that _get_macos_min_version and _get_archs_and_version_from_wheel_name will need their own tests to ensure they work as documented.

Czaki commented 4 months ago

I suspect there's nothing wrong with test_check_plat_archs. It seems _calculate_minimum_wheel_name is having trouble with fakepkg1-1.0-cp37-abi3-macosx_10_9_universal2.whl. It looks like _get_macos_min_version doesn't return arm64, then the function crashes later when it checks for arm64 on the universal2 wheel.

Yes. Because in this test, the arm64 binaries were removed from the wheel. The question is how to fix this. I need to add some additional binaries to the default wheel? As I expect that delocate should fail if there is no binary for given architecture.

HexDecimal commented 4 months ago

Yes. Because in this test, the arm64 binaries were removed from the wheel. The question is how to fix this. I need to add some additional binaries to the default wheel? As I expect that delocate should fail if there is no binary for given architecture.

Thanks for clarifying, I misunderstood what I was looking for.

This test appears to be for the require_archs parameter of delocate_wheel (also the --require-archs flag). With the failing code running require_archs=(). The relevant documentation for this argument:

An empty sequence results in checks that depended libraries have the same archs as depending libraries.

So this ignores the wheels tags and verifies the bundled libraries as-is. The old code handles wheels with invalid architecture combinations without fail, but your code is more strict and crashes when a wheel doesn't contain the correct architectures based on its tag. I prefer the strictness, but it would've been nice if the error was better than KeyError.

This test has the terribly documented _fix_break and _fix_break_fix sub-functions. These can be updated to return renamed wheels with the correct arch tags so that they won't be invalid anymore. This is a small and easy change compared to any other idea I can think of.

Czaki commented 4 months ago

Looks ready for review.

Czaki commented 4 months ago

I assume the tags in {distribution}-{version}.dist-info/WHEEL were not updated yet. It is probably incorrect to leave them unchanged.

I miss this. Earlier. I have fixed it now.

I think this currently leaves the original file alone if the name is changed. This will mess up the distribution and deployment of most projects. If this renames a wheel then the original must be deleted (if it's in the same directory).

You were right. It is fixed now.