lmstudio-ai / venvstacks

Virtual environment stacks for Python
https://lmstudio-ai.github.io/venvstacks/
MIT License

Consider adding internal archive manifest #17

Open ncoghlan opened 1 month ago

ncoghlan commented 1 month ago

Python's wheel format (and package installation records in general) supports recording full internal archive manifests, along with the expected hashes of the included files. That internal manifest can optionally be signed with a JSON Web Signature (although publicly available wheel files almost never do so; the feature is intended more for privately built wheel archives with very specific deployment environments).
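For reference, a minimal sketch of how such a manifest can be checked, assuming the usual RECORD layout of CSV rows in the form `path,sha256=<urlsafe base64 digest without padding>,size`. The helper names `record_hash` and `verify_record` are made up for illustration and are not part of venvstacks or any packaging tool:

```python
import base64
import csv
import hashlib
import io


def record_hash(data: bytes) -> str:
    """Format a SHA-256 digest the way RECORD files do (urlsafe base64, no padding)."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_record(record_text: str, files: dict[str, bytes]) -> list[str]:
    """Return the recorded paths whose content is missing or does not match.

    `files` is a hypothetical in-memory stand-in for the installed tree,
    mapping RECORD-relative paths to file contents.
    """
    mismatches = []
    for row in csv.reader(io.StringIO(record_text)):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        path, expected_hash, expected_size = row
        if not expected_hash:
            continue  # RECORD lists itself without a hash or size
        data = files.get(path)
        if (
            data is None
            or record_hash(data) != expected_hash
            or str(len(data)) != expected_size
        ):
            mismatches.append(path)
    return mismatches
```

Calling `verify_record(record_text, files)` on an unmodified tree should return an empty list; any tampered or missing file shows up by path.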

venvstacks intentionally removes these RECORD files, mostly for reproducibility reasons (since some of the hashes may relate to files that contain absolute paths to the build environment), but also to make it less likely that regular Python package management tools will attempt to manipulate the environment contents.

To replace these removed files, venvstacks could create its own installation manifest at share/venv/metadata/RECORD.

To minimise the RECORD file size, an adjacent JSON file would be used to specify the relative base path for record entries (since base runtime environments would want to use the root folder, while layered environments would want to use the site-packages folder).
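A rough sketch of what generating that pair of files could look like. Everything here is an assumption for illustration: the function name `build_manifest`, the `base_path` JSON key, and the decision to return the two texts rather than write them to disk are not taken from venvstacks:

```python
import base64
import csv
import hashlib
import io
import json
from pathlib import Path


def build_manifest(env_root: Path, base: Path) -> tuple[str, str]:
    """Return (RECORD text, JSON text) covering every file under `base`.

    `base` would be `env_root` itself for a base runtime environment, or the
    site-packages folder for a layered environment. RECORD paths are stored
    relative to `base`; the JSON text records where `base` sits in the env.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        data = path.read_bytes()
        digest = base64.urlsafe_b64encode(hashlib.sha256(data).digest())
        writer.writerow([
            path.relative_to(base).as_posix(),
            "sha256=" + digest.rstrip(b"=").decode("ascii"),
            len(data),
        ])
    # Hypothetical key name; "." means the environment root is the base
    base_info = json.dumps({"base_path": base.relative_to(env_root).as_posix()})
    return out.getvalue(), base_info
```

Storing the base path once in the JSON file keeps each RECORD row short, since a layered environment does not need to repeat the site-packages prefix on every line. A real implementation would also need to decide how the manifest treats its own files when they live under `base`.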

ncoghlan commented 1 week ago

Note that even if #28 means that the original RECORD files remain mostly intact, there are still additional files in the published archives that those files don't capture (like the injected postinstall.py script and sitecustomize.py module).

However, keeping the original RECORD files would mean that the archive-level RECORD could just store the hashes for those files, rather than repeating all the individual file hashes for the distribution package contents.
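As a sketch of that reduced manifest, assuming the per-package RECORD files live at `*.dist-info/RECORD` and list paths relative to the folder containing the `.dist-info` directory. The function name `archive_manifest` and the in-memory `files` mapping are hypothetical, and this version only checks which paths are listed, not whether their recorded hashes are still valid:

```python
import base64
import csv
import hashlib
import io
from posixpath import dirname, join, normpath


def record_hash(data: bytes) -> str:
    """Format a SHA-256 digest the way RECORD files do."""
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def archive_manifest(files: dict[str, bytes]) -> list[tuple[str, str, int]]:
    """List (path, hash, size) for files no per-package RECORD already hashes.

    `files` maps archive-relative POSIX paths to contents. Each original
    RECORD file is itself included in the result, because RECORD files list
    themselves without a hash.
    """
    covered = set()
    for path, data in files.items():
        if path.endswith(".dist-info/RECORD"):
            site_dir = dirname(dirname(path))  # folder holding the .dist-info dir
            for row in csv.reader(io.StringIO(data.decode("utf-8"))):
                if len(row) >= 2 and row[1]:  # only hashed entries count as covered
                    covered.add(normpath(join(site_dir, row[0])))
    return [
        (path, record_hash(data), len(data))
        for path, data in sorted(files.items())
        if path not in covered
    ]
```

With this approach the archive-level manifest would only grow with the number of installed distributions and injected files (such as `postinstall.py` and `sitecustomize.py`), not with the total file count.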