Deleting distribution RECORD files entirely was the simplest way to resolve the archive reproducibility issues they can cause: when a RECORD file references script files whose shebang lines are rewritten at installation time, the recorded hashes and sizes for those scripts implicitly depend on the absolute path to the build environment.
An improved approach would be to delete just the lines corresponding to the deleted files, rather than deleting the entire RECORD file (as the current approach means that features like `importlib.metadata.packages_distributions` won't work in deployed environments).
To implement this approach, `importlib.metadata` can be used to get a complete list of every file belonging to every distribution in an environment, and the `csv` module can be used to edit the RECORD files (alternatively, for build time usage, the installer project offers a higher level interface for handling RECORD file updates in https://installer.pypa.io/en/latest/api/utils/#installer.utils.construct_record_file and https://installer.pypa.io/en/latest/api/records/).
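
As a rough illustration of the `csv` half of that idea, here is a minimal sketch that rewrites a single RECORD file so it only lists files that still exist. The names `prune_record` and `site_dir` are hypothetical, and the sketch makes some assumptions: it detects deleted files by checking for their existence on disk rather than taking an explicit list, it treats RECORD paths as relative to the directory containing the `.dist-info` folder, and it leaves locating each distribution's RECORD file (for example via `importlib.metadata`) to the caller.

```python
import csv
from pathlib import Path


def prune_record(record_path, site_dir):
    """Drop RECORD rows whose files no longer exist under site_dir.

    Hypothetical helper: returns the path entries that were removed.
    """
    record_path = Path(record_path)
    site_dir = Path(site_dir)

    # RECORD is a CSV file with one row per file: path, hash, size
    with record_path.open(newline="", encoding="utf-8") as f:
        rows = [row for row in csv.reader(f) if row]

    kept, dropped = [], []
    for row in rows:
        # Assumption: column 0 is relative to the directory that
        # contains the .dist-info folder (typically site-packages)
        if (site_dir / row[0]).exists():
            kept.append(row)
        else:
            dropped.append(row[0])

    # Only rewrite the file when at least one row was removed
    if dropped:
        with record_path.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f, lineterminator="\n").writerows(kept)
    return dropped
```

Calling `prune_record(record, site_packages)` after the unwanted files have been deleted would leave the remaining entries untouched, so the metadata queries that depend on RECORD should still have something to read. Whether the hashes of rewritten scripts also need handling is a separate question this sketch does not address.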