yarnpkg / berry

📦🐈 Active development trunk for Yarn ⚒
https://yarnpkg.com
BSD 2-Clause "Simplified" License

[Bug?]: custom yarn checksum is bad for file sharing #6296

Closed milahu closed 4 months ago

milahu commented 4 months ago

Describe the bug

yarn 2 introduced a custom checksum field in yarn.lock, but this checksum is not the sha512 hash ("integrity") of the original downloaded tgz file
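for comparison, this is roughly how the tgz "integrity" value can be computed and checked. a minimal sketch, assuming the usual `sha512-<base64 digest>` form found in npm-style lockfiles; the function names `sri_sha512` and `verify_integrity` are made up for illustration, and this does not reproduce the yarn checksum:

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an integrity string of the form 'sha512-<base64 digest>'."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def verify_integrity(data: bytes, integrity: str) -> bool:
    """Check downloaded bytes against an expected sha512 integrity string."""
    return sri_sha512(data) == integrity


# Hypothetical stand-in for the bytes of a downloaded tgz file.
tgz_bytes = b"example tarball bytes"
expected = sri_sha512(tgz_bytes)

print(expected)
print(verify_integrity(tgz_bytes, expected))
print(verify_integrity(tgz_bytes + b"tampered", expected))
```

the point is that this value depends only on the bytes of the tgz file, so any tool that has the file can reproduce it without running a package manager.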

problem: the custom yarn checksum is hard to reproduce and validate. see also: "Yarn v2+ lockfile, how the validate new checksum"

this is problematic for tools like npmlock2nix that want to use the yarn.lock file to download tgz files (and git commits) and then run yarn in offline mode to build a node_modules tree, because such tools need the integrity values of the downloaded files

when such tools cannot use yarn.lock, they need to invent their own lockfiles, which means: download all tgz files to get their sha integrity

custom yarn checksum is bad for file sharing

"file sharing" as in: different nodejs package managers (npm, pnpm, yarn) use the same tgz files to build their node_modules trees. with nix, all these tgz files are cached in /nix/store/

so now, when yarn introduces a new archive format and repacks the original tgz files into yarn zip files, the /nix/store/ has both tgz files and zip files for the same node packages (and even different zip files with different compression levels... #6068), and these different files cannot be shared between different package managers

bottom line: more disk space is used
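to illustrate the compression-level point: the same file content packed into a zip at two different deflate levels gives different archive bytes, so a hash over the archive differs even though the unpacked content is identical. a rough sketch using python's `zipfile` as a stand-in; this is not yarn's packing code, and the file name, content, and helper names are made up:

```python
import hashlib
import io
import zipfile


def pack_zip(files: dict[str, bytes], level: int) -> bytes:
    """Pack files into an in-memory zip with fixed timestamps and a given deflate level."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):
            # Fixed date_time so the only difference between archives is the compression.
            info = zipfile.ZipInfo(name, date_time=(1980, 1, 1, 0, 0, 0))
            zf.writestr(
                info,
                files[name],
                compress_type=zipfile.ZIP_DEFLATED,
                compresslevel=level,
            )
    return buf.getvalue()


def unpack_zip(archive: bytes) -> dict[str, bytes]:
    """Read every member of an in-memory zip back into a dict."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


# Hypothetical package content.
files = {"package/index.js": b"module.exports = 42;\n" * 200}

zip_level0 = pack_zip(files, level=0)
zip_level9 = pack_zip(files, level=9)

same_content = unpack_zip(zip_level0) == unpack_zip(zip_level9)
same_hash = hashlib.sha512(zip_level0).digest() == hashlib.sha512(zip_level9).digest()

print("same unpacked content:", same_content)
print("same archive hash:", same_hash)
```

so two stores that hold "the same package" at different compression levels end up with files that cannot be deduplicated by content hash.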

so ...

at least more documentation would be nice (i found zero):
why the custom checksum?
why not use sha512 integrity of tgz files?
why not cache the original tgz files? (does yarn cache the zip files?)

at first glance, this looks like a bad tradeoff. what exactly is the benefit of the custom yarn checksum?

(feel free to move this to a discussion)

To reproduce

.

Environment

yarn 2.x to yarn 4.2.2

Additional context

similar issues #6105 #5136