blaggacao opened 2 years ago
This pigeon-holing result nicely illustrates the problem:
layers.json> INFO[0000] Excluding path /nix/store/262ksdbkjjaqnlnkl99fk1mj8rnnki1h-config.json from layer
layers.json> INFO[0000] Adding 1 paths to layer (size:1638912 digest:sha256:b7a4dcedd0c8d79e869bf93cd6e9cc6149bb44afbed77fd75fd9ffc7e58d2297)
layers.json> INFO[0000] Adding 1 paths to layer (size:294912 digest:sha256:2e894008e13387cdf1922c9fbff6d1610a8aefa90182730c57dc3260b5151b62)
layers.json> INFO[0000] Adding 1 paths to layer (size:31911936 digest:sha256:876dd3e223a5eb6e3563f14e7fceb92bfc817f6f6d5a2be257ec85e56125fb0a)
layers.json> INFO[0000] Adding 1 paths to layer (size:22016 digest:sha256:12a71198133584c7662d8b730a8cd369354a61819ebe4615e930f470c5a1fea4)
layers.json> INFO[0000] Adding 1 paths to layer (size:39424 digest:sha256:9eba9f4a5227b91efd50eb1635264f27516005d315a42d4cd74eb4bb102fdbcb)
layers.json> INFO[0000] Adding 1 paths to layer (size:12800 digest:sha256:9b9a2c19c608692e500ee981195f3decd7cd8d70d6ba35cd2b63c3b662518a67)
layers.json> INFO[0000] Adding 1 paths to layer (size:1577984 digest:sha256:671a37be3baaee470579d6564904f6266d8783d1b9489da44c5ebdef272cd60d)
layers.json> INFO[0000] Adding 1 paths to layer (size:32768 digest:sha256:c044f39fdd3121001dd61f4ea90c81df78bcbfafcce42ec3f50423c77a84745f)
layers.json> INFO[0000] Adding 1 paths to layer (size:12800 digest:sha256:180427ccd9b71a9ca0eb30211e2e3ee5540eb62de44c76b580b1e9abfbc3813f)
layers.json> INFO[0000] Adding 1 paths to layer (size:20480 digest:sha256:2708591fa65fc626ab4f2bb021200933988c0e53dbfee2df6a81eb8f2b67fd91)
layers.json> INFO[0000] Adding 1 paths to layer (size:17920 digest:sha256:903d9967c085f68971bc78d4660aa25231d4cfabee5a86bbf722ba0b22bd9581)
layers.json> INFO[0000] Adding 1 paths to layer (size:6024192 digest:sha256:afa3323460ccd4f837288fca18aa095fe89a346de778e473aaaf12c05048b54c)
layers.json> INFO[0000] Adding 1 paths to layer (size:130560 digest:sha256:37d7bae7417ee420371aeeb8e9da67ce6056e53b48c16682849a41be536261b3)
layers.json> INFO[0000] Adding 1 paths to layer (size:99840 digest:sha256:d4db7e7c4446499533d0309bbb8dc829302ea5665aeb17b4b427e00397d6e1b1)
layers.json> INFO[0000] Adding 1 paths to layer (size:4194816 digest:sha256:ca612138a67c611ea2b507dc0cc208075a0e5286a3df3c0d8d2b426416fc29da)
layers.json> INFO[0000] Adding 1 paths to layer (size:13312 digest:sha256:b238b571092c1830fdca9ccc75ecb21b8e490b14bac3c099eb397b4c23eebb8a)
layers.json> INFO[0000] Adding 1 paths to layer (size:25600 digest:sha256:9be5168fda052fae749a77b933640ac146b86692a8542c1c47f0d178b968495c)
layers.json> INFO[0000] Adding 1 paths to layer (size:44544 digest:sha256:c9815662836c485f2ea29d9bde5dd1ba1e02eeab5403959cea6d9f171680f829)
layers.json> INFO[0000] Adding 1 paths to layer (size:126976 digest:sha256:8406f88b2358240cdf4c06e4c9e70967d1d09728a962668b9e464ab12542328d)
layers.json> INFO[0000] Adding 1 paths to layer (size:1915392 digest:sha256:a104d0a007943b0436a04173b77b7c15a2c072dee56300e020696be066e5b1d6)
layers.json> INFO[0000] Adding 1 paths to layer (size:5048320 digest:sha256:ba170d80b18d8696e5dcbaad0559968b2028eeb8bda54565f388d9b3cb07e881)
layers.json> INFO[0000] Adding 1 paths to layer (size:36811776 digest:sha256:069f3cbfaac7f396b7d06267629a141013f0999cecd0e84ad8b9423d9bdd8c4b)
layers.json> INFO[0000] Adding 1 paths to layer (size:1135104 digest:sha256:655510aec3034575c467ef1555ef70639ec47dca0ef64d85e971d1e1b91632a5)
layers.json> INFO[0000] Adding 1 paths to layer (size:23040 digest:sha256:bbc99e64c34fd12ec1e318d2ccd9d3be1edabe004897fb4cf56ca3540ada41ac)
layers.json> INFO[0000] Adding 1 paths to layer (size:118784 digest:sha256:a8c57acc3a6f805524005a4909d8a5f8a5fdb56530ab755726ff58963c325f07)
layers.json> INFO[0001] Adding 1 paths to layer (size:56752128 digest:sha256:22ed0a8b0fd15f7eb623a86d513a93722e4014e3c893c38e959bdfac18e18ee8)
layers.json> INFO[0001] Adding 1 paths to layer (size:1439232 digest:sha256:d88ec783ebcf309f08dc5ce4cad23e5f079a757402302c725bb0a7c20a084fc5)
layers.json> INFO[0001] Adding 1 paths to layer (size:86016 digest:sha256:7a10574549be837c61cdf7a8b52efd2c216bc932bd978de3609f0c71a1ff21b7)
layers.json> INFO[0001] Adding 1 paths to layer (size:509952 digest:sha256:7fce9ccdebbd0aee0046876da7ba12f4bf103422588a6b3e650fa10d7b8a6312)
layers.json> INFO[0038] Adding 730 paths to layer (size:1258515456 digest:sha256:cf5224870e23d0d28caad665e579946d35f65845d9b598fdb54bd4ab0807de87)
The corresponding popularity of the above pigeon-holing result:
When we can get a handle on the original source package (excluding the container runtime environment, e.g. the dependencies required by the entry point), the volatility is usually concentrated in the least popular layer (= the source package).
This suggests a pretty simple fix: we could introduce a volatility cut-off that defaults to one path, so that the last layer always contains exactly the paths at or below that volatility cut-off.
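The cut-off idea above can be sketched roughly as follows. This is an illustration only, not the actual nix2container implementation; the function and parameter names are hypothetical, and "popularity" stands in for however the tool counts references to a store path:

```python
# Hypothetical sketch of a volatility cut-off: store paths whose popularity
# falls at or below the cut-off (default: referenced only once) are grouped
# into the final "volatile" layer instead of each consuming a layer slot.

def split_by_volatility(path_popularity, cutoff=1):
    """path_popularity: dict mapping store path -> reference count."""
    stable = [p for p, pop in path_popularity.items() if pop > cutoff]
    volatile = [p for p, pop in path_popularity.items() if pop <= cutoff]
    return stable, volatile

stable, volatile = split_by_volatility(
    {"/nix/store/aaa-glibc": 42, "/nix/store/bbb-app": 1}
)
# the app (popularity 1) lands in the volatile last layer
```

Only the `stable` paths would then compete for the remaining layer slots.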
After running for a while with a high maxLayer
parameter, I realize there is another aspect that slows down the overall round-trip time quite a lot: the size tradeoff.
It may not make much sense to split out individual layers that are smaller than 5 MB; instead, small layers could be clustered together up to that threshold.
Otherwise, a layered upload takes ages (for a total deduplication gain in the range of 10 MB in one case).
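A minimal sketch of that clustering step, assuming a 5 MB threshold (the names and the exact packing strategy are illustrative, not the actual nix2container code):

```python
# Sketch: cluster layers smaller than a size threshold so that tiny store
# paths don't each cost a separate upload round trip.

MIN_LAYER_SIZE = 5 * 1024 * 1024  # 5 MB, the threshold suggested above

def cluster_small_layers(layer_sizes, threshold=MIN_LAYER_SIZE):
    """layer_sizes: list of layer sizes in bytes; returns clustered sizes."""
    clusters, current = [], 0
    for size in layer_sizes:
        if size >= threshold:
            clusters.append(size)   # big enough: keep as its own layer
        else:
            current += size         # accumulate small layers
            if current >= threshold:
                clusters.append(current)
                current = 0
    if current:
        clusters.append(current)    # trailing partial cluster
    return clusters

print(cluster_small_layers([6_000_000, 12800, 20480, 31_911_936]))
```

Note that greedy packing like this reorders small layers relative to big ones; a real implementation would also need to keep the popularity ordering intact.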
I totally agree with the size tradeoff (I had planned to implement something like this). I think we should propose a default value and allow the user to change it.
I'm wondering, however, whether we should also try to isolate big store paths (> 50 MB) into dedicated layers, in order to reduce the size of the last layer (the one containing all store paths that are not isolated).
But there is another tradeoff to find: the heuristics must remain understandable to the user!
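The big-path isolation proposed above is at least easy to state as a rule a user can understand. A rough sketch, with a hypothetical 50 MB threshold and illustrative names:

```python
# Sketch of the proposal: store paths above a size threshold each get a
# dedicated layer, so the catch-all last layer stays small.

BIG_PATH_THRESHOLD = 50 * 1024 * 1024  # 50 MB, illustrative default

def isolate_big_paths(path_sizes, threshold=BIG_PATH_THRESHOLD):
    """path_sizes: dict of store path -> size in bytes."""
    dedicated = [[p] for p, s in path_sizes.items() if s >= threshold]
    last_layer = [p for p, s in path_sizes.items() if s < threshold]
    return dedicated, last_layer
```

The user-facing heuristic stays one sentence: "paths bigger than N bytes get their own layer".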
I believe the following comment might be relevant to this discussion: https://github.com/NixOS/nixpkgs/pull/122608#issuecomment-848629415
I'm interested in this discussion as well, as I'm looking at images which contain 800-1000 store paths. In my case, I have hourly flake tags stretching back months that can be very easily used to understand which paths are the most stable, and which ones tend to mutate together as groups. It would be amazing if there was some way I could prep that data into a "guidebook" that nix2container could consume in order to help it make really intelligent decisions about layer grouping.
I think it's time to finally modify the OCI standard to accommodate (unordered) set-type "layers" instead of (ordered) list-type layers.
The precondition for losing order is guaranteed non-conflicting paths, and the consequence of losing order is that filesystem overlays become obsolete, which in turn topples the layer limit.
The layer limit, as a deliberate performance restriction on filesystem overlay assembly, has imposed a highly denormalized storage layout that we are all suffering from with our nicely normalized Nix closures.
Non-conflicting paths don't suffer from the same performance penalty at the moment of reassembly.
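The precondition stated above is mechanically checkable. A small sketch of what "guaranteed non-conflicting paths" would mean for a set of layers (Nix store paths satisfy it by construction, since every path lives under its own unique `/nix/store/<hash>-name` prefix):

```python
# Illustrative check: unordered set-type layers are only safe if no two
# layers contain the same file path.
from collections import Counter

def conflicting_paths(layers):
    """layers: list of lists of file paths; returns paths seen in >1 layer."""
    counts = Counter(p for layer in layers for p in set(layer))
    return {p for p, n in counts.items() if n > 1}
```

If this set is empty, the layers could in principle be assembled in any order with identical results.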
I've noticed that in many scenarios the pigeon-holing suffers from an off-by-x problem which effectively invalidates all subsequent layers.
What we've started exploring within Standard is to segregate the less volatile dependencies from the more volatile package, and that in turn from the even more volatile operable.
@blaggacao Is there any work being done on proposals for that? It has obvious appeal to the Nix crowd, but it's hard to imagine either the standard or implementations making it much of anywhere unless there are broader use-cases.
Another framing could be to ask, are there concrete proposals for alternative implementation approaches that don't rely on filesystem level overlays? It's not clear to me what those could be other than a custom FUSE driver or assembling Nix-style symlink/hardlink farms— and both of those are of course already possible today. That they haven't been done (for example, to raise the layer limit without changing the semantics) suggests to me that for 99% of container users, the 128 layer limit might as well be 640K layers.
If I'm correct that this problem is almost entirely a Nix ecosystem concern, I wonder if it might be worth considering solutions that don't require turning every Nix store path into a container layer— for example, creating some kind of Volume driver/plugin that provides locally cached access to a global Nix store so that you can start with a tiny base container and just install stuff as usual, but those installs all become instantaneous no-ops if the on-node cache is warm.
The current algorithm is based on popularity with a lump-sum cut-off. This popularity is therefore a "global" popularity that optimizes registry-wide storage for the most stably popular packages.
However, the lump sum contains the entry points of the directed graph (the least "popular" packages).
These packages are also the most volatile packages, both over time and within the context of an image name.
Let's devise a pigeon-hole algorithm that optimizes:
While certainly not the best statistical algorithm, it is a Pareto-efficient improvement when
sum(layers) > maxLayers
and a no-op otherwise.
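For reference, the popularity-with-lump-sum scheme discussed in this thread can be sketched as follows. The names are hypothetical and this is not the actual nix2container implementation; it only illustrates the `sum(layers) > maxLayers` condition above:

```python
# Sketch: the maxLayers - 1 most popular store paths each get a dedicated
# layer; everything else is lumped into the final layer. When the path
# count already fits within maxLayers, the lump-sum step is a no-op.

def pigeonhole(path_popularity, max_layers):
    """path_popularity: dict of store path -> popularity count."""
    ranked = sorted(path_popularity, key=path_popularity.get, reverse=True)
    if len(ranked) <= max_layers:
        return [[p] for p in ranked]          # no-op case
    head = ranked[: max_layers - 1]           # dedicated layers
    return [[p] for p in head] + [ranked[max_layers - 1:]]  # lump-sum tail
```

The off-by-x and volatility concerns raised earlier all target that final lump-sum tail.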