zhaofengli / attic

Multi-tenant Nix Binary Cache
https://docs.attic.rs

Storage prefix directories #62

Open nh2 opened 1 year ago

nh2 commented 1 year ago

I did a benchmark of how well Attic deduplicates chromium derivations, and how it compares to deduplicating backup tools such as bup and bupstash: https://github.com/NixOS/nixpkgs/issues/89380#issuecomment-1605028571

One thing I noticed is that in ~/.local/share/attic/storage, all CDC chunks are in a single folder.

This will not work well (be slow) on many Linux file systems, and not work at all on others.

For example:

My benchmark of a couple Chromium derivations already created 100k files.

So it might already break ext4.


This is why most deduplicating backup tools like bup, kopia, and soon bupstash, use prefix directories, e.g.

928/
  928fe29d-f7c6-4bdf-98ae-6185c3efd604.chunk

I recommend that Attic does the same.
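As a rough illustration of the layout above, here is a minimal Python sketch that maps a chunk file name to a path under a prefix directory. The function name `prefixed_chunk_path` and the `prefix_len` parameter are hypothetical and are not part of Attic's code base; the sketch only assumes that the prefix is the first few characters of the file name, as in the `928/` example.

```python
from pathlib import PurePosixPath


def prefixed_chunk_path(storage_root: str, chunk_name: str, prefix_len: int = 3) -> PurePosixPath:
    """Return storage_root/<prefix>/<chunk_name>.

    The prefix is the first `prefix_len` characters of the chunk file name.
    Hypothetical helper for illustration only, not Attic's actual API.
    """
    if prefix_len < 1:
        raise ValueError("prefix_len must be at least 1")
    if len(chunk_name) < prefix_len:
        raise ValueError("chunk name is shorter than the prefix length")
    return PurePosixPath(storage_root) / chunk_name[:prefix_len] / chunk_name


example = prefixed_chunk_path("storage", "928fe29d-f7c6-4bdf-98ae-6185c3efd604.chunk")
print(example)
```

Because the prefix is derived from the file name itself, a reader can locate a chunk without any extra index, and existing flat storage could in principle be migrated by moving each file into its prefix directory.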


How long your prefix dir should be depends on how many files you expect to store (which in turn depends on how large your chunks are and how large the deduplicated content is), see more in the next posts.

nh2 commented 1 year ago

Copying here what I posted to the bupstash Matrix channel:

I am very convinced that you need prefix directories for the files in the bupstash repo.

Here are my latest learnings:

  • ext4 will just return ENOSPC (no space left on device) by default, even at "small" directory sizes (e.g. 8M files on one of my servers). This is because it uses 32-bit hashes for its dir_index lookup feature (a default option). A hash collision when writing a new file will immediately give ENOSPC. This can be addressed with the large_dir option (not default), but that was buggy and led to corruption until recently (http://www.voxelsoft.com/2021/ext4_large_dir_corruption.html, great page btw, has some gems). Linux fixed it, but introduced another corruption bug, which I reported to the author of that article; it is now also mentioned there (after "kernel developers apply naïve fix anyway"). This means a bupstash repo on ext4 will just fail.
  • Ceph has dirfrags intended to support large dirs, but still suffers from various issues: Apparent locking of writes vs reads, making reads stall.
  • ZFS: Seems to work OK.
  • XFS: Not tried.
  • All flat large directories on Linux make parallel file enumeration impossible, as getdents() is inherently sequential. This means rsyncing the repo to somewhere else cannot be parallelised with flat dirs.

For my CephFS storage with the 200M write-once files, I moved off a flat directory to a prefixed approach with 3 letters of base57 chars, thus 180k prefix dirs. Performance improved a lot due to the avoidance of above-mentioned lock contention.

Kopia uses a/abc/abce123... for its repo. I think this 1+3 approach (or 1+N in general) is even better because it further removes lock contention on the suffix dirs at very low cost (the 1-dirs are always cached in memory). Like bupstash, kopia uses base16 chars (hex). But Kopia uses packfiles and thus has ~10x fewer files stored. So I think that 1+3 is too small for bupstash's many files, and 1+4 or 1+5 is better.

attic makes even more, smaller files than bupstash (attic's files in 0.1.0 default configuration are 256 KiB).

So I think you'd need a hexadecimal prefix of at least total length 5 or 6.


If 15 GB of chromium creates 100k files, 500 TB of cache.nixos.org will create at least 3.3 billion files.

That's roughly 2^32, so it could be stored as 16^4 = 64 Ki dirs each containing 64 Ki files.

But you probably want to design for more than the current cache size.

So probably a 2+3 prefix would be good.
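The sizing estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the file count scales linearly with stored bytes (as the extrapolation from the Chromium benchmark does), uses decimal GB/TB, and assumes chunk names are spread uniformly over hexadecimal prefixes. The function names are made up for this example.

```python
GB = 10**9
TB = 10**12


def estimate_file_count(sample_bytes: int, sample_files: int, target_bytes: int) -> int:
    """Linearly extrapolate the number of chunk files from a benchmark sample."""
    return sample_files * target_bytes // sample_bytes


def files_per_leaf_dir(total_files: int, hex_prefix_len: int) -> float:
    """Average files per leaf directory for a hex prefix of the given total length.

    Assumes names are uniformly distributed over the 16 ** hex_prefix_len prefixes.
    """
    return total_files / 16**hex_prefix_len


# 15 GB of Chromium produced about 100k files; extrapolate to 500 TB.
total = estimate_file_count(15 * GB, 100_000, 500 * TB)
print(f"estimated files: {total:,}")

for prefix_len in (4, 5, 6):
    avg = files_per_leaf_dir(total, prefix_len)
    print(f"prefix length {prefix_len}: {16**prefix_len:,} dirs, about {avg:,.0f} files each")
```

A total prefix length of 5 (for example split as 2+3) keeps the average leaf directory in the low thousands of files at this scale, which leaves headroom if the cache grows.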


There is another consideration:

When the directory structure is stored on a spinning disk, every directory read likely takes a seek. A seek takes 10 ms.

If you have an operation (e.g. a "maintenance" or GC or similar) that requires discovering all existing files in storage/, then a 2+3 prefix structure would take (16 ** 2) * (16 ** 3) / 100 / 3600 ≈ 3 hours to do so (about 1M leaf directories at 100 seeks per second).
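The same estimate can be written as a small Python function to compare layouts. It is a sketch under the assumptions stated above: one seek of about 10 ms per leaf directory, hexadecimal prefixes, and every possible prefix directory existing. Like the formula in the text, it counts only leaf directories and ignores the comparatively few upper-level ones. The name `enumeration_hours` is hypothetical.

```python
def enumeration_hours(level_widths: tuple, seek_ms: float = 10.0, alphabet_size: int = 16) -> float:
    """Estimate hours needed to read every leaf directory once on a spinning disk.

    level_widths: characters per directory level, e.g. (2, 3) for a 2+3 layout.
    Assumes one seek per leaf directory and that all leaf directories exist.
    """
    leaf_dirs = alphabet_size ** sum(level_widths)
    seconds = leaf_dirs * seek_ms / 1000.0
    return seconds / 3600.0


for layout in ((1, 3), (2, 2), (2, 3), (2, 4)):
    name = "+".join(str(w) for w in layout)
    print(f"{name} layout: about {enumeration_hours(layout):.2f} hours")
```

Each extra hex character in the prefix multiplies the enumeration time by 16, so the prefix length is a trade-off between small directories and the cost of a full scan.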

Thus, I recommend:

ajs124 commented 1 year ago

Is this a duplicate (albeit with a lot of very interesting details) of https://github.com/zhaofengli/attic/issues/45?

nh2 commented 1 year ago

> Is this a duplicate (albeit with a lot of very interesting details) of #45?

Oh, yes. I hadn't spotted it.

Mic92 commented 10 months ago

I wrote this but haven't tested it: https://github.com/zhaofengli/attic/pull/98