rust-lang / git2-rs

libgit2 bindings for Rust
https://docs.rs/git2
Apache License 2.0
1.67k stars, 384 forks

git2 holds open deleted packfiles, effectively leaking disk space #904

Open roy-work opened 1 year ago

roy-work commented 1 year ago

We have a long-running daemon that does some stuff with a repository.

We noticed that the disk backing the repository was "full":

thing@thing-0:/var/cache/thing$ du -chs .
7.3G    .
7.3G    total
thing@thing-0:/var/cache/thing$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         16G   16G     0 100% /var/cache/thing

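That might seem illogical, but it's a hint that deleted files are being held open: du only counts files that are still reachable through a directory entry, while df counts every allocated block, including those of unlinked files that a process still has open. If we ls our daemon's open files, that confirms it: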

0 -> /dev/null
1 -> 'pipe:[16731674]'
10 -> 'socket:[16737720]'
11 -> 'anon_inode:[eventpoll]'
12 -> 'anon_inode:[eventfd]'
13 -> 'anon_inode:[eventpoll]'
14 -> 'anon_inode:[eventfd]'
15 -> 'anon_inode:[eventpoll]'
16 -> 'socket:[16729776]'
17 -> 'anon_inode:[eventpoll]'
18 -> 'anon_inode:[eventfd]'
19 -> 'anon_inode:[eventpoll]'
2 -> 'pipe:[16731675]'
20 -> 'socket:[16729776]'
21 -> '/var/cache/thing/monorepo/.git/objects/pack/pack-a02eda73e77509f55e734b4f80f0e49c60093184.pack (deleted)'
22 -> '/var/cache/thing/environments/.git/objects/pack/pack-be204a28036cbc4cd01c2248a6ee2168a1989a96.pack (deleted)'
23 -> '/var/cache/thing/environments/.git/objects/pack/pack-6e09f3d300e72f084b05ba7bc1ccce40c10f582c.pack (deleted)'
24 -> '/var/cache/thing/monorepo/.git/objects/pack/pack-daec94cd70f17f2b1d3a638ce87cb4c086ce4629.pack (deleted)'
25 -> '/var/cache/thing/environments/.git/objects/pack/pack-2ad6f4562cd68fbf4ecc4d5af17579aa590e7263.pack (deleted)'
26 -> '/var/cache/thing/environments/.git/objects/pack/pack-aed3c72e24e8ec9a5db6b11bc2a81f071b5596f6.pack (deleted)'
27 -> '/var/cache/thing/environments/.git/objects/pack/pack-d4e7ed286a0e2e79cd36688ac33c5f98948e6b6a.pack (deleted)'
28 -> '/var/cache/thing/environments/.git/objects/pack/pack-6ffb344d987435d1cca7934ed89e7ddf9d6d79a2.pack (deleted)'
29 -> '/var/cache/thing/environments/.git/objects/pack/pack-09e7c333cc2b2dbb5b30550298b31f30ea4439c7.pack (deleted)'
3 -> 'anon_inode:[eventpoll]'

So, we can see here that the daemon is holding open a bunch of deleted packfiles, which explains why the used space reported by df is so much higher than the du total.
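For anyone wanting to automate that check, here is a minimal sketch using only the Rust standard library. It assumes Linux, where procfs exposes each open descriptor as a symlink under /proc/<pid>/fd and appends " (deleted)" to the link target once the file has been unlinked. The function names are made up for illustration and are not part of git2 or libgit2. A file whose real name happens to end in " (deleted)" would be a false positive.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Suffix that Linux procfs appends to the link target of a descriptor
/// whose underlying file has been unlinked.
const DELETED_SUFFIX: &str = " (deleted)";

/// If `target` (a /proc/<pid>/fd link target) names a deleted packfile,
/// returns the packfile's original path; otherwise returns None.
fn deleted_packfile(target: &str) -> Option<&str> {
    let path = target.strip_suffix(DELETED_SUFFIX)?;
    if path.ends_with(".pack") {
        Some(path)
    } else {
        None
    }
}

/// Returns (fd, path) pairs for every deleted packfile that the process
/// `pid` still holds open. Pass "self" to inspect the current process.
/// Hypothetical helper: reading another process's fds needs permission.
fn deleted_packfiles_held_by(pid: &str) -> io::Result<Vec<(String, String)>> {
    let fd_dir = Path::new("/proc").join(pid).join("fd");
    let mut held = Vec::new();
    for entry in fs::read_dir(fd_dir)? {
        let entry = entry?;
        // A descriptor can be closed between read_dir and read_link;
        // skip those instead of failing the whole scan.
        let target = match fs::read_link(entry.path()) {
            Ok(t) => t.to_string_lossy().into_owned(),
            Err(_) => continue,
        };
        if let Some(path) = deleted_packfile(&target) {
            let fd = entry.file_name().to_string_lossy().into_owned();
            held.push((fd, path.to_string()));
        }
    }
    Ok(held)
}
```

Calling deleted_packfiles_held_by("self") periodically from inside the daemon and logging the result would show whether the set of held packfiles grows after each fetch and repack.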

We keep the git2::Repository objects around: I didn't know whether they were expensive to create, and we only have two repos, so we just create the git2::Repository objects at startup and then keep them as global state.

But that seems to cause a "disk" leak, over repeated fetches & repacks.

kim commented 1 year ago

IIRC it’s not unbounded, but libgit2’s limits are a bit optimistic. You can set the MWINDOW_FILE_LIMIT via the raw bindings (libgit2-sys).

roy-work commented 1 year ago

> IIRC it’s not unbounded, but libgit2’s limits are a bit optimistic.

It was holding onto 53% of the total size of the volume, at which point the volume ran out of space.

> You can set the MWINDOW_FILE_LIMIT via the raw bindings

The only hit I get for MWINDOW_FILE_LIMIT on Google is a copy of this issue, and libgit2's docs search doesn't recognize it.

kim commented 1 year ago

https://github.com/libgit2/libgit2/pull/5396

roy-work commented 1 year ago

For my reference, the docs page.

(The commit indicates that the default for that is unlimited.)