Prune with depth != -1 can delete reachable commits

alexlarsson commented 3 years ago

Consider a repo with 4 commits (1-4) and 2 branches a pointing at 1 and b pointing at 2. Like so:

 a -> 1 -> 2 -> 3 -> 4
       b --^

When pruning to depth 2, ostree_repo_pull() first traverses from a == 1 with depth 2, so it adds 1-3 to reachable. Then it traverses from b with depth 2, and ideally it should add 2-4 to reachable.

However, what really happens is that ostree_repo_traverse_commit_union_with_parents() terminates the loop in the first iteration (at b == 2) because 2 is already in inout_reachable, so 4 is never reached to be added to the reachable set.

If maxdepth is not set then we're ok because if some commit is in reachable, then everything reachable from it is already in the set.

So, either we need to track the depth we reached from each commit in the reachable set, or we don't short circuit the iteration when depth != -1.

cgwalters commented 3 years ago

Good catch. Did you find this via code inspection or did you actually hit it?

Avoiding short-circuit seems simplest.

alexlarsson commented 3 years ago

I was re-implementing prune in flatpak for performance reasons, and saw it while reading the code.

alexlarsson commented 3 years ago

Context: Enumerating all reachable objects in the flathub repo took 22h. I made a version of prune where it stored all the reachable objects from each commit in a .commitmeta2 file (like the other commit meta, but clients don't download it). This took the enumeration down to 39 minutes.

cgwalters commented 3 years ago

There are definitely a bunch of optimizations we could make for prune. The problem domain is quite analogous to language runtime garbage collection.

Caching reachable objects makes sense. Sharding seems doable too (e.g. have multiple repos split by commit hash prefix or ref name). Another technique could be to each time we find an object is reachable, we bump its mtime or so. Then we can avoid traversing e.g. dirtree objects that have been bumped recently in a "fast prune" operation.

wmanley commented 3 years ago

@alexlarsson: How many objects are there in the flathub repo? I assume it uses "archive" mode?

alexlarsson commented 3 years ago

Here is the output running on a clone of the main repo:

Pruning old commits
Finding reachable objects (depth=10)
Elapsed time: 80576.1 sec
Listing all objects
Elapsed time: 514.5 sec
Computing unreachable objects
Elapsed time: 3902.6 sec
Total objects: 16866210
Deleted 5208581 objects, 1.2 TB freed

Note: It didn't actually delete the objects it was just a dry-run.

alexlarsson commented 3 years ago

And yeah, its in archive mode.

wmanley commented 3 years ago

Enumerating all reachable objects in the flathub repo took 22h. I made a version of prune where it stored all the reachable objects from each commit in a .commitmeta2 file (like the other commit meta, but clients don't download it). This took the enumeration down to 39 minutes.

FWIW I've been thinking for a little while of an additional format for storing some ostree objects. If you had a single file that contained multiple dirmeta and dirtree objects you could probably make a lot of operations - particularly traversal operations - a lot faster.

You'd have a file containing a GVariant like:

(
    a(uuuu) // dirmeta SHAs (sorted)
    a(uuuu) // tree SHAs (sorted)
    a(uuuu) // commit SHAs (sorted)
    a(uuua(ayay)) // dirmetas
    a(a(say)a(sayay)) // trees
    a(a{sv}aya(say)sstayay)  // Commits
)

To lookup a given dirmeta you'd do a binary search (or interpolation search) in the dirmeta SHAs and use the offset in that array to index into the dirmetas array. Similarly for tree and commit SHAs.

Notes:

This format is intended to be mmap friendly. You'd mmap the whole file and do all querying with no syscalls and no copying.
The SHAs are stored separately from the data for cache-efficient lookup.
I'm storing SHAs as (uuuu) rather than ay or s as it's a fixed-size GVariant type, which has some efficiency and simplicity advantages. In C it would be represented as a uint64_t[4][]. These advantages would be particularly relevant for performing lookup.
You could apply rules like: "these files must be complete - that is if a file includes a commit it must also contain all the trees that the commit points to." and "no dangling objects". These rules may then be exploited later for faster GC
Combining files together should be quick and could be done in a streaming fashion without reading the whole file into memory first by using merge sort.
Because you were storing the whole objects in this file you wouldn't also need to store them in the filesystem. You'd probably end up with space savings as a result.
I also wonder whether it would be a good idea to include file headers here as well. For ostree-fuse it would be nice to be able to know the size of files for example without doing
I'm not sure how it would work with having multiple of these files. You could imagine for very large repos it might make sense to have a ay at the beginning containing a bloom filter of the SHAs contained with in.

alexlarsson commented 3 years ago

Having some form of mmapable pack-files seems like a possibly good idea. However, in practice for e.g. flathub we rarely do complex operations where this would help. All we do is essentially:

Downloads via http
Commit
Generate deltas
Prune

All of these except the prune are quite local and I'm not sure how much a more complex repo file would help. For the prune all we need is the reachable set from the commits, so the complexity also seems unnecessary.

alexlarsson commented 3 years ago

Also, if you ever want to do this kind of mmaping of GVariant formats efficiently, have a look at https://gitlab.gnome.org/alexl/variant-schema-compiler. We use it in flatpak for looking at ostree gvariant data. See eg: https://github.com/flatpak/flatpak/blob/master/data/flatpak-variants.gv

It is just soo much faster. GVariant recursion is full of unnecessary allocation, refs (with corresponding locking), type parsing, etc.

alexlarsson commented 3 years ago

Because you were storing the whole objects in this file you wouldn't also need to store them in the filesystem. You'd probably end up with space savings as a result.

At the end of the day you need to support http requests of the individual objects, so unless you have some smart web frontend you can't save space like this.

wmanley commented 3 years ago

Also, if you ever want to do this kind of mmaping of GVariant formats efficiently, have a look at https://gitlab.gnome.org/alexl/variant-schema-compiler. We use it in flatpak for looking at ostree gvariant data. See eg: https://github.com/flatpak/flatpak/blob/master/data/flatpak-variants.gv

Thanks. I'd seen this before - really like it. I'd had plans to add support for this to https://github.com/wmanley/gvariant-rs, but https://gitlab.gnome.org/GNOME/glib/-/issues/2121 rather took the wind out of my sails.

cgwalters commented 3 years ago

Not directly related to this but WDYT on moving gvariant-rs into the ostreedev/ org and sharing ownership/maintenance with the new crates team here? One downside of crates.io IMO is how many important crates are under personal namespaces and vulnerable to a person changing job/interest etc. And this is a problem more generally with the rise of Github versus previous foundation-oriented systems like Apache, GNOME, etc.

alexlarsson commented 3 years ago

If you're interested in the impoved performance for prune have a look at: https://github.com/flatpak/flatpak/pull/4231

Some of the ideas from there could probably be picked up in ostree (although the parser generator stuff would add more dependencies). I'm doing this in flatpak because we have a pressing need to get this in so we can actually start pruning the flathub repos before we run out of space...

wmanley commented 3 years ago

Not directly related to this but WDYT on moving gvariant-rs into the ostreedev/ org and sharing ownership/maintenance with the new crates team here? One downside of crates.io IMO is how many important crates are under personal namespaces and vulnerable to a person changing job/interest etc. And this is a problem more generally with the rise of Github versus previous foundation-oriented systems like Apache, GNOME, etc.

That would work for me. Would ostree-oxide be of interest too? It's still very much an experiment but I think there could be some value in it. I'm currently playing around with seeing what a prune implemented in pure rust would look like. I'm using it to learn about rayon and how adding concurrency could speed these things up.

cgwalters commented 3 years ago

That would work for me.

I granted all ostreedev org members the ability to create public repos, so feel free to transfer as you like!

Would ostree-oxide be of interest too?

Might as well add it!

It's still very much an experiment but I think there could be some value in it. I'm currently playing around with seeing what a prune implemented in pure rust would look like. I'm using it to learn about rayon and how adding concurrency could speed these things up.

I've definitely thought about this too although more from the perspective of doing commits from the filesystem; it's an embarrassingly parallel problem that we're doing very embarrassingly serially.

alexlarsson commented 3 years ago

While I agree that much of what ostree does is highly parallel I'm not sure we're actually going to do much better in real time if we exploited that. Take a look at this prune run on the flathub repo:

# time ./flatpak build-update-repo -v --no-update-summary --no-update-appstream --prune-dry-run --prune-depth=10  /tank/repo-prunetest/stable/
Pruning old commits (dry-run)
F: Finding reachable objects, unlocked (depth=10)
F: Elapsed time: 78056.3 sec
F: Finding reachable objects, locked (depth=10)
F: Elapsed time: 1188.7 sec
F: Pruning unreachable objects
F: Elapsed time: 3935.9 sec
Total objects: 16948803
Deleted 5254040 objects, 1.3 TB freed

real    1386m21.345s
user    46m56.503s
sys     28m33.380s

The total user + sys time is 75 minutes, but the real time is 23 hours. I mean, the machine is doing other stuff, but its not that heavily loaded. This is a zfs filesystem so I guess we don't see zfs reading in file metadata in the sys time.... Anyway, basically the entire thing is waiting on i/o, and I doubt if we emitted all the i/o ops in parallel it would do much better...

wmanley commented 3 years ago

@alexlarsson: I've had a crack at parallelising this using rust and rayon: https://github.com/wmanley/ostree-oxide/commit/8a731fa3379c0bbe286799d12ae8130068277362 . In my (limited) testing it's significantly faster than ostree even if I specify RAYON_NUM_THREADS=1.

It seems to give roughly the same results in terms of numbers of objects to delete. I don't have confidence in its correctness right now, but it doesn't actually support deleting the objects yet, so should be safe to run.

You can run it with:

git clone https://github.com/wmanley/ostree-oxide
cd ostree-oxide/ostree-cmd
cargo run --release prune --repo=<path to your repo> --refs-only --depth=10 --no-prune

or if you don't have a rust toolchain on the system with the repo you can copy ostree-oxide/target/release/ostree-cmd over after running cargo build --release. It only dynamically links against libc.

wmanley commented 3 years ago

@alexlarsson: Let me know if you don't have time/inclination to try my parallel implementation, then I can put it out of my mind :)

wmanley commented 3 years ago

Not directly related to this but WDYT on moving gvariant-rs into the ostreedev/ org and sharing ownership/maintenance with the new crates team here?

I've transferred ownership to the ostreedev organisation for both gvariant-rs and ostree-oxide. Hopefully they will be useful.

cgwalters commented 3 years ago

I've transferred ownership to the ostreedev organisation for both gvariant-rs and ostree-oxide. Hopefully they will be useful.

Nice! Want to send a note to https://mail.gnome.org/archives/ostree-list/ at least?

ostreedev / ostree

Prune with depth != -1 can delete reachable commits #2338