radicle-dev / radicle-link

The second iteration of the Radicle code collaboration protocol.

Upgrade link-git to use latest version of git-ref #792

Closed Byron closed 2 years ago

Byron commented 2 years ago

git-ref was improved so that users are no longer forced to handle packed-refs themselves, thanks to an auto-updated shared cache.

It's still possible to access the previous (but now renamed) versions of the methods that take packed-refs as a parameter. Along with direct access to the latest cached packed-refs (lazily loaded, auto-reloaded on modification), one can get snapshot-like access by reusing the same packed-refs buffer across multiple calls.

All this is in an effort to get closer to a unified API that can work similarly in a ref-table implementation.

There is also a 'general' store in the works that can be backed by either loose refs or a ref-table, but it seems relevant only once ref-table is actually implemented.

Signed-off-by: Sebastian Thiel sebastian.thiel@icloud.com

Byron commented 2 years ago

Maybe you can advise on the build failures on Windows, as these seem unrelated.

Also, Linux (which worked previously) now fails like macOS with an error about the missing replication user.

Please note that locally I see a couple of tests failing which CI on macOS doesn't reproduce. These failed before my edits as well.

test test::integration::librad::scenario::working_copy::can_fetch ... ok
test test::integration::link_git::protocol::clone_gitoxide ... FAILED
test test::integration::link_git::protocol::clone_libgit ... FAILED
test test::integration::link_git::protocol::empty_fetch - should panic ... ok
test test::integration::link_git::protocol::smoke ... FAILED
test test::integration::librad::smoke::clone::when_connected .

I am saying this in case these are legit and can be fixed, but I wouldn't know where to start. Here is the actual error of 2/3 of them:

thread 'test::integration::link_git::protocol::clone_gitoxide' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: Io(Custom { kind: InvalidData, error: "`fetch` is empty" }) }', test/src/test/integration/link_git/protocol.rs:268:6

The smoke test apparently doesn't receive data either, but fails due to missing branches instead.

---- test::integration::link_git::protocol::smoke stdout ----
thread 'test::integration::link_git::protocol::smoke' panicked at 'assertion failed: `(left == right)`
  left: `{}`,
 right: `{"refs/heads/main", "refs/heads/next", "refs/pulls/1/head"}`', test/src/test/integration/link_git/protocol.rs:166:5

Maybe it's all related to the git executable not working as expected.

kim commented 2 years ago

git-ref's code to detect on-disk modification of packed-refs is racy, so I would prefer to not upgrade at this point. Feel free to send the other patch to the ml.

Byron commented 2 years ago

git-ref's code to detect on-disk modification of packed-refs is racy, so I would prefer to not upgrade at this point. Feel free to send the other patch to the ml.

In case it matters, the racy code isn't used at all as the internal cache isn't used in the called methods. These are the originals, but renamed.

Unfortunately I don't understand in which way it's racy, so would be glad if you had a hint for me so it can be fixed. From a locking perspective I don't see why it's racy.

Thanks a lot.

kim commented 2 years ago

You need to take a file lock on packed-refs to guarantee atomicity.

Byron commented 2 years ago

You need to take a file lock on packed-refs to guarantee atomicity.

To my knowledge, new packed-refs files are only ever moved into place (from the .lock file, which is written, over the original file). Moves are considered atomic, which is why the file is never half-written when reading it. Is this the raciness you refer to?

If not, I can imagine that the file could indeed change between the stat() call and the actual read/mmap, but git doesn't seem to care either, because no file lock is acquired for reading (even though they do save the stat() call if they hold the lock right now).

What do you think?

kim commented 2 years ago

the file could indeed change between the stat() call and the actual read/mmap

Exactly. We don't want to track the timestamp of the old file, or do we?

Either way, I don't think this is a good patch because we're already tracking packed-refs on a different layer. So what we get here is a no-op containing a lot of middle-digit version bumps, for which we would now need to review the changelogs. If there's no actual reason to upgrade, I'd prefer to keep things stable for a while.

Byron commented 2 years ago

Either way, I don't think this is a good patch because we're already tracking packed-refs on a different layer. So what we get here is a no-op containing a lot of middle-digit version bumps, for which we would now need to review the changelogs. If there's no actual reason to upgrade, I'd prefer to keep things stable for a while.

Thanks for the clarification, I will close this PR and try the mailing list - it's my first time 😅. Note that there should be no updates in the future unless they add something substantial, like ref-table maybe, so changelog reviews can be done more rarely and in bigger chunks.

Exactly. We don't want to track the timestamp of the old file, or do we?

That's interesting. To my mind, taking a lock for reading in this situation reduces concurrency, as it blocks (or cancels) writers while the reader would still have made the correct decision: the file changed since it was last loaded, and even if it changes again after the decision to reload (or is deleted entirely), the result will be 'more recent' without blocking writers, which seems desirable. This is also why git doesn't take a lock before reading a loose ref but relies entirely on atomic file moves. The only way to get consistency and atomicity across entire transactions will be the ref-table implementation, and I am genuinely curious whether it will take a lock for reading or not - after all, it also has multiple files to handle, and even if it's append-only, I'd assume at some point it must clean up.

But I digress, see you on the ml.

kim commented 2 years ago

Say I could be convinced that the timestamp doesn't matter: I notice you're taking an upgradeable read lock to check the mtime.

The calling thread will be blocked until there are no more writers or other upgradable reads which hold the lock.

Why is this ok?

Byron commented 2 years ago

are no more writers or other upgradable reads which hold the lock.

The way I understand this, an upgradable read that holds the lock is a writer only once it has actually been upgraded. This also means that the current implementation can most definitely load the packed buffer racily, which is more wasteful than it needs to be. Maybe after upgrading the reader to a writer, one should check again whether an update is still required.

Is this what you are implying, or is there something else I don't see?

kim commented 2 years ago

I don’t think your understanding is correct.

From https://docs.rs/lock_api/0.4.5/lock_api/trait.RawRwLockUpgrade.html:

There may only be one upgradable lock at any time, otherwise deadlocks could occur when upgrading.

Also: https://docs.rs/parking_lot/0.11.2/src/parking_lot/raw_rwlock.rs.html#568

So, your code behaves as if every access is going through a mutex.

Byron commented 2 years ago

I don’t think your understanding is correct.

I agree 😅. The 'fearless concurrency' part of Rust doesn't necessarily extend to concurrency primitives, which still require a good amount of testing just to be sure the docs aren't being read with Rust goggles on.

With that in mind I pulled out the docs for the methods that are actually called, read_upgradable()…

Locks this RwLock with upgradable read access, blocking the current thread until it can be acquired. The calling thread will be blocked until there are no more writers or other upgradable reads which hold the lock. There may be other readers currently inside the lock when this method returns. Returns an RAII guard which will release this thread’s shared access once it is dropped.

…and upgrade()…

Atomically upgrades an upgradable read lock into an exclusive write lock, blocking the current thread until it can be acquired.

…and neither talks about deadlocks. Admittedly, I don't claim to understand the depth of the plumbing documentation and code provided, and have no doubt that deadlocks may happen, but they are no longer mentioned in the porcelain API that parking_lot provides.

Putting it to the test, I adjusted the object-access experiment to not only use the read() portion of an RwLock, but to also try UpgradableRwLock and UpgradableRwLock + Upgrade. The latter does not deadlock even under the heaviest contention. What's clearly visible, though, is that under contention the upgradable lock does the opposite of what I wanted: it's way slower and essentially forces everyone to wait even if no write lock is actually obtained. For completeness, there is now a 'read, then write' test (RwLock + Write) to see which fares better under worst-case contention.

gitoxide: 1510378 objects (collected in 23.059833ms)
parallel gitoxide : confirmed 1510378 objects exists in 15.333041ms (98504792 objects/s)
parallel gitoxide (uncached, warmup): confirmed 18701367848 bytes in 8.356435291s (180744 objects/s)
parallel gitoxide (uncached): confirmed 18701367848 bytes in 8.815946791s (171323 objects/s)
parallel gitoxide (uncached, Arc, RwLock + Write): confirmed 18701367848 bytes in 84.934256s (17783 objects/s)
parallel gitoxide (Arc, RwLock + Write): confirmed 1510378 objects exists in 879.401625ms (1717506 objects/s)
parallel gitoxide (uncached, Arc, UpgradableRwLock): confirmed 18701367848 bytes in 72.129044375s (20940 objects/s)
parallel gitoxide (Arc, UpgradableRwLock): confirmed 1510378 objects exists in 237.18825ms (6367845 objects/s)
parallel gitoxide (uncached, Arc, UpgradableRwLock + Upgrade): confirmed 18701367848 bytes in 72.00787225s (20975 objects/s)
parallel gitoxide (Arc, UpgradableRwLock + Upgrade): confirmed 1510378 objects exists in 225.537166ms (6696803 objects/s)
parallel gitoxide (uncached, Arc): confirmed 18701367848 bytes in 8.336313083s (181181 objects/s)
parallel gitoxide (Arc): confirmed 1510378 objects exists in 18.553833ms (81405176 objects/s)
parallel gitoxide (uncached, Arc, RwLock): confirmed 18701367848 bytes in 8.3058375s (181845 objects/s)
parallel gitoxide (Arc, RwLock): confirmed 1510378 objects exists in 144.762791ms (10433469 objects/s)
parallel libgit2:  confirmed 18701367848 bytes in 64.153872166s (23543 objects/s)
libgit2:  confirmed 18701367848 bytes in 75.599038208s (19979 objects/s)

Takeaways

Actions

Thank you for bearing with me on this one, thanks to you this sync-related performance issue won't get far.

kim commented 2 years ago

Ya, "concurrency without fear" is a "pick two" statement when it comes to shared data :)

I have punted on this myself, but I think you could do even better:

The assumption is that the majority of reads will find packed-refs unmodified. When it is modified, however, we want to reload them asap in order to avoid an inconsistent view of the refdb. The only way to detect when packed-refs have changed is to stat them, which is a relatively expensive syscall.

So, the way to punt is to leave it to the caller to decide for how long they would be able to tolerate inconsistent reads. That's what git does afaik, where you need to take a snapshot. link-git does the same, where the snapshot does not require any synchronisation at all.

libgit2 checks freshness on every single-ref lookup, and when initialising an iterator. That seems to be the behaviour you want to mimic. For this, however, it is not technically necessary to acquire a lock unless and until the memory buffer has been determined stale.

Iow, performing stat does not require synchronisation, and comparing it to a previously recorded mtime only requires atomics (in lieu of volatile). Otoh, reloading requires synchronisation, because every reader will race to reload, which is probably undesirable. So I think wrapping the buffer in an ArcSwap or crossbeam Atomic, while sequentialising writers on a plain old Mutex should be enough (after winning the lock, the mtime should be read again to see if the reload is still necessary).

Byron commented 2 years ago

So, the way to punt is to leave it to the caller to decide for how long they would be able to tolerate inconsistent reads. That's what git does afaik, where you need to take a snapshot. link-git does the same, where the snapshot does not require any synchronisation at all.

It's a lovely abstraction, and it's something people can emulate by obtaining a cached packed-refs buffer (or opening a new one) to hand in as a parameter to these *_packed(…, buffer) methods. For a plumbing crate I hope that's sufficient, especially now that the default path handles the buffer automatically to assure it is fresh, similar to what libgit2 appears to be doing. During iteration, gitoxide obtains the buffer only once though, effectively using it as a snapshot.

This is a bit of a new direction for gitoxide's plumbing crates, as they will now try to provide a more comfortable API on top of a more bare one for power users, which ultimately helps me to keep code where it belongs instead of having to do a lot of work in git-repository.

[..] For this, however, it is not technically necessary to acquire a lock unless and until the memory buffer has been determined stale. Iow, performing stat does not require synchronisation, and comparing it to a previously recorded mtime only requires atomics (in lieu of volatile). Otoh, reloading requires synchronisation, because every reader will race to reload, which is probably undesirable. So I think wrapping the buffer in an ArcSwap or crossbeam Atomic, while sequentialising writers on a plain old Mutex should be enough [..]

I agree, and believe there is no way around ArcSwap/Atomic (or basically the techniques seen in link-git's ODB implementation) for the best possible performance. Having tested read() lock performance under high load with the object-access experiment, I believe it's fast enough for packed-refs. It's apparently not much slower than a plain Arc, which is good enough for the kind of traffic I expect git-ref to see, and keeps the code simpler.

[..] (after winning the lock, the mtime should be read again to see if the reload is still necessary).

Yes, agreed. I will put the additional stat() check in, as the current implementation would do extra work that would definitely cost more in highly concurrent scenarios.