Closed geigerzaehler closed 2 years ago
This is a known issue, but is unlikely to be fixed anytime soon. We did initially assume that libgit would just do the right thing when accessing a repository concurrently, but that is not true. We would need to employ ref transactions, which turn out to not actually be very transactional, and possibly a custom per-namespace lock file which is also obeyed by CLI pushes.
I don't think this affects operations, though, because things will converge eventually. Feel free to correct me if I'm wrong.
We’re seeing this in CI tests that fail to replicate a project. I’ll take a closer look to see if this is actually the cause.
Isn’t this kind of race condition prevented by only having one fetcher per URN at any given time, or am I misunderstanding librad::git::storage::fetcher?
That was the intention, yes, but it doesn’t seem to be sufficient. I suppose it could happen that a test is updating the repo while it is being fetched into, which is not covered by the fetcher’s mutex.
In theory, ref transactions would prevent this, but unfortunately not in the way they are currently implemented in gitoxide (or git.git): what we would need is to acquire a lock (-file) before computing parent. libgit2 allows doing that, but gitoxide employs a CAS strategy which will require (potentially expensive) retries. /cc @Byron
I’m not sure yet, maybe the best approach is to just lock the entire namespace for writes via the filesystem.
not in the way they are currently implemented in gitoxide
Maybe I’m wrong about this actually. We can prepare a transaction, knowing the previous tip, which will acquire the lock file until either dropped or committed.
Thanks for looping me in, it's amazing that races around reference updates are manifesting in tests - it's a helpful problem to have here (rather than in production).
We can prepare a transaction, knowing the previous tip, which will acquire the lock file until either dropped or committed.
The current implementation of Transaction::prepare(…) needs the reference edits to be spelled out already, while time passes between reading the parent and producing the new commit which is then the new target for the reference in the desired update. This is the time during which the tip of the branch can be moved.
One could indeed use a 'bogus' transaction to hold a lock while one is reading the tip, and when the new commit is ready, that transaction could 'quickly' be dropped to make space for a new one. That way one would probably make it much less likely for the race condition to show, even though it's present between dropping the bogus transaction and preparing the real one.
Currently in gitoxide
…
let mut transaction = repo.refs.transaction().prepare(RefEdit { name: branch, new_target: None, … })?;
let parent = branch.peel_to_commit()?;
let commit = Commit::new(parent, …)?;
drop(transaction);
// here is where there is still a race
repo.refs.transaction().prepare(RefEdit { name: branch, new_target: Some(commit), … })?.commit()?;
But what if one could 'update' the existing 'bogus' transaction with new RefEdits without dropping the locks? Here is an example.
let mut transaction = repo.refs.transaction().prepare(RefEdit { name: branch, new_target: None, … })?;
let parent = branch.peel_to_commit()?;
let commit = Commit::new(parent, …)?;
// This keeps the existing locks of the union of the existing ref-edits and the new ones, while using the new ref-edits
transaction.replace(RefEdit { name: branch, new_target: Some(commit), … })?;
transaction.commit()?;
I could imagine a callback-based API for the same thing, but would prefer the replace() API for its ease of use on the caller's side.
What do you think?
Thanks @Byron
The ability to replace an edit would indeed be helpful. Potentially also to remove it from the tx, so as to be able to reject non-fast-forwards without failing the entire tx, but that's another scenario.
The thing is that the update as linked to above is indeed atomic: because the parent is no longer the one we've seen, the implicit ref update fails. Now the question is what to do: the caller could just retry. But that requires reading the parent repeatedly. It could also retry just acquiring the lock (as git-ref resp. git-lock offers), which is potentially cheaper. Obviously we don't want to retry indefinitely -- but that's okay, because in this case we would converge to the desired state by just calling update repeatedly. We would just need to make sure we don't drop the last update, ie. as long as there is one waiter, new waiters could just give up immediately.
I guess... concurrency is hard?
The ability to replace an edit would indeed be helpful. Potentially also to remove it from the tx, so as to be able to reject non-fast-forwards without failing the entire tx, but that's another scenario.
Right now I thought it could be a union of ref-edits, so leaving some paths out would remove them, while also allowing additions and replacements. But it's a matter of finding an API, so I will keep removals in mind no matter how it eventually looks.
The thing is that the update as linked to above is indeed atomic: because the parent is no longer the one we've seen, the implicit ref update fails. Now the question is what to do: the caller could just retry. But that requires reading the parent repeatedly. It could also retry just acquiring the lock (as git-ref resp. git-lock offers), which is potentially cheaper. Obviously we don't want to retry indefinitely -- but that's okay, because in this case we would converge to the desired state by just calling update repeatedly. We would just need to make sure we don't drop the last update, ie. as long as there is one waiter, new waiters could just give up immediately.
Speaking only for what would be happening with git-ref: with the API illustrated above, prepare() actually takes a parameter to control what to do if the lock can't be acquired. Either it fails immediately or waits with a timeout (implemented with exponential backoff and sleep). In 100ms it gets to retry about 10 times with that.
Once the lock is present, the transaction will even have read the current target of the branch to change, as that's needed to support hooks one day. This could be made accessible between prepare() and commit():
let mut transaction = repo.refs.transaction().prepare(RefEdit { name: branch, new_target: None, … })?;
// no need to read the branch again…
// let parent = branch.peel_to_commit()?
// the transaction did that for us.
let parent = transaction.iter_edits()?.find(|e| e.name == "branch to update").and_then(|e| e.edit.previous_value.target().as_id()).unwrap();
let commit = Commit::new(parent, …)?;
// This keeps the existing locks of the union of the existing ref-edits and the new ones, while using the new ref-edits
transaction.replace(RefEdit { name: branch, new_target: Some(commit), … })?;
transaction.commit()?;
I guess... concurrency is hard?
Just because it's not all Rust yet ;). 🎯
let parent = transaction.iter_edits()?.find(|e| e.name == "branch to update").and_then(|e| e.edit.previous_value.target().as_id()).unwrap();
Wait, that is surprising: are you saying that preparing the transaction would modify any previous_value the caller gave as Create::OrUpdate? Or would the iter_edits iterator yield some type != RefEdit?
Wait, that is surprising: are you saying that preparing the transaction would modify any previous_value the caller gave as Create::OrUpdate?
Great catch! If there was no previous value, the previous value will indeed be read and put in there. Otherwise the given previous value will be compared to the actual previous value. Storing the actual value in as previous value has the advantage that it can be inspected after the transaction was prepared (which would be a future API addition) and once the commit() was performed - it returns the possibly adjusted RefEdits. Note that it also splits ref-edits when symbolic refs are involved, so one might get more edits back than one put in - something the code above also blissfully ignores 😅.
If there was no previous value, the previous value will indeed be read and put in there. Otherwise the given previous value will be compared to the actual previous value.
That seems quite dubious. If I specify Create::OrUpdate { previous = Some(oid) }, then I expect that the transaction fails if previous is not what I said. That's what this API suggests at least. Now if you just update this value, I would need to inspect the transaction state before committing in order to see if something is different from what I saw (and I might need to run a reachability check again).

Perhaps prepare needs to return an error already if certain known invariants are not met? Or some kind of Transaction::push API would work better, like in libgit2?
That seems quite dubious. If I specify Create::OrUpdate { previous = Some(oid) }, then I expect that the transaction fails if previous is not what I said.
That's exactly what happens - that's exactly what prepare()? does.
Now if you just update this value, I would need to inspect the transaction state before committing in order to see if something is different from what I saw (and I might need to run a reachability check again).
Probably a misunderstanding - after prepare(), locks are in place and no well-behaved process would be changing these branches.
Perhaps prepare needs to return an error already if certain known invariants are not met? Or some kind of Transaction::push API would work better, like in libgit2?
Yes, it happens exactly like that.
Maybe more helpful than pseudo code is to write it out:

1. prepare(branch)? a transaction, but don't care about the previous value, which means prepare() allows any value there. As new value I would probably set a null sha1 as I don't know it yet.
2. Read the ref-edit for branch, which now holds its previous value if it existed. This one serves as parent. It's an initial commit if there is no parent and the previous value is still unset.
3. Replace the ref-edit for branch with a new one, while keeping the lock, with that new API called replace(…) in the example above.
4. On commit() everything should go through and locks are released.

I will try to dog-food that API when implementing something like fetch myself soon to work out the kinks.
Yeah, so Create::OrUpdate { previous = Some(oid) } can never be updated with a different oid, because that would violate the constraint the caller imposes. This would also be the case if the Create spec does not violate any constraint: if I edit the RefEdit after the fact to now contain Create::OrUpdate { previous = Some(made_up_oid) }, the transaction must fail because clearly made_up_oid is not the previous value.
That's why I said perhaps the iterator should yield items of a different type, which simply makes these invalid states unrepresentable. The only things which are actually editable are reflog entries and the target.
Now I understand what you mean. Depending on how it's implemented, it might even bark at you if the previous value is changed to something that it currently isn't. Doing so would be very strange, and my intuition is that it would be wasteful to even check for the previous value again, as a lock was held since prepare so it won't have changed; a changed previous value would thus have no effect. Introducing a new type to make such a thing impossible seems like a lot for handling such a no-op case. Maybe there are others that I am not seeing though. More important to me is to handle the non-trivial requirement that the implementation must do the same for any set of ref-edits, no matter whether they came from an initial prepare() or any amount of replace(…) calls. Let's see, maybe my hesitancy to open this can of worms will lead to an interface which limits the actions to a smaller subset that is typically used. In any case, it's a feature worth having and it must be done properly. I think we want the same :).
When thinking about this issue here in particular, and assuming the commit(…) part is done as proposed, the other part which tries to edit refs as part of a fetch may read a 'previous' value that isn't present anymore when prepare() is called. Then it will have to implement retry there, and I see no way around that. On the bright side, the commit() side of the affair would always work, as it only reads the commit for parent after a lock was acquired.
One more thing came to mind: above I was exploring how fixing this would force both participants of the race to be aware of and use future capabilities in gitoxide - all of which seems expensive and unfeasible in practice just yet. Even though this fine-grained locking seems to have advantages, in practice there might be no benefit, depending on the branch access patterns in radicle-link.
The proposition I have heard multiple times now from @kim was to introduce a namespace-wide lock to serialize operations. git-lock (stability tier 2) should be very usable for this, especially with the option to wait for a duration when trying to acquire the lock. This comes with only minimal changes to both participants, which would now try to acquire a lock, along with some thought around application signal handling. This should be a cheap solution for this class of issues without having to implement retries, thanks to the configurable waiting time.
Isn’t this kind of race condition prevented by only having one fetcher per URN at any given time, or am I misunderstanding librad::git::storage::fetcher?

That was the intention, yes, but it doesn’t seem to be sufficient. I suppose it could happen that a test is updating the repo while it is being fetched into, which is not covered by the fetcher’s mutex.
I added some log statements and it looks to me like multiple fetchers for the same namespace are created before any of them are dropped. Could this line be the culprit? https://github.com/radicle-dev/radicle-link/blob/e6f73616c2e1f6fdb49cd5ff329909615d878739/librad/src/git/storage/pool.rs#L167
We clone the fetchers which means we have a different value per storage in the storage pool. Consequently the different storage instances can do concurrent fetches.
Yeah, good catch: DashMap is Clone, but it should really be Fetchers(Arc<DashMap>) which gets cloned
[ x-post to ~radicle-link/dev@lists.sr.ht for information retrieval reasons. Please continue the discussion there if desired (no subscription required), as this item will be closed at GH's discretion^H automation. ]
Okay, I think I figured out the missing piece for guaranteeing multi-process transaction safety without having to resort to global locks. That is, pre-reftable [0]. You can skip to the tl;dr section if you're not interested in the hows and whys.
== What's ref transactions, anyway
Git fundamentally assumes multi-process access to a repository: it is not required to run a coordinator process to modify a repository. In order to guarantee atomicity of ref updates, there is thus no way around the 3-step dance:

1. acquire a lock by creating the ref's .lock file exclusively
2. write the new value into the .lock file
3. commit the update by renaming the .lock file over the ref
What the .lock file is depends on the backend. Pre-reftable, it is always the loose ref (eg. refs/heads/master.lock), even when packed refs are in effect. A "transaction" is simply acquiring multiple of those .lock files, and performing step 3 as a batch operation.
This means that:
a. ref transactions are not at all atomic
b. updating N refs sequentially has the same cost as in a transaction
c. the cost of file locks is due for each and every ref update
It is commonly assumed that the commit phase (rename .lock files) is in practice atomic, because the ways it can fail are in the realm of hardware failure or power loss. Which obviously do not happen in real life, and if they do, you got bigger problems anyway (tm).
== Implementation differences
libgit2, git.git, and git-ref differ slightly in their implementation strategies:
libgit2
Each to-be-updated ref must be locked explicitly before setting its target and reflog. That is, the lock is held until the transaction either commits or is aborted. The target and reflog can be changed while holding the lock, or the ref can be removed from the transaction (ie. its lock can be released).
git.git

Each to-be-updated ref is added to a list, along with its expected previous value. No locks are acquired until the transaction is prepared. After preparing, the transaction can no longer be modified. The transaction aborts if the value of a ref on disk differs from the expected value. Committing a transaction is essentially the rename step. The UI for this is git update-ref.
git-ref

Follows git.git's semantics: updates are described with a datatype, an iterator of which is then prepared in batch (ie. locks are acquired etc). An explicit commit step is required, even though it does not serve any apparent purpose (git.git executes a hook after preparing, which can abort the entire transaction. This is obviously not an option for a library).
It is easy to see that libgit2's API is more flexible, and very easy to understand because it surfaces exactly what is going on. On the downside, it bakes in row-level locking semantics, which reftables do not support [0]. So if reftable ever becomes a thing for libgit2, the API either needs to change, or will lie about its semantics.
It is somewhat less easy to see that git.git's semantics can be rather wasteful: if a policy is in place which prevents non-fast-forwards, a reachability check must be performed for all refs in question which are part of a transaction. This is a relatively expensive operation. Now, because this check must be performed on unlocked refs, the resources are wasted if a competing transaction commits first (and thus the "CAS" fails).
== So why does git.git get away with this?
The motivating use case for ref transactions is git push --atomic: instead of rejecting only the non-ff updates, the push shall fail as a whole. Now, when a non-ff is encountered in a push (atomic or not), it means that it is on the client to retrieve the remote updates, resolve any conflicts, and try again. The new head is most likely a fresh SHA, so we need to perform all computations again either way. Therefore, it is pointless to block the client until a competing transaction commits, as the waiting transaction will anyway fail whp.
== Okay, so what does this mean for radicle-link?
Instead of contending writes on a central server, link fetches from other clients, where the sets of ref updates may or may not be overlapping. In theory, it would be fine to fetch into the same namespace (URN) from multiple peers concurrently, which could be beneficial especially for seed nodes, because the probability of overlapping ref sets is lower there.
In order to authenticate a set of repository mutations, we fetch the refs/rad/* hierarchy first, and must prevent non-fast-forwards to the extent possible. A non-ff under this hierarchy is basically fatal: someone (either the peer or the origin) rewrote history, which renders this identity tainted (again, either the peer's or the origin's).
So, we can perform the ref update for this hierarchy in a transaction, and fail the entire fetch if it yields a non-ff. Otherwise, if the transaction fails due to a competing tx, we can retry a number of times, considering that there are no guarantees about the uptime of the remote end.
Note that this step could benefit from libgit2-style transactions: since there is no interactive conflict resolution (it's an error), it would be more efficient to retry only the lock acquisition, and compute reachability while holding it. In practice, it may also be a negligible optimisation, because we expect updates in this hierarchy to be near the tip (ie. reachability is not that expensive to compute).
So anyways, here's the critical part:
Say thread A updated refs/rad/signed_refs at commit a, and went on to fetch the corresponding packfile. Meanwhile, thread B updated the signed refs to commit b. If B commits the updates to refs/heads et al before A, A will revert the state to that reflected by a.
To solve this, either:

- A and B need to be strictly sequential (which is what we do now)
- the ref updates per remote-tracking branch must be strictly sequential (which would increase memory pressure, or require some kind of directory-level lock)
- or, the signed_refs corresponding to the remote-tracking branch under mutation must be included in the transaction as a no-op, causing the transaction to fail if signed_refs is no longer at the commit seen previously
== Mkay tl;dr?
We can increase concurrency of fetches, while also minimising lock contention, by

a. scoping ref transactions by remote (where no remote counts as a remote), and
b. including a no-op signed_refs update in the transaction which updates the regular branch heads
This does increase implementation complexity, though, which is a pity, because with reftable we will be back to a global lock (even worse, the lock will be repository-wide).
Note that this does not change the UX of the status-quo.
Thanks for the writeup, I found the comparison between the three implementations very interesting, and for the first time see how git2 provides its transaction API.
An explicit commit step is required, even though it does not serve any apparent purpose
Indeed, it's modelled after git.git, knowing that one day hooks will have to be implemented on another layer. With the planned changes, one will be able to perform calculations using the locked targets of the refs involved in the transaction and change them after locking. This makes it like a blend between git2 and git.git, but won't expose locking, which should make it seem more natural in conjunction with reftable.
The latter will be at a disadvantage as computations-without-retries have to be made while the single reftable lock is held, blocking any other writers.
This makes me think that it might be preferable to perform computations on ref targets without holding the lock and retry if the transaction cannot be prepared due to contention, rather than locking first to make computations with guaranteed-to-be-stable refs. Such code would perform similarly in pre-reftable and ref-table worlds.
This does increase implementation complexity, though, which is a pity, because with reftable we will be back to a global lock (even worse, the lock will be repository-wide).
I don't know how locks are currently handled, but if there was a section c) in the list of suggestions related to locks, would these be acquired before making computations, or after making computations? Pre-reftable it seems most efficient to do them after acquiring locks, while with reftable these should probably be done beforehand to hold the global lock for a shorter time period. Both implementations have to retry if the transaction can't prepare(), but do so at very different cost.
When imagining code that has to transition between such implementations from pre-reftable to reftable, I'd think using the libgit2 API would be most convenient.
Sebastian Thiel writes:
An explicit commit step is required, even though it does not serve any apparent purpose
Indeed, it's modelled after the git.git knowing that one day hooks have to be implemented on another layer.
Okay. I find it a somewhat odd choice regardless. If you look at the --stdin mode of git-update-ref, it's really not clear why there is a prepare command -- you can change your mind and abort instead of commit, but I'd be curious what use cases motivated that.
With the [1]planned changes one will be able to perform calculations using the locked targets of the refs involved in the transactions and change them after locking. This makes it like a blend between git2 and git.git, but won't expose locking which should make it seem more natural in conjunction with reftable.
In that case it would make sense to have the prepare step, indeed. I don't really think (anymore) we need post-hoc mutability, though, it seems to be useful only for corner-case optimisations, and possibly even harmful if used naively.
The latter will be at a disadvantage as computations-without-retries have to be made while the single reftable lock is held, blocking any other writers.
I don't think it makes much difference for "normal" git repositories: even typical monorepos have only one mainline branch, for which all writers compete. It is problematic for namespaced repos, because transactions in different namespaces are guaranteed to not conflict in practice.
I haven't yet looked at the details, but I think what will eventually land in git.git will amount to per-namespace locking, where the whole-repo reftable stack will be composed of smaller stacks [0].
I don't know how locks are currently handled, but if there was a section c) in the list of suggestion related to locks, would these be acquired before making computations, or after making computations? Pre-reftable it seems most efficient to do them after acquiring locks, while with reftable these should probably be done beforehand to hold the global lock for a shorter time period. Both implementations have to retry if the transaction can't prepare() but do so at very different cost.
I do not think it would make much of a difference in practice. While increasing fetch concurrency is great, we would still want to limit the total number of concurrent fetches. My proposal amounts to splitting the update-tips phase of a fetch into several transactions, which might need some more elaboration:
Suppose we have project P, which we store locally at refs/namespaces/P. We know public keys A, B and C, from which we wish to receive updates. We store their ref trees at refs/namespaces/P/refs/remotes/{A,B,C} (nb. each of these remote tracking branches can logically only have a single writer). Now, we may not actually be connected to A, B, or C, but to D and E, each of which store the remote tracking branches we're interested in. D and E may or may not be consistent with each other, with A, B, C, or with what we have locally.
Say we impose a global concurrency limit of 2, so while we are currently connected to D and E, we may not actually fetch P from both of them, but Q from D and P from E. For the sake of the argument, say we do fetch P from both of them, negotiated and received packfiles, and now want to start updating the tips. If we are splitting into transactions A, B, and C, we have already reduced the chances for contention. We could further reduce it by randomising the order (should be easy enough if we're using a HashMap somewhere :)).
Of course, we could reduce it even further by locking only the ref with the longest worst case ancestry walk (refs/namespaces/P/refs/remotes/{..}/rad/signed_refs), but that would not translate to reftable.
(Also note that no ancestry walks are necessary for refs outside of the rad/ hierarchy, in case that was unclear.)
In that case it would make sense to have the prepare step, indeed. I don't really think (anymore) we need post-hoc mutability, though, it seems to be useful only for corner-case optimisations, and possibly even harmful if used naively.
I tend to agree, especially if I imagine how this would drive up the complexity of the API surface and the implementation, unless it's completely changed into something more akin to git2. This means that implementations that don't want to fail have to retry and redo computations in that case.
I haven't yet looked at the details, but I think what will eventually land in git.git will amount to per-namespace locking, where the whole-repo reftable stack will be composed of smaller stacks [0].
reftable keeps giving! It really seems to be made for the server.
And even though the beauty above was created from the example, I am afraid I didn't follow how exactly all this works 😅. But what I do get is that there is a bunch of operations that try to update tips of remotes via various paths in the remote graph, and the trick is to globally queue it and schedule it in a way that reduces the risk of conflicts.
There I wonder if this is already the more elaborate solution which would still need to be based on a retry mechanism to avoid permanent failures for the client because of server concurrency. Right now it sounds that a scheduling scheme would only reduce the risk of contention, but not make it impossible. Lastly, if there would be such a global scheduler, wouldn't it know enough to schedule fetches in a contention-free manner?
reftable would only be a truly adequate replacement if it supports more fine-grained locking via stacks. The upload-pack task will consume memory, quite a lot even if big repositories are cloned.
We sometimes see the following error
In the 25ms before this error we are seeing the following log statement three times
This seems to indicate that there is a race when updating the signed refs concurrently. The error seems to originate from this call: https://github.com/radicle-dev/radicle-link/blob/fcc3a7c11934fc49a433390a9b52e02fa59c4307/librad/src/git/refs.rs#L366-L373
After doing some investigation I found that the error occurs if parent is not the tip of branch (../rad/signed_refs in this case). parent is obtained a couple of lines before the call above, so there is time for branch to get updated in the meantime.