madelson / DistributedLock

A .NET library for distributed synchronization
MIT License
1.86k stars 193 forks source link

Consider adding acquiring/releasing multiple SQL Server locks in batch #50

Open madelson opened 4 years ago

madelson commented 4 years ago

Requested by @jsbattig (see https://github.com/madelson/DistributedLock/pull/49).

Desired functionality:

Design option 1: static utility method:

public class SqlDistributedLock
{
    public static SqlDistributedLockHandle AcquireMultiple(IEnumerable<SqlDistributedLock>, TimeSpan, CancellationToken);
}

Note that this would only be able to batch the acquisition of locks that had the same approach for connecting. That should be fine, though.

Design option 2: new lock class

public class SqlBatchDistributedLock : IDistributedLock
{
     public SqlBatchDistributedLock(IEnumerable<string> names, ...);
}

Other questions:

jsbattig commented 4 years ago

I think it may be nice to add support for a feature when using Try semantics to acquire all locks possible and rather than returning back a bool return back the locks that could not be acquired in the shape of a collection that could be used to re attempt to acquire the locks later on a retry cycle (kind of how we are doing now on a 1:1 basis).

On the other hand, I'm a bit thorn how to handle timeouts. The issue with even having a timeout when using Try semantics is that if we go with the approach to attempting to acquire all we can, the time it would take in some cases would invalidate the idea of acquiring the locks in batch as quick as possible.

I think in the context of batch acquiring of locks we don't allow to pass a timeout parameter to the TryAcquire locks. Either we get each individual lock immediately or we we simply keep going with out list. At the end we return the list of exceptions. In this scenario we could also offer Try semantics with total failure at first failure (with release of acquired locks).

For Regular semantics (not Try) I like your idea of cumulative timeout. The challenge with that is what timeout we use for each attempt? What do we do if an attempt fails? Should we retry on the spot failed attempts until the cumulative timeout time has elapsed or we succeed, in which case we release all locks and return failed? If we go with this progressive approach with attempts in a loop we could set the individual timeouts to a "reasonable" number (if there's such a thing in computing) such as 1/5 of the total timeout. -OR- we make the caller pass a parameter with the individual timeout to use on each attempt with a validation that such number has to be < than total timeout.

And yes, it will fun to contribute with this enhancement but it will take me some time to get into a "clean" state given time constraints. :)

madelson commented 4 years ago

Appreciate your thoughts @jsbattig! You're right about timeouts been a very tricky issue here. On one hand, it makes sense to say "spend up to N ms trying to acquire this list of locks". On the other hand, that statement doesn't dictate how we parcel out our time waiting on each different lock (important because we are not waiting for them all in parallel).

For the non-try case, I think we can resolve this by saying that we have to acquire the locks in the order provided (a necessary requirement to avoid deadlocks if multiple such calls are made), and therefore if the caller says "spend 30s total" we have no choice but to wait up to 30s for the first lock, then look at how much time we have left when that returns and pass that to the next wait.

For the try acquire all case (where it is still all or nothing), I don't see a way to clearly articulate the behavior to the callout without making the API unnecessarily complex (multiple timeouts). Therefore, I think no timeout (0) is the clearest in that case.

The same goes for the operation where we are acquiring all available locks in a list.

Therefore, I here would be my API proposal:

public sealed class SqlDistributedLock
{
     // spends up to timeout (default inf) acquiring all locks in locks in the order provided
    public static SqlDistributedMultipleLockHandle AcquireAll(IEnumerable<SqlDistributedLock> locks, TimeSpan? timeout = null, CancellationToken cancellationToken = default);

    public static ValueTask<SqlDistributedMultipleLockHandle> AcquireAllAsync(IEnumerable<SqlDistributedLock> locks, TimeSpan? timeout = null, CancellationToken cancellationToken = default);

    // tries to acquire all locks in locks in locks in the order provided, timing out immediately if any is unavailable
    public static SqlDistributedMultipleLockHandle? TryAcquireAll(IEnumerable<SqlDistributedLock> locks, CancellationToken cancellationToken = default);

    public static ValueTask<SqlDistributedMultipleLockHandle?> TryAcquireAllAsync(IEnumerable<SqlDistributedLock> locks, CancellationToken cancellationToken = default);

    // tries to acquire each lock in locks in the order provided individually with a timeout of 0. Returns a handle indicating
    // which locks were acquired (could be zero)
    public static SqlDistributedMultipleLockHandle AcquireAllAvailable(IEnumerable<SqlDistributedLock> locks, CancellationToken cancellationToken = default);

    public static ValueTask<SqlDistributedMultipleLockHandle> AcquireAllAvailableAsync(IEnumerable<SqlDistributedLock> locks, CancellationToken cancellationToken = default); 
}

public sealed class SqlDistributedMultipleLockHandle : IDistributedLockHandle
{
    // for AcquireAllAvailable, allows for inspecting which locks were acquired. For any of the above methods,
    // allows for releasing individual locks one by one. Disposing the overall handle will release any remaining
    // locks (see IDistributedLockHandle interface)
    public IReadOnlyDictionary<SqlDistributedLock, SqlDistributedLockHandle> AcquiredLocks { get; }
}

Implementation notes:

Any thoughts/concerns? I'm not sold on the "AcquireAllAvailable" name. I wish it had a "Try" in it but I feel like "TryAcquireAllAvailable" maybe suggests a different behavior? "TryAcquireAny" sounds nice but suggests that we'd short-circuit after finding one that we can acquire.

Since you are interested in contributing this, what I would suggest as a next step is to take a look at the release-2.0 branch and pull together a rough implementation proposal (e. g. which files will change, any open questions about the codebase). We can discuss that here. No need to rush! At some point if I don't hear from you I might take this on myself, but I'm currently working on the 2.0 release so that will likely be a while.