umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.4k stars 2.66k forks source link

Slow performance and deadlocks in SignInAsync during high load #13768

Open Abrissirba opened 1 year ago

Abrissirba commented 1 year ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

10.1

Bug summary

We experiencing deadlocks in the database when lots of members sign in at the same time. During peaks we have around 100 sign ins a minute. We use umbracos default membership provider. The sign ins get very slow and we also see deadlocks in the logs.

Specifics

Setting the logger in debug mode locally I got the following two stacktraces.

Microsoft.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 81) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String methodName)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)
   at NPoco.Database.ExecuteNonQueryHelper(DbCommand cmd)
   at NPoco.Database.Execute(String sql, CommandType commandType, Object[] args)
   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock.ObtainWriteLock()
   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock..ctor(SqlServerDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout)
   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.WriteLock(Int32 lockId, Nullable`1 obtainLockTimeout)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.ObtainWriteLock(Int32 lockId, Nullable`1 timeout)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.EagerWriteLockInner(Guid instanceId, Nullable`1 timeout, Int32[] lockIds)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.EnsureDbLocks()
   at Umbraco.Cms.Infrastructure.Scoping.Scope.get_Database()
   at Umbraco.Cms.Infrastructure.Scoping.Scope.get_Database()
   at Umbraco.Cms.Infrastructure.Persistence.Repositories.Implement.MemberRepository.PersistUpdatedItem(IMember entity)
   at Umbraco.Cms.Core.Cache.DefaultRepositoryCachePolicy`2.Update(TEntity entity, Action`1 persistUpdated)
   at Umbraco.Cms.Core.Services.MemberService.Save(IMember member)
   at Umbraco.Cms.Core.Security.MemberUserStore.UpdateAsync(MemberIdentityUser user, CancellationToken cancellationToken)
ClientConnectionId:27b072f9-dd7a-40fb-8c14-93950869d88c
Error Number:1205,State:51,Class:13
at System.Environment.get_StackTrace()at 
Umbraco.Cms.Infrastructure.Persistence.UmbracoDatabase.OnException(Exception ex)at 
NPoco.Database.OnExceptionInternal(Exception exception)at 
NPoco.Database.Execute(String sql, CommandType commandType, Object[] args)at 
Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock.ObtainWriteLock()at 
Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock..ctor(SqlServerDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout)at 
Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.WriteLock(Int32 lockId, Nullable`1 obtainLockTimeout)at 
Umbraco.Cms.Infrastructure.Scoping.Scope.ObtainWriteLock(Int32 lockId, Nullable`1 timeout)at 
Umbraco.Cms.Infrastructure.Scoping.Scope.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId)at 
Umbraco.Cms.Infrastructure.Scoping.Scope.EagerWriteLockInner(Guid instanceId, Nullable`1 timeout, Int32[] lockIds)at 
Umbraco.Cms.Infrastructure.Scoping.Scope.EnsureDbLocks()at 
Umbraco.Cms.Infrastructure.Scoping.Scope.get_Database()at 
Umbraco.Cms.Infrastructure.Scoping.Scope.get_Database()at 
Umbraco.Cms.Infrastructure.Persistence.Repositories.Implement.MemberRepository.PersistUpdatedItem(IMember entity)at 
Umbraco.Cms.Core.Cache.DefaultRepositoryCachePolicy`2.Update(TEntity entity, Action`1 persistUpdated)at 
Umbraco.Cms.Core.Services.MemberService.Save(IMember member)at 
Umbraco.Cms.Core.Security.MemberUserStore.UpdateAsync(MemberIdentityUser user, CancellationToken cancellationToken)at 
Microsoft.AspNetCore.Identity.UserManager`1.UpdateUserAsync(TUser user)at 
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)at 
Microsoft.AspNetCore.Identity.UserManager`1.UpdateUserAsync(TUser user)at 
Microsoft.AspNetCore.Identity.UserManager`1.UpdateAsync(TUser user)at 
Umbraco.Cms.Web.Common.Security.UmbracoSignInManager`1.HandleSignIn(TUser user, String username, SignInResult result)at 
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)at 
Umbraco.Cms.Web.Common.Security.UmbracoSignInManager`1.HandleSignIn(TUser user, String username, SignInResult result)at 
Umbraco.Cms.Web.Common.Security.UmbracoSignInManager`1.SignInWithClaimsAsync(TUser user, AuthenticationProperties authenticationProperties, IEnumerable`1 additionalClaims)at 
System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)at 
Umbraco.Cms.Web.Common.Security.UmbracoSignInManager`1.SignInWithClaimsAsync(TUser user, AuthenticationProperties authenticationProperties, IEnumerable`1 additionalClaims)at 
Microsoft.AspNetCore.Identity.SignInManager`1.SignInAsync(TUser user, AuthenticationProperties authenticationProperties, String authenticationMethod)at 
Microsoft.AspNetCore.Identity.SignInManager`1.SignInAsync(TUser user, Boolean isPersistent, String authenticationMethod)at 
XXX.Web.Business.Helpers.AuthenticationHelper.SignInToUmbracoAsync(LoginResult clientLogin) in AuthenticationHelper.cs:line 232at 

Steps to reproduce

I can recreate it locally by using a load testing framework (we used https://locust.io/).

Expected result / actual result

No response

github-actions[bot] commented 1 year ago

Hi there @Abrissirba!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face:

strudland commented 1 year ago

I just wanted to affirm that we are also encountering the same challenges.

Migaroez commented 11 months ago

Hey @Abrissirba & @strudland, sorry for the late reply, this one has seemed to have slipped trough the cracks.

I will speak to the team on how we are going to approach this one and get back to you asap.

Migaroez commented 11 months ago

✅ Reproduced in v10 (10.1.0) ✅ Reproduced in v10 (10.7.0)

Reproduction did not result in deadlocks but the local solution complexly crashing, this might be because of sqlLite, will investigate further if needed No deadlock errors but the amount of total requests went from 9452 (for just requesting the login page) to 614 passed 126 failed for when logging in then login out members with the following k9 stage configuration. This results in 45 succeeded and 123 failed login attempts.

    stages: [
        { duration: '10s', target: 10 },
        { duration: '10s', target: 30 },
        { duration: '10s', target: 50 },
        { duration: '60s', target: 100 },
        { duration: '20s', target: 200 },
        { duration: '10s', target: 30 },
        { duration: '10s', target: 10 },
    ],

full test code and setup instructions included in 13768.zip

Migaroez commented 11 months ago

✅ Reproduced in v11 (11.6.0) v11results.zip (zip has the wrong name)

Migaroez commented 11 months ago

✅ Reproduced in v12 (12.2.0)

v12results.zip

Finally was able to capture an error on SqlLite, but it doesn't show in the logs as the app kills itself

Umbraco.Cms.Core.DistributedLocking.Exceptions.DistributedWriteLockTimeoutException: Failed to acquire write lock for id: -341.
   at Umbraco.Cms.Persistence.Sqlite.Services.SqliteDistributedLockingMechanism.SqliteDistributedLock.ObtainWriteLock() in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Cms.Persistence.Sqlite\Services\SqliteDistributedLockingMechanism.cs:line 175
   at Umbraco.Cms.Persistence.Sqlite.Services.SqliteDistributedLockingMechanism.SqliteDistributedLock..ctor(SqliteDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout) in C:\Repos\Umbraco-CMS-V12\src\Umbr
aco.Cms.Persistence.Sqlite\Services\SqliteDistributedLockingMechanism.cs:line 82
   at Umbraco.Cms.Persistence.Sqlite.Services.SqliteDistributedLockingMechanism.WriteLock(Int32 lockId, Nullable`1 obtainLockTimeout) in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Cms.Persistence.Sqlite\Services\SqliteDistributedLockingMechanism.cs:line
51
   at Umbraco.Cms.Core.Scoping.LockingMechanism.ObtainWriteLock(Int32 lockId, Nullable`1 timeout) in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Core\Scoping\LockingMechanism.cs:line 98
   at Umbraco.Cms.Core.Scoping.LockingMechanism.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId) in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Core\Scoping\LockingMechanism.cs:lin
e 172
   at Umbraco.Cms.Core.Scoping.LockingMechanism.EagerWriteLockInner(Guid instanceId, Nullable`1 timeout, Int32[] lockIds) in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Core\Scoping\LockingMechanism.cs:line 74
   at Umbraco.Cms.Core.Scoping.LockingMechanism.EagerWriteLock(Guid instanceId, Nullable`1 timeout, Int32[] lockIds) in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Core\Scoping\LockingMechanism.cs:line 54
   at Umbraco.Cms.Core.Scoping.CoreScope.EagerWriteLock(Int32[] lockIds) in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Core\Scoping\CoreScope.cs:line 202
   at Umbraco.Cms.Infrastructure.HostedServices.ScheduledPublishing.PerformExecuteAsync(Object state) in C:\Repos\Umbraco-CMS-V12\src\Umbraco.Infrastructure\HostedServices\ScheduledPublishing.cs:line 116
jesperordrup commented 9 months ago

reproduced in 12.3.3

ronaldbarendse commented 8 months ago

I've seen the Failed to acquire write lock for id: -341. error a few times while doing large content deployments/publishes on SQLite as well. I believe the issue is caused by the fact SQLite can only have a single concurrent writable transaction, as mentioned in this comment:

https://github.com/umbraco/Umbraco-CMS/blob/4aa0378431202152fad74c53456ad04adab8174f/src/Umbraco.Cms.Persistence.Sqlite/Services/SqliteDatabaseCreator.cs#L19-L38

Because the ScheduledPublishing background job takes an eager write lock on Constants.Locks.ScheduledPublishing, it will completely block any other write transaction or needs to wait until the current write transaction has completed, with the potential to timeout either transaction.

It probably doesn't help that the scope in this background job uses autoComplete: true since it got refactored in v12: https://github.com/umbraco/Umbraco-CMS/blob/c30ffa9ac3ae9c12508242855c63208567c55eb9/src/Umbraco.Infrastructure/BackgroundJobs/Jobs/ScheduledPublishingJob.cs#L68

Besides ensuring this explicitly completes the scope (using scope.Complete();), I'm not sure how we can actually fix the issue. We still want to ensure all scheduled content is published on time, so the background job needs to run every minute... But maybe we can avoiding the need to take the write lock every minute?

The same probably also applies to members: every sign-in currently updates the members LastLoginDateUtc, AccessFailedCount (if it was more than 0) and security stamp (if AllowConcurrentLogins is disabled), which takes a write lock on Constants.Locks.MemberTree...