umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.41k stars 2.66k forks source link

Failed to acquire read lock for id: -335 #15988

Open Timmeeehh opened 5 months ago

Timmeeehh commented 5 months ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

13.2.2

Bug summary

After migrating to Umbraco 13 one month ago we started experiencing a lot of "Failed to acquire read lock for id: -335" errors.

We have looked into the issue described in this thread however, after some analysis of all the errors it seems that the error occurs in the Umbraco Member Service instead of the content tree as described in the other issue.

Specifics

We are running a multilingual site with thousands of nodes and thousands of members. Site is hosted in Azure on a single Linux Web app with a SQL DB. Web App: Premium v3 P2V3 Memory ~50% CPU ~12%. SQL DB: Hyperscale - Serverless: Gen5, 2 vCores. Nowhere near max capacity.

After migrating to Umbraco 13 one month ago we started experiencing a lot of "Failed to acquire read lock for id: -335" errors.

We have looked into the issue described in this thread however, after some analysis of all the errors it seems that the error occurs in the Umbraco Member Service instead of the content tree as described in the other issue.

Umbraco.Cms.Core.DistributedLocking.Exceptions.DistributedReadLockTimeoutException: Failed to acquire read lock for id: -335.
   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock..ctor(SqlServerDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout)
   at Umbraco.Cms.Core.Scoping.LockingMechanism.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId)
   at Umbraco.Cms.Core.Scoping.LockingMechanism.EagerReadLockInner(Guid instanceId, Nullable`1 timeout, Int32[] lockIds)
   at Umbraco.Cms.Core.Services.MemberService.GetByKey(Guid id)
   at Umbraco.Cms.Core.PropertyEditors.ValueConverters.MemberPickerValueConverter.ConvertIntermediateToObject(IPublishedElement owner, IPublishedPropertyType propertyType, PropertyCacheLevel cacheLevel, Object source, Boolean preview)
   at Umbraco.Cms.Infrastructure.PublishedCache.Property.GetValue(String culture, String segment)
   at Umbraco.Extensions.PublishedPropertyExtension.Value[T](IPublishedProperty property, IPublishedValueFallback publishedValueFallback, String culture, String segment, Fallback fallback, T defaultValue)
   at Umbraco.Extensions.PublishedContentExtensions.Value[T](IPublishedContent content, IPublishedValueFallback publishedValueFallback, String alias, String culture, String segment, Fallback fallback, T defaultValue)

The error is always followed by the following error:

Microsoft.Data.SqlClient.SqlException (0x80131904): Lock request time out period exceeded.
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
   at Microsoft.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
   at Microsoft.Data.SqlClient.SqlCommand.CompleteExecuteScalar(SqlDataReader ds, Boolean returnSqlValue)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteScalar()
   at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)
   at NPoco.Database.ExecuteScalarHelper(DbCommand cmd)
   at NPoco.Database.ExecuteScalar[T](String sql, CommandType commandType, Object[] args)
   at Umbraco.Cms.Infrastructure.Persistence.UmbracoDatabase.ExecuteScalar[T](String sql, Object[] args)
   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock.ObtainReadLock()
   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock..ctor(SqlServerDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout)
ClientConnectionId:5171251a-0587-48df-b8a8-1243d10ca3b5
Error Number:1222,State:72,Class:16

The Full Stack Trace shows multiple locations in our code where this occurs. It is always the member server that is the culprit.

Umbraco.Cms.Core.DistributedLocking.Exceptions.DistributedReadLockTimeoutException: Failed to acquire read lock for id: -335.
   at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock..ctor(SqlServerDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout)
   at Umbraco.Cms.Core.Scoping.LockingMechanism.LockInner(Guid instanceId, Dictionary`2& locks, HashSet`1& locksSet, Action`2 obtainLock, Nullable`1 timeout, Int32 lockId)
   at Umbraco.Cms.Core.Scoping.LockingMechanism.EagerReadLockInner(Guid instanceId, Nullable`1 timeout, Int32[] lockIds)
   at Umbraco.Cms.Core.Services.MemberService.GetByKey(Guid id)
   at Umbraco.Cms.Core.PropertyEditors.ValueConverters.MemberPickerValueConverter.ConvertIntermediateToObject(IPublishedElement owner, IPublishedPropertyType propertyType, PropertyCacheLevel cacheLevel, Object source, Boolean preview)
   at Umbraco.Cms.Infrastructure.PublishedCache.Property.GetValue(String culture, String segment)
   at Umbraco.Extensions.PublishedPropertyExtension.Value[T](IPublishedProperty property, IPublishedValueFallback publishedValueFallback, String culture, String segment, Fallback fallback, T defaultValue)
   at Umbraco.Extensions.PublishedContentExtensions.Value[T](IPublishedContent content, IPublishedValueFallback publishedValueFallback, String alias, String culture, String segment, Fallback fallback, T defaultValue)
   at MapperService.MapProject(Project project) in /home/vsts/work/1/s/src/MapperService.cs:line 148
 at ProjectController.Index() in /home/vsts/work/1/s/src/ProjectController.cs:line 80
   at lambda_method215(Closure, Object, Object[])
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Logged|12_1(ControllerActionInvoker invoker)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextExceptionFilterAsync>g__Awaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ExceptionContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.InvokeNextResourceFilter()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ResourceExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.InvokeFilterPipelineAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
   at Umbraco.Cms.Web.Common.Middleware.BasicAuthenticationMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)
   at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<<CreateMiddleware>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Umbraco.Cms.Web.BackOffice.Middleware.BackOfficeExternalLoginProviderErrorMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)
   at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<<CreateMiddleware>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at UrlTracker.Middleware.UrlTrackerRedirectMiddleware.InvokeAsync(HttpContext context)
   at UrlTracker.Middleware.UrlTrackerClientErrorTrackingMiddleware.InvokeAsync(HttpContext context)
   at Microsoft.AspNetCore.Session.SessionMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Session.SessionMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Localization.RequestLocalizationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at StackExchange.Profiling.MiniProfilerMiddleware.Invoke(HttpContext context) in C:\projects\dotnet\src\MiniProfiler.AspNetCore\MiniProfilerMiddleware.cs:line 112
   at Umbraco.Cms.Web.Common.Middleware.UmbracoRequestMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)
   at Umbraco.Cms.Web.Common.Middleware.UmbracoRequestMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)
   at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<<CreateMiddleware>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Umbraco.Cms.Web.Common.Middleware.PreviewAuthenticationMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)
   at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<<CreateMiddleware>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at Umbraco.Cms.Web.Common.Middleware.UmbracoRequestLoggingMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)
   at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<<CreateMiddleware>b__0>d.MoveNext()
--- End of stack trace from previous location ---
   at SixLabors.ImageSharp.Web.Middleware.ImageSharpMiddleware.Invoke(HttpContext httpContext, Boolean retry)
   at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.<Invoke>g__Awaited|10_0(ExceptionHandlerMiddlewareImpl middleware, HttpContext context, Task task)

This is the code at the mapper service image

The error occurs in project.ProjectContact is Contact contact

Anyone have any idea on potential fixes?

Steps to reproduce

I have not been able to reproduce this locally... It occurs on our PRD environment.

Expected result / actual result

We do not expect this error to occur.

github-actions[bot] commented 5 months ago

Hi there @Timmeeehh!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face:

jerpenol commented 3 months ago

We are encountering the same locking issues (13.3.1), when getting a member by email and then updating the member.

image image

The logic called is as follow:

We have an API (an ApiController) that allows external applications to get, update and create members. An external function app first tries to get an existing member to get the Guid.

A transient service within our application calls the IMemberService with the following code:

private readonly IMemberService _memberService;

public CustomService1(IMemberService memberService) 
{
  _memberService = memberService;
}

public IEnumerable<IMember> GetMembersByEmailAddress(string emailAddress) 
{
  return _memberService.FindByEmail(emailAddress, 0, 10, out long totalResults, StringPropertyMatchType.Exact);
}

The external application then grabs the Guid of the member it wants to update from these results and calls the API to update the member with that guid. This uses another transient service, that uses the following code:

private readonly IMemberService _memberService;

public CustomService2(IMemberService memberService) 
{
  _memberService = memberService;
}

public async Task UpdateOrCreate(RequestClass request) 
{
  IMember member = _memberService.GetByKey(request.Key);

   // - Member gets updated here - 

  _memberService.Save(member);

  return;
}
Timmeeehh commented 3 months ago

@jerpenol We had exactly such a setup. Setting email preferences for a user via an API. This caused constant Lock errors and the site to crash. We are currently in the process of rewriting our codebase to migrate from using the member service for retrieving members to the memberManager. The IMemberManager is another way to work with member data in the form of an IMemberShipUser.

By focussing on operations that were heavy for the member service this has limited the amount of requests which is currently reducing the amount of locking errors. (We had 3K+ per day...)

It is strange that this is needed as the memberService should be able to handle these simple requests. It is probably not the core fix and could have something to do with Lock -333 Issue where the underlying Umbraco Scoping mechanism could be a potential culprit.

Let me know if you find anything related to this error :)

Example for the MemberManager:

public async Task<IEnumerable<IMember>> GetMembersByEmailAddress(string emailAddress) 
{
  return = await _memberManager.FindByEmailAsync(emailAddress);
}
jerpenol commented 3 months ago

@Timmeeehh Actually, now I am taking a better look and we are already using the IMemberManager. It looks like the use of the IMemberManager causes the actual issue, when I remove that piece of code and call the API it works again.

The flow is actually as follows:

Maybe it has something to do with the IMemberManager being a scoped service?

We had some issues with that, so what we do when we use the member manager is instantiate a IServiceScope and the use it like below.

private readonly IServiceScopeFactory _scopeFactory;

public CustomTransientService(IServiceScopeFactory scopeFactory) 
{
  _scopeFactory= scopeFactory;
}

public IPublishedContent Method()
{
  using IServiceScope? scope = _scopeFactory.CreateScope();
  IMemberManager memberManager = scope.ServiceProvider.GetRequiredService<IMemberManager>();

  var identity = await memberManager.GetCurrentMemberAsync();

  return await memberManager.AsPublishedContent(identity);
}

I actually map the IPublishedContent to a custom class before disposing the scope.

jerpenol commented 3 months ago

Looking further into this IMemberManager.AsPublishedContent(MemberIdentityUser user) actually uses IMemberUserStore.GetPublishedMember(MemberIdentityUser user). The store in turn uses IMemberService.GetByKey(Guid key). So the MemberManager is not actually using any cache, but also talks directly to the database when casting it to IPublishedContent.

My guess is that the IMemberService call causes a lock that is not being released (properly) when called from our custom workflow. Then the Member API tries to access the member table and obtain the same lock, which fails.

The following code is used in the store

public IPublishedContent? GetPublishedMember(MemberIdentityUser? user)
{
    if (user == null)
    {
        return null;
    }

    IMember? member = _memberService.GetByKey(user.Key);
    if (member == null)
    {
        return null;
    }

    IPublishedSnapshot publishedSnapshot = _publishedSnapshotAccessor.GetRequiredPublishedSnapshot();
    return publishedSnapshot.Members?.Get(member);
}
jerpenol commented 3 months ago

Pretty sure that the issue lies within the IMemberService.GetByKey, because when I remove all references to the IMemberManager in my workflow and call IMemberService.GetByKey directly the same locking issue occurs.

abellisai-qwerty commented 1 month ago

Hello everyone i confirm the same problem with Umbraco 13.3.0. It happend when i invoke MemberService.Save() Following more information:

We’re running an Umbraco 13.0.1 website on Azure with the following Azure configuration:

SQL Azure Database S2 50 DTU

App settings configuration on Azure, ad described on documentation: image

The error occurs in contexts where a save operation is performed on members or when modifying the groups to which a member is associated, e.g.: image

This is the first Errror: 2024-07-23T13:13:30.7035751+00:00

Microsoft.Data.SqlClient.SqlException (0x80131904): Lock request time out period exceeded. The statement has been terminated. at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted) at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest) at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method) at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery() at StackExchange.Profiling.Data.ProfiledDbCommand.ExecuteNonQuery() in C:\projects\dotnet\src\MiniProfiler.Shared\Data\ProfiledDbCommand.cs:line 261 at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.<>c__DisplayClass38_01.b__0() at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.RetryPolicy.ExecuteAction[TResult](Func1 func) at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.Execute[T](Func1 f) at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.ExecuteNonQuery() at NPoco.Database.ExecutionHook[T](Func`1 action) at NPoco.Database.ExecuteNonQueryHelper(DbCommand cmd) at NPoco.Database.Execute(String sql, CommandType commandType, Object[] args) at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock.ObtainWriteLock() at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock..ctor(SqlServerDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout) ClientConnectionId:a25aafea-484a-45f2-baf9-bf8ba43634c2 Error Number:1222,State:51,Class:16 ClientConnectionId before routing:c6ad8be4-aac6-4d2e-8835-6d57d82313c6 Routing Destination:c6a8b39eafa0.tr27179.westeurope1-a.worker.database.windows.net,11036

Followed by this error: 2024-07-23T13:13:30.7060052+00:00

BeginForgotPassword Failed to acquire write lock for id: -335., at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.SqlServerDistributedLock..ctor(SqlServerDistributedLockingMechanism parent, Int32 lockId, DistributedLockType lockType, TimeSpan timeout) at Umbraco.Cms.Persistence.SqlServer.Services.SqlServerDistributedLockingMechanism.WriteLock(Int32 lockId, Nullable1 obtainLockTimeout) at Umbraco.Cms.Core.Scoping.LockingMechanism.ObtainWriteLock(Int32 lockId, Nullable1 timeout) at Umbraco.Cms.Core.Scoping.LockingMechanism.LockInner(Guid instanceId, Dictionary2& locks, HashSet1& locksSet, Action2 obtainLock, Nullable1 timeout, Int32 lockId) at Umbraco.Cms.Core.Scoping.LockingMechanism.EagerWriteLockInner(Guid instanceId, Nullable`1 timeout, Int32[] lockIds) at Umbraco.Cms.Core.Services.MemberService.Save(IMember member) at Naitec.ABTPaperless.Code.Services.ResetPasswordService.BeginForgotPassword(IPublishedContent formPage, String email, Boolean afterUpdateForceReset, UmbracoHelper umbracoHelper) in C:\naitec\SMS\Main\DotNetCore\ABTPaperless\Naitec.ABTPaperless.Code\Services\ResetPasswordService.cs:line 69

Maybe something related to the internal profiler used by Umbraco?

skttl commented 1 month ago

We are encountering locks for id -335 occassionally on a somewhat large membersite (just shy of 50.000 members). I think it's because the IMemberManager.FindByEmail uses MemberService.GetByEmail, and the db performance when looking up members based on their email is... well, not fast.