umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.45k stars 2.68k forks source link

Database timeouts and panic exceptions on accessing the back office or restarting the site #16531

Open ewuski opened 4 months ago

ewuski commented 4 months ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

10.8.3, 13.2.1

Bug summary

Since we migrated to Umbraco 10 (in November 2023) we've been experiencing database timeouts, sometimes severe leading to fatal exceptions and site crashes.

We recently upgraded to Umbraco 13.2.1 and the issue is still happening.

Specifics

The issue (or similar issues) happens very often when accessing the back office, usually every day.

It also sometimes happened just after we deployed changes to remote servers. It occurs extremely occasionally locally.

The symptom is that the back office becomes un-browsable, eg.:

The errors thrown are of this kind or similar:

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

---> System.ComponentModel.Win32Exception (258): The wait operation timed out.

at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)

at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)

at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

at Microsoft.Data.SqlClient.SqlDataReader.TrySetMetaData(_SqlMetaDataSet metaData, Boolean moreInfo)

at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()

at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)

at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)

at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)

at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)

at Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)

at StackExchange.Profiling.Data.ProfiledDbCommand.ExecuteDbDataReader(CommandBehavior behavior) in C:\projects\dotnet\src\MiniProfiler.Shared\Data\ProfiledDbCommand.cs:line 204

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.<>c__DisplayClass38_0`1.b__0()

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.Execute[T](Func`1 f)

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.ExecuteDbDataReader(CommandBehavior behavior)

at NPoco.Database.ExecutionHook[T](Func`1 action)

at NPoco.Database.ExecuteReaderHelper(DbCommand cmd)

at NPoco.Database.ExecuteDataReader(DbCommand cmd, Boolean sync)

at NPoco.AsyncHelper.RunSync[T](Task`1 task)

at NPoco.Database.QueryImp[T](T instance, Expression1 listExpression, Func2 idFunc, Sql Sql, PocoData pocoData)+MoveNext()

at System.Collections.Generic.List1..ctor(IEnumerable1 collection)

at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

at NPoco.Database.Fetch[T](Sql sql)

at Umbraco.Cms.Infrastructure.Persistence.Repositories.Implement.EntityRepository.GetEntities(Sql`1 sql, Boolean isContent, Boolean isMedia, Boolean isMember)

at Umbraco.Cms.Infrastructure.Persistence.Repositories.Implement.EntityRepository.GetByQuery(IQuery`1 query, Guid objectType)

at Umbraco.Cms.Core.Services.EntityService.GetChildren(Int32 parentId, UmbracoObjectTypes objectType)

at Umbraco.Cms.Web.BackOffice.Trees.ContentTreeControllerBase.GetChildrenFromEntityService(Int32 entityId)

at Umbraco.Cms.Web.BackOffice.Trees.ContentTreeControllerBase.GetChildEntities(String id, FormCollection queryStrings)

at Umbraco.Cms.Web.BackOffice.Trees.ContentTreeControllerBase.PerformGetTreeNodes(String id, FormCollection queryStrings)

at Umbraco.Cms.Web.BackOffice.Trees.ContentTreeControllerBase.GetTreeNodesInternal(String id, FormCollection queryStrings)

at Umbraco.Cms.Web.BackOffice.Trees.ContentTreeControllerBase.GetTreeNodes(String id, FormCollection queryStrings)

at Umbraco.Cms.Web.BackOffice.Trees.TreeControllerBase.GetTreeNodesAsync(String id, FormCollection queryStrings)

at Umbraco.Cms.Web.BackOffice.Trees.TreeControllerBase.GetNodes(String id, FormCollection queryStrings)

at Umbraco.Cms.Web.BackOffice.Trees.ApplicationTreeController.GetChildren(Tree tree, Int32 id, FormCollection querystring)

at Umbraco.Cms.Web.BackOffice.Trees.ApplicationTreeController.GetTreeRootNode(Tree tree, Int32 id, FormCollection querystring)

at Umbraco.Cms.Web.BackOffice.Trees.ApplicationTreeController.GetApplicationTrees(String application, String tree, FormCollection queryStrings, TreeUse use)

at lambda_method3727(Closure, Object)

at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(ActionContext actionContext, IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)

at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)

at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)

ClientConnectionId:055b34a6-b9bd-439d-8e2e-2525830a0d49

Error Number:-2,State:0,Class:11

Timestamp 2024-05-31T10:57:54.2873292+00:00 @MessageTemplate Exception ({InstanceId}). InstanceId 87c76e44 SourceContext
ActionId 5ee40b82-c875-4f13-9037-0d936dac95e2 ActionName Umbraco.Cms.Web.BackOffice.Trees.ApplicationTreeController.GetApplicationTrees (Umbraco.Web.BackOffice) RequestId 40000444-0007-c300-b63f-84710c7967bb RequestPath /umbraco/backoffice/umbracotrees/applicationtree/GetApplicationTrees ProcessId 8224 ProcessName w3wp ThreadId 72 ApplicationId 29830c6038e821cf53320d438c1c05735a7e6ae5 MachineName
Log4NetLevel ERROR HttpRequestId b1ea4d94-6b94-4e3f-9bb3-da5221d13f72 HttpRequestNumber
HttpSessionId 28135be1-def0-c99d-3911-5ce0c53faff9

Yesterday a fatal exception happened during the deployment of our changes to production and test sites. The deployment caused the site to restart which resulted in db timeouts that made the entire site crash. The source of the error logged is an issue with loading cache data but also indicates db as the source of the issue:

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

---> System.ComponentModel.Win32Exception (258): The wait operation timed out.

at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)

at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

at Microsoft.Data.SqlClient.SqlDataReader.TrySetMetaData(_SqlMetaDataSet metaData, Boolean moreInfo)

at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)

at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()

at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)

at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)

at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)

at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)

at Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)

at StackExchange.Profiling.Data.ProfiledDbCommand.ExecuteDbDataReader(CommandBehavior behavior) in C:\projects\dotnet\src\MiniProfiler.Shared\Data\ProfiledDbCommand.cs:line 201

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.<>c__DisplayClass38_0`1.b__0()

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.RetryPolicy.ExecuteAction[TResult](Func`1 func)

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.Execute[T](Func`1 f)

at Umbraco.Cms.Infrastructure.Persistence.FaultHandling.FaultHandlingDbCommand.ExecuteDbDataReader(CommandBehavior behavior)

at NPoco.Database.ExecutionHook[T](Func`1 action)

at NPoco.Database.ExecuteReaderHelper(DbCommand cmd)

at NPoco.Database.ExecuteDataReader(DbCommand cmd, Boolean sync)

at NPoco.AsyncHelper.RunSync[T](Task`1 task)

at NPoco.Database.QueryImp[T](T instance, Expression1 listExpression, Func2 idFunc, Sql Sql, PocoData pocoData)+MoveNext()

at Umbraco.Extensions.NPocoDatabaseExtensions.QueryPaged[T](IDatabase database, Int64 pageSize, Sql sql, Sql sqlCount)+MoveNext()

at Umbraco.Cms.Infrastructure.PublishedCache.Persistence.NuCacheContentRepository.GetAllContentSources()+MoveNext()

at Umbraco.Extensions.EnumerableExtensions.InGroupsOf[T](IEnumerable`1 source, Int32 groupSize)+MoveNext()

at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.SetAllFastSortedLocked(IEnumerable`1 kits, Int32 kitGroupSize, Boolean fromDb)

at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.LoadContentFromDatabaseLocked(Boolean onStartup)

at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.b__56_3()

at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.LockAndLoadContent(Func`1 action)

at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.b__56_0()

at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)

at System.Threading.LazyInitializer.EnsureInitialized[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)

at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.EnsureCaches()

at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.CreatePublishedSnapshot(String previewToken)

at Umbraco.Cms.Web.Common.UmbracoContext.UmbracoContext.<>c__DisplayClass11_0.<.ctor>b__0()

at System.Lazy`1.ViaFactory(LazyThreadSafetyMode mode)

at System.Lazy`1.ExecutionAndPublication(LazyHelper executionAndPublication, Boolean useDefaultConstructor)

at System.Lazy`1.CreateValue()

at System.Lazy`1.get_Value()

at Umbraco.Cms.Web.Common.UmbracoContext.UmbracoContext.get_PublishedSnapshot()

at Umbraco.Cms.Web.Common.UmbracoContext.UmbracoContext.get_Content()

at Umbraco.Cms.Web.Website.Routing.UmbracoRouteValueTransformer.TransformAsync(HttpContext httpContext, RouteValueDictionary values)

at Microsoft.AspNetCore.Mvc.Routing.DynamicControllerEndpointMatcherPolicy.ApplyAsync(HttpContext httpContext, CandidateSet candidates)

at Microsoft.AspNetCore.Routing.Matching.DfaMatcher.SelectEndpointWithPoliciesAsync(HttpContext httpContext, IEndpointSelectorPolicy[] policies, CandidateSet candidateSet)

at Microsoft.AspNetCore.Routing.EndpointRoutingMiddleware.g__AwaitMatch|10_1(EndpointRoutingMiddleware middleware, HttpContext httpContext, Task matchTask)

at StackExchange.Profiling.MiniProfilerMiddleware.Invoke(HttpContext context) in C:\projects\dotnet\src\MiniProfiler.AspNetCore\MiniProfilerMiddleware.cs:line 112

at Umbraco.Cms.Web.Common.Middleware.UmbracoRequestMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)

at Umbraco.Cms.Web.Common.Middleware.UmbracoRequestMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)

at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<b__0>d.MoveNext()

--- End of stack trace from previous location ---

at Umbraco.Cms.Web.Common.Middleware.PreviewAuthenticationMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)

at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<b__0>d.MoveNext()

--- End of stack trace from previous location ---

at Umbraco.Cms.Web.Common.Middleware.UmbracoRequestLoggingMiddleware.InvokeAsync(HttpContext context, RequestDelegate next)

at Microsoft.AspNetCore.Builder.UseMiddlewareExtensions.InterfaceMiddlewareBinder.<>c__DisplayClass2_0.<b__0>d.MoveNext()

--- End of stack trace from previous location ---

at SixLabors.ImageSharp.Web.Middleware.ImageSharpMiddleware.Invoke(HttpContext httpContext, Boolean retry)

at RobotsHeaderMiddleware.InvokeAsync(HttpContext context) in D:\a\1\s\MyDomain.Web\Composers\RobotsHeaderMiddleware.cs:line 21

at SixLabors.ImageSharp.Web.Middleware.ImageSharpMiddleware.Invoke(HttpContext httpContext, Boolean retry)

at WebPMiddleware.Invoke(HttpContext httpContext) in D:\a\1\s\MyDomain.Web\Composers\WebPMiddleware.cs:line 33

at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.g__Awaited|10_0(ExceptionHandlerMiddlewareImpl middleware, HttpContext context, Task task)

ClientConnectionId:bd511897-30e1-43ec-acc6-487faca2ecb0

Error Number:-2,State:0,Class:11

Timestamp 2024-05-30T17:50:02.0912432+00:00 @MessageTemplate Panic, exception while loading cache data. SourceContext
RequestId 40001d0a-0002-a500-b63f-84710c7967bb RequestPath / ProcessId 1548 ProcessName w3wp ThreadId 38 ApplicationId 29830c6038e821cf53320d438c1c05735a7e6ae5 MachineName
Log4NetLevel FATAL HttpRequestId 15b58209-6a5e-4037-8b59-11b1babeb7db HttpRequestNumber
HttpSessionId 0

The issue then affects the front-end API actions like saving or publishing content, which produces more errors:

image

Steps to reproduce

We do not know what triggers the issue. It just happens when we log in to the back office without doing anything else, just because we are not able to do anything else because most of the content in the back office does not load.

Here are our config settings:

{
  "$schema": "./appsettings-schema.json",
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft": "Warning",
        "Microsoft.Hosting.Lifetime": "Information",
        "System": "Warning"
      }
    }
  },
  "Umbraco": {
    "CMS": {
      "ModelsBuilder": {
        "ModelsMode": "Nothing",
        "DebugLevel": 0
      },
      "Global": {
        "ShowMaintenancePageWhenInUpgradeState": true,
        "Id": "9402c45c-7fad-4689-9b58-5c49573e5490",
        "SanitizeTinyMce": true,
        "UseHttps": true,
        "MainDomLock": "FileSystemMainDomLock",
        "DefaultUILanguage": "en-GB"
      },
      "Content": {
        "MacroErrors": "Silent",
        "ContentVersionCleanupPolicy": {
          "EnableCleanup": false
        }
      }
    }
  }
}

We tried to increase the db timeout but it didn't help anything.

Our server is a dedicated server:

Memory 8 GB SSD Disk 2 x 480 GB SSD Operating System Windows Server 2022

Expected result / actual result

The back office should load fine and do not produce db timeouts.

github-actions[bot] commented 4 months ago

Hi there @ewuski!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face:

ewuski commented 4 months ago

Closing as resolved