Closed gabador-mof closed 11 months ago
I'd love to get to the bottom of this, can I start with a few questions?
1) What do you have configured for the appsettings key Umbraco.Core.MainDom.Lock
?
2) Do the umbraco containers each have their own instance of NuCache.Content.db or is there any volume magic that could result in multiple processes attempting to write to the file concurrently?
3) Could you add the following element to Config/serilog.config under the appSettings element to see if the more verbose logging adds any aditional clues <add key="serilog:minimum-level:override:Umbraco.Core.Runtime" value="Debug" />
?
Hey!
1 SqlMainDomLock 2 each have their own instance of the content DB 3 - Done. Adding now.
Just to reiterate, it takes the same amount of time locally, trying to connect to an azure DB... It just runs and runs and runs...
What log messages do you have from the category Umbraco.Core.Sync.DatabaseServerMessenger
when booting the site?
In the long term a 10 minute boot for a website seems unnaceptable but to resolve this requires a large refactor to the nucache setup, work is planned here in the backlog but not for v8.
In the short term a 10 minute boot for a reasonably large site sounds comparable to other sites of similar scale so i'm far more concerned about the exceptions writing to the nucache file than the slow boot.
Yeah, I don't mind a 10 minute boot time with a large site, but I can't live with it failing to start. Currently it just falls over starting every time. Incidentally, To see if it had anything to do with the dabasae nucache, I deleted all its contents. What would be a way of rebuilding this, if I cannot get into the CMS?
I'm just fishing for some error messages but there are thousands in the logs, so not sure what's relevant.
This is the only message from the DatabaseServerMessenger
No last synced Id found, this generally means this is a new server/install. The server will build its caches and indexes, and then adjust its last synced Id to the latest found in the database and maintain cache updates based on that Id.
As in you truncated the table dbo.umbracoCacheInstruction
?
That should result in a cold boot as the last synced id recorded on disk doesn't agree with the database.
So just turning the app off and old should result in that table being populated and then the NuCache.Content.db being populated.
The more interesting log messages should probably come from Umbraco.Core.Sync.DatabaseServerMessenger
and Umbraco.Core.Runtime
No last synced Id found
When the app starts, a file is read {{LocalTempPath}}/DistCache/{{AppDomainId}}-lastsynced.txt
this should have the id of the last instruction from dbo.umbracoCacheInstruction
that was processed to be compared against the contents of dbo.umbracoCacheInstruction
.
In your case that *-lastsynced.txt
was not found.
No, I Truncated the [dbo].[cmsContentNu] Table 😬 It rebuilt it on another environment, but on here, I get a heap of messages :
Missing cmsContentNu published content for node 10065, consider rebuilding. and Skip item id=10065, could not find parent id=9487.
And then after 15 minutes, it dies with The process cannot access the file 'C:\inetpub\wwwroot\stylus-be\App_Data\TEMP\NuCache\NuCache.Content.db' because it is being used by another process.
Around the time it dies what do you have logged from Umbraco.Core.Runtime
Additionally adding <add key="serilog:minimum-level:override:Umbraco.Web.PublishedCache.NuCache" value="Debug" />
along with the one mentioned before <add key="serilog:minimum-level:override:Umbraco.Core.Runtime" value="Debug" />
to Config/serilog.config should give a fuller picture.
This is the error just before the lock that comes up locally: No data for media 3976
[InvalidOperationException: No data for media 3976] Umbraco.Web.PublishedCache.NuCache.DataSource.DatabaseDataSource.CreateMediaNodeKit(ContentSourceDto dto, IContentCacheDataSerializer serializer) in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\DataSource\DatabaseDataSource.cs:459 Umbraco.Web.PublishedCache.NuCache.DataSource.<GetAllMediaSources>d__17.MoveNext() in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\DataSource\DatabaseDataSource.cs:327 Umbraco.Web.PublishedCache.NuCache.ContentStore.SetAllFastSortedLocked(IEnumerable
1 kits, Boolean fromDb) in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\ContentStore.cs:704
Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService.LoadMediaFromDatabaseLocked(IScope scope, Boolean onStartup) in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\PublishedSnapshotService.cs:478
Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService.LockAndLoadMedia(Func2 action) in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\PublishedSnapshotService.cs:454 Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService.<EnsureCaches>b__36_0() in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\PublishedSnapshotService.cs:277 System.Threading.LazyInitializer.EnsureInitializedCore(T& target, Boolean& initialized, Object& syncLock, Func
1 valueFactory) +139
Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService.EnsureCaches() in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\PublishedSnapshotService.cs:210
Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService.CreatePublishedSnapshot(String previewToken) in D:\a\1\s\src\Umbraco.Web\PublishedCache\NuCache\PublishedSnapshotService.cs:1109
System.Lazy1.CreateValue() +734
That doesn't sound like fun.
might be worth setting the following in appsettings <add key="Umbraco.Web.PublishedCache.NuCache.Serializer" value="JSON" />
(removing any existing entries that set the value to MsgPack
).
truncating the cmsContentNu table
restarting the app and looking at the entries in cmsContentNu where nodeId = 3976
This is the query that's ran to load all the media content from cmsContentNu.
According to the exception you're getting the result set has null for both EditData and EditDataRaw which is unexpected.
DECLARE @nodeId INT = 3976
SELECT [umbraconode].[id] AS [Id],
[umbraconode].[uniqueid] AS [Key],
[umbraconode].[level] AS [Level],
[umbraconode].[path] AS [Path],
[umbraconode].[sortorder] AS [SortOrder],
[umbraconode].[parentid] AS [ParentId],
[umbraconode].[createdate] AS [CreateDate],
[umbraconode].[nodeuser] AS [CreatorId],
[umbracocontent].[contenttypeid] AS [ContentTypeId],
[umbracocontentversion].[id] AS [VersionId],
[umbracocontentversion].[text] AS [EditName],
[umbracocontentversion].[versiondate] AS [EditVersionDate],
[umbracocontentversion].[userid] AS [EditWriterId],
[nuEdit].[data] AS [EditData],
[nuEdit].[dataraw] AS [EditDataRaw]
FROM [umbraconode]
INNER JOIN [umbracocontent]
ON ( [umbraconode].[id] = [umbracocontent].[nodeid] )
INNER JOIN [umbracocontentversion]
ON ( ( [umbraconode].[id] = [umbracocontentversion].[nodeid] )
AND [umbracocontentversion].[current] = 1 )
LEFT JOIN [cmscontentnu] [nuEdit]
ON ( ( [umbraconode].[id] = [nuEdit].[nodeid] )
AND [nuEdit].[published] = 0 )
WHERE (( ( [umbraconode].[nodeobjecttype] =
'b796f64c-1f99-4ffb-b886-4bf4bc011a9c' )
AND ( [umbraconode].[trashed] = 0 ) ))
AND [umbraconode].[id] = @nodeId
ORDER BY [umbraconode].[level],
[umbraconode].[parentid],
[umbraconode].[sortorder]
If cmContentNu is empty toggling back and forth between MsgPack
and JSON
for appsettings key Umbraco.Web.PublishedCache.NuCache.Serializer
forces the table to be repopulated (there is probably a smarter way to make this happen, but this works)
Ok, will try that. If it doesn't work, I might copy back a db that works, from a different env...
and we can then figure out the lock issue with a db that should work and isn't corrupted
So the cmscontentnu table is empty now, and I've changed the serializer type to JSON, and now it's running through the db as follows: Eventually though it will run into an error and then it will fall over and lock the cache.
I'm now trying with a new DB. One that has a correct cmsContentnu.
Is it unusual to have 1.5GB cache? I'm happy to assign the resources to the nodes to run these instances, but it's going to be impossible if this data keeps growing, and we want to pull up new instances of the application. This is a new system, so v9 is not really an option yet...
Still loading after 30 minutes:
Is it unusual to have 1.5GB cache
NuCache.Content.db has a copy of the current draft and current published version for every content node in your site for all cultures, how does it compare to the result of executing this on your database sp_spaceused [umbracoPropertyData]
it's possible to turn off the NuCache.Content.db file entirely and load straight from your sql server database each boot.
This may result in slower warm boots, but might be worth experimenting to see how it impacts your cold boots when not writing to disk.
This is documented on https://our.umbraco.com/Documentation/Fundamentals/Setup/Server-Setup/Load-Balancing/azure-web-apps-v8#appdomain-synchronization (search for PublishedSnapshotServiceOptions)
yeah, that's almost the same data. I meant is it usual to have so much data in an Umbraco database? In your experience, do other setups compare in startup times?
There's definitely some very large umbraco sites out there in the 50-100k documents range.
My personal experience is all with versions < 8.0 so it's hard for me to say how long it should take for a multilingual 10k node v8 site should take to boot.
This is the sort of thing it would be great if we had telemetry on.
I've added the ignorelocaldb, and interestingly, the admin pod started. But the front-end pods do not. There is very strange behaviour going on. I restarted the deployment, the new pods came up. Then, it all started. I edited some pages, it worked fine. I restarted the deployment again, and then the new pods came up again. Fast. Like under a minute.
but the cache one the new pods is broken, and contains the content from before saving the pages.
I hit rebuild memory cache, and now I'm getting DB locks in the CMS when saving pages. I imagine it will take a while. But what's going on with the front-end pods? This is so bizarre...
So on my local instance, the cache is 171 MB... The Cache rebuild and cold startup of my site is till going after 10 minutes. I think something is wrong... It's crazy to think that 5000 pages would take so long to load into memory.
I have block list editors in the site, would that have an effect, on the startup?
I have block list editors in the site, would that have an effect, on the startup?
I don't believe so.
This topic has gotten a little off topic now and may need breaking down into multiple issues with steps to reproduce for each.
With the nucache disk cache disabled we are not going to encounter NuCache.Content.db being used by multiple processes / app domains which is a shame as I would like very much to know how to reproduce this.
I am turning it back on locally. I'm also trying to connect to the DB using a different umbraco application that works correctly, and loads the cache correctly in under 5 minutes. Still a bit much, but acceptable. Lock timeouts happen in the CMS when I try to edit content, but that's to be expected, as you want to ensure that the cache rebuilds before you start making changes to the db etc... So this is another website, running the same version of Umbraco, on the same database. One takes 3-5 minutes on the 1.5 GB, the other 20
I have a better example, more isolated scenario.
I installed a new version of umbraco, with a bare-bones setup. I connected to the DB.
Connection timeout 120 in connection string.
<httpRuntime requestValidationMode="2.0" enableVersionHeader="false" targetFramework="4.7.2" maxRequestLength="157286400" executionTimeout="10800" fcnMode="Single" />
With this, the CPI maxes out on the DB, each time I try to start the app. This is connecting to the larger DB...
I then get a boot failed, SQL connection timeout.
System.Data.SqlClient.SqlException: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.\r\nThe statement has been terminated.\n
Without steps to reproduce, logs, or a bakpak I'm not sure how I can assist here.
Incidentally, we have a shared local DB, many of us use the same DB. WHen I connect to a standalone DB, both large or small, it works fine. I'm not doing any write operations, just waiting for Umbraco to start up. When I connect to a shared DB, as in multiple instances connecting to the DB, it fails to get a SQL connection. I'm just testing this theory.
I have this as amaster-replica elector: Is this sufficient? All front-end instances should be doing is reading from the database, with this?
public class MasterReplicaElector : IUserComposer
{
private readonly ILogger _logger;
public MasterReplicaElector()
{
}
public void Compose(Composition composition)
{
if (MofConfigurationManager.Current.IsFrontEndServer)
{
composition.SetServerRegistrar(new FrontEndReadOnlyServerRegistrar(_logger));
}
else
{
composition.SetServerRegistrar(new MasterServerRegistrar(_logger));
}
}
}
public class MasterServerRegistrar : IServerRegistrar
{
private readonly ILogger _logger;
public MasterServerRegistrar(ILogger logger)
{
this._logger = logger;
}
public IEnumerable<IServerAddress> Registrations
{
get { return Enumerable.Empty<IServerAddress>(); }
}
public ServerRole GetCurrentServerRole()
{
//StylusLogger.CurrentLogger.LogWarning<MasterServerRegistrar>("Getting Scheduling Registrar: {0}", "Master");
return ServerRole.Master;
}
public string GetCurrentServerUmbracoApplicationUrl()
{
// NOTE: If you want to explicitly define the URL that your application is running on,
// this will be used for the server to communicate with itself, you can return the
// custom path here and it needs to be in this format:
// http://www.mysite.com/umbraco
return null;
}
}
public class FrontEndReadOnlyServerRegistrar : IServerRegistrar
{
private readonly ILogger _logger;
public FrontEndReadOnlyServerRegistrar(ILogger logger)
{
this._logger = logger;
}
public IEnumerable<IServerAddress> Registrations
{
get { return Enumerable.Empty<IServerAddress>(); }
}
public ServerRole GetCurrentServerRole()
{
return ServerRole.Replica;
}
public string GetCurrentServerUmbracoApplicationUrl()
{
return null;
}
}
All front-end instances should be doing is reading from the database
frontend instances will still need to write to the database, e.g. ~adding a read lock for distributed locking, or~ registering themselves as a MainDom instance in [dbo].[UmbracoKeyValue] when using SqlMainDomLock.
IServerRegistrar is more for which instance is responsible for running scheduled publishing tasks etc.
Ok, so I see for example an admin pod that I've set up in the UmbracoServer marked as isMaster - false and the front-end pod as ismaster = true.
What defines how Umbraco decides what's a master or not?
Edit- I don't see any of my current pods in the UmbracoServer table, and the cacheinstructions table is full. The two computer names are really old. New pods would have to register themselves as hey are pulled up, and Umbraco/Windows has a bug that the pod names are over 15 characters long, due to the deployment/replicaset/pod naming convention. would this have something to do with it?
What defines how Umbraco decides what's a master or not?
Master server is just a label on a diagram, it's the instance where you allow login to backoffice for editing.
WRT scheduled publishing it's described here but you're explicitly setting the scheduling publisher role with explicit-master-scheduling-server with your class MasterReplicaElector
Edit: The names make more sense in v9 where it is Publisher/Subscriber instead of Master/Replica
Thanks you for your help so far!
I feel like I'm chasing my tail and there is something fundamentally wrong with something. I don't see any entries in the UmbracoServer table for any of the servers. There were a bunch of old server names, but nothing new, so when I deleted those, none of the cache instructions are going through to the replicas
Adding SQLDomLock to the web config basically maxes out the DB CPU while it's trying to load the cache, and it times out unless I set the Connection timeout to some huge number >240...
So I'm wondering - The fact that we're getting DB locks locally is understandable. It's effectively multiple admin instances.
But on the kubernetes pods, there are two issues - one is that none of the apps write their names into the UmbracoServer table, and the other is that none have an UmbracoApplicationUrl in the UmbracoSettings.Config.
If in the master/replica elector I try and get the pods IP address, and set the SQLMainDomLock, would this ensure that we get the correct list of servers in the server table? Assuming this works, then we won't get the issue of all the pods locking the db and none coming up...
I think that when you opt into explict server roles (by registering an implementation of IServerRegistrar
) then the DatabaseServerRegistrarAndMessengerComponent
doesn't create the TouchServer task which updates the umbracoServer table, see here.
It's not required for Umbraco to work out server roles if you set them explicitly.
Can you confirm that with your implementation of IServerRegistrar that only one of the instances returns ServerRole.Master for GetCurrentServerRole
Yes, There's only one instance. During deploy, we only set the admin instance as master in the app configs, and compose based on that...
Ok. So why does CPU Max out when any of the pods try to read the cache? I'm currently getting a CPU maxing out, and
`Boot failed: Umbraco cannot run. See Umbraco's log file for more details.
-> Umbraco.Core.Exceptions.BootFailedException: Boot failed.
-> System.Data.SqlClient.SqlException: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
I'm completely at a loss, and There's nothing relevant in the logs. It transpires that the cache rebuild isn't what takes the most amount of time, it's actually connecting to the database, as sometimes it takes very long to actually connect and start doing the work. In my clean install that's what I notices on the local db, that when other people were connected to the DB, it maxed out CPU. So it's the same even with master replica election on the pods, max CPU and boot failed
Are the instances fighting for MainDom status?
If you run the following query you should see one entry per instance of the application e.g. in production with 1 admin instance and 3 frontend instances there should be 4 rows for Umbraco.Core.Runtime.SqlMainDom-{{maindomkey}}
select * from umbracoKeyValue where [key] like 'Umbraco.Core.Runtime%'
there are 252 rows in that table... Should it get cleared out? I image all the pods would have written an id in there at some point, but they don't get cleared out?
Why would it work for another client running on kubernetes? Granted, their cache size is much smaller, but ther eare never any issues there? They are on a lower version number for umbraco
there are 252 rows in that table... Should it get cleared out?
I don't think it matters that there are dead records floating around and hints that the instances likely aren't fighting each other for MainDom status which is good.
Why would it work for another client running on kubernetes
I have no idea, I can't run these apps or view their logs.
HI Paul,
Can i send you a copy of the code? I can also allow access through the firewall if you wouldn't mind taking a look?
We have a few scheduled tasks running in the background, not sure if they would have anything to do with the slow startup, but they're on a delay, so I don't see how.
Thanks for all the help so far!
Gabor
I've just ran a SQL query to see what statement is running when we get into this tangle. Before it times out, the following sql is running against the DB. This times out.
(@0 uniqueidentifier)DELETE FROM cmsContentNu WHERE cmsContentNu.nodeId IN ( SELECT id FROM umbracoNode WHERE umbracoNode.nodeObjectType=@0 )
It takes 8 minutes to run, and after that, the rebuild of the nucache table another 10.
It's interesting to see that it tries to rebuild an already built DB cache table. Why not just use it, if it's already built?
There is nothing in the UmbracoCacheInstruction table. Do you know if the UmbracoCacheInstruction table ever gets cleared out? What would cause a full DB cache rebuild?
Does the above help bring us any closer to the root of the issue?
If it helps, here are some log entries of the process:
Gabo Logs.txt r
It's interesting to see that it tries to rebuild an already built DB cache table. Why not just use it, if it's already built?
The main reason for that to occur is when the configured value for the appsettings key Umbraco.Web.PublishedCache.NuCache.Serializer
doesn't match the value storedfor that same key in the umbracoKeyValue
table in the database which seems likely as we have already mentioned toggling the value in this issue.
If you're talking about local dev where you and other folks are sharing a database this could be compounded by having differing config as it will keep having to be rebuilt depending on who ran the app last.
Oh I see, ok, well we must stop chaning the serializer, and will just leave it with the standard msgpk serializer.
SO what can we do to ensure that a sql timeout doesn't cause the published cache staying in a locked state? I feel like this is a fundamental problem with Umbraco...
I'm just trying to understand how we can fix this. The system either starts after a looong 10 minute wait, or the sql query times out, and then the cache file stays in a locked state. Not using the cache file is not an option, as everything is just super slow.
What can we do to try to figure out what's going on? I can send you a clean install of Umbraco that I used for testing, and grant you access to the QA Db. Connecting to this would be like connecting as a replica pod, so you could see what's going on...
Would that be something you could help with?
SO what can we do to ensure that a sql timeout doesn't cause the published cache staying in a locked state?
Just so we're on the same page here.
nucache.content.db
file is essentially just a dictionary that's written to the filesystem of the app server.The system either starts after a looong 10 minute wait
That sounds good other than the fact that it took a looong time, we know that it's slow to cold boot for large sites and fixing it is in our long term goals.
or the sql query times out
Which query times out?
and then the cache file stays in a locked state.
It's supposed to be locked whilst the app is running.
install of Umbraco that I used for testing, and grant you access to the QA Db. Connecting to this would be like connecting as a replica pod, so you could see what's going on...
I'd prefer a bacpac if possible but before we go down that road it'd be good for you to work out which query is timing out.
When you run locally are you also using SqlMainDomLock? what happens if you instead use the default by removing the appsetting with the key Umbraco.Core.MainDom.Lock
HI Paul,
Ok, I understand.
So With Kubernetes, everything runs in its own container, so effectively a little silo. Nothing else should be accessing the nucache file. One the master instance, the nucache file takes a long time to build, but more often than not, it manages it after 10-15 minutes. I had to set the SQL Connection timeout to 600 currently, otherwise I was getting a SQL timeout error, followed by nucache file locked.
The issue is that the DB CPU maxes out trying to start up, especially if it's a cold start. This causes a timeout of ANY sql statement, though I will run a test now, to see what runs, and for how long.
I've just deployed to the QA environment:
Pod gets pulled up, other instance disposed. SQL starts running: (@0 uniqueidentifier)DELETE FROM cmsContentNu WHERE cmsContentNu.nodeId IN ( SELECT id FROM umbracoNode WHERE umbracoNode.nodeObjectType=@0 )
DB CPU jumps to 100%
Bunch of cache rebuild sql tasks follow, and SQL DTU still 100%. Back-end pod not yet starting, as deploy configured to wait after admin pod, for 10 minutes... 10 minutes have passed - back-end pod begins to run. after 1 minutes:
back-end pod begins deleting the content of the nu table, while the other is starting to build it... CPU still at 100%...
BE pod throws a boot failed exception.
20 minutes pass, still running:
Nothing in the logs about any errors.
Basically, it's impossible to craft a deploy strategy, or even to start the application up when this happens. Previously it wasn't even trying to run the cache rebuild, the same thing happened. Anyway. That's the long story - Can I send you a bakpak somehow? I need to understand if there's a bug we're looking at, if I need to change something in what we're doing, throw more processing power at it, or what to do, as we are two weeks away from going live, and I have no trust in Umbraco being able to handle the current workload, let alone future growth...
It seems to me like the more DTU I throw at the DB, the more likely it is to succeed to pull up one server. But say I want two or three??
Which exact Umbraco version are you using? For example: 9.0.1 - don't just write v9
8.18.3
Bug summary
Umbraco version: 8.18.3 Multilingual site containing about 10000 pages. Nucache DB size: 1.4GB
I'm in a bit of a pickle. We have a site that runs on Kubernetes, in a docker container. This is something we've done before and it has worked. The setup is as follows:
1 CMS Pod, 3 Front-end Umbraco Pods 3 Nuxt applications communicating with the Umbraco Front-end pods.
We have with lower data volumes been all right, but since loading in 10000 pages, containing a bunch of baked HTML content, we cannot start up our sites. Each pod takes well over 10 minutes to start, and then we run into a
The process cannot access the file 'C:\inetpub\wwwroot\site\App_Data\TEMP\NuCache\NuCache.Content.db' because it is being used by another process.
Specifics
The issue occurs locally too. My local configuration is the same as the deployed one, except for these two app settings, but since this is Kube and docker, I don't thing they're toooo relevant. Could be wrong.
We have another site running in Kubernetes, and that works fine.
The only difference between those sites is that this one contains more logic, and that we have an Entity framework instance in this one. It is the published content cache that takes a long time to build though. Here is a dot trace output when the app tries to start up in a smaller data set instance:
We don't use examine on the front-end servers, so WE've also implemented this:
SO I'm at a bit of a loss. It doesn't feel like 10000 pages, plus about 5000 more images should do this to the cache. What are we doing wrong, and what can we do to fix the startup time and reduce the lock times?
One other note to mention: On the CMS, when saving pages, we got all these errors while the other pods were starting up:
We have a number of composers in the system that are reading from the cache, and on save, some also validate and update the database in custom tables, though I doubt that entity framework has too much to do with the cache loading, or lock period timeout exceeded errors... To test, I have disabled these to ensure that they don't interfere, but we get the same issue...
We really need to hand this site over soon, and while everything is looking great, we cannot have more than one instance of the application running, and it doesn't look great that we get all these lock timeouts...
Steps to reproduce
See above. You would need to create a container from the site, configure:
An azure Kubernetes Cluster with no persistent storage, D2V3 Node instance with 14GB memory. Deploy one CMS instance, 3 front-end Instances. DB is Azure SQL Server currently on a premium plan & 250 DTU to rule that out Load in 10000 pages containing block List editors and raw HTML We are using Plugins: Nexu, SkyBrud Url Redirect Manager Umbraco Forms, NewsletterStudio
Azure Blob storage for images
Expected result / actual result
Cache rebuilds quickly and app starts up quickly.