Open stevetemple opened 2 years ago
I'm seeing the exact same problem on a site, same version of umbraco, 8.17.1 and 85,000+ nodes. I've had to resort to the same fix of increasing executionTimeout;
<httpRuntime executionTimeout="300" />
Hi @stevetemple,
Thank you for reporting this, and sorry for the late response.
We are currently investigating options to mend or replace nucache. It's on the roadmap. Currenly I can't say anything as for a timeline, though.
While we prepare for that, every last bit of input from people experiencing issues with large sites is very much appreciated. So... a few questions come to mind 😄
Hi @kjac
Thanks for the response,
I suspect there is a possible way of batching some of this stuff to prevent timeouts, something I was going to look at but haven't had the time in the last few months.
Just to chime in on this, we've also got a v8 site with 200,000+ nodes that occasionally gets this issue too. It happened when the application pool recycles on a schedule overnight, but leaves the site in a state where every request ends with a System.IO.IOException: The process cannot access the file '{TEMP}\NuCache\NuCache.Content.db' because it is being used by another process
.
So far our only solution has been to extend the exectionTimeout.
One thing that we've found that helps is running ALTER INDEX PK_cmsContentNu ON cmsContentNu REBUILD
regularly. This index gets fragmented and slow pretty quickly on large sites.
Hi @stevetemple we are facing some issues with caching and was wondering if this is something you still need to do on v10?
@osigas I believe the issue you are having is unrelated to the one reported by @stevetemple.
To minimize downtime, IIS starts a new instance of the web application before the old one is completely stopped. That exception occurs because the new application tries to use NuCache.Content.db while the old application is still using it.
You can avoid those locks by disallowing overlapping rotation in IIS. You can achieve that via the UI or with the following powershell command:
Set-WebConfiguration /system.applicationHost/applicationPools/applicationPoolDefaults/recycling/@disallowOverlappingRotation -value "true"
Keep in mind that this might cause more downtime.
You can also disable the IIS periodic restarts (by default it restarts every 29 hours) via the UI or by running the following powershell command:
Set-WebConfiguration /system.applicationHost/applicationPools/applicationPoolDefaults/recycling/periodicRestart/@time -value "00:00:00"
Which exact Umbraco version are you using?
8.17.1
Bug summary
When cold booting a site with 100,000+ nodes I'm getting content.nucache locked errors.
If the content.nucache file is present then it starts up fine, it's the initial generation of the site.
I can reproduce this by pointing an empty v8.17.1 site (just umbraco cms nuget package installed) at the database as well as with the heavily customised site.
This leaves the site in a failed state and it won't then recover.
Specifics
I get the following logs (tweaked to make readable):
The 2nd log line is actually inside some NPoco db operations, I truncated the very long stack trace,
What I believe is happening is that the connection that starts the site hits the execution timeout, this then redirects the user which generates a ThreadAbortException in the middle of a DB operation, I think this is at some point closing the DB connection which is being used on the thread that is populating the content.nucache db.
We then have it reporting as Completed (it's not) and then a Panic as the connection is closed and it leaves the nucache db files locked. Which then stops the site loading.
If I increase executionTimeout to something very large (I believe the default is 110 seconds) then this doesn't happen but does leave my site with very large executionTimeouts which isn't ideal.
Steps to reproduce
Have a site with very large number of nodes 100,000+ and delete nucache and start it from cold boot.
Expected result / actual result
I think the initial connection timing out shouldn't close the connection and leave the site in a failed state because nucache is now locked.
Ideally it could timeout but the building continues in a background thread and site comes up once that is completed.