Examine index issues in Umbraco 8.6.0 Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed

MihaiPotcoava commented 4 years ago

We've recently upgraded to Umbraco 8.6.0 since we've experienced Examine index locking issues, described here: https://github.com/umbraco/Umbraco-CMS/issues/5035 The recommendation from Shazwazza was to upgrade to 8.6.0, see his comment: https://github.com/umbraco/Umbraco-CMS/issues/5035#issuecomment-599546750

We've upgraded, followed the instructions from https://our.umbraco.com/documentation/getting-started/setup/server-setup/azure-web-apps

However, once a few hours we're getting Examine lock-related errors (details above). When we touch the web.config everything runs ok for a few hours then the problems reappear.

Umbraco version

I am seeing this issue on Umbraco version: 8.6.0, Examine v1.0.3, Lucene.Net v3.0.3

Reproduction

If you're filing a bug, please describe how to reproduce it. Include as much relevant information as possible, such as:

Bug summary

We're running in Azure, we've recently upgraded from 7 to 8.5.3 then to v8.6.0 With version 7 everything was working ok but we wanted to benefit from the latest version inprovments. With version 8.5.3, the website was working pretty well but we encountered the issues described here: https://github.com/Shazwazza/Examine/issues/161 so we upgraded.

Now all seems to work fine except some Examine-related exceptions described below:

1) When SAVING a document, the following exceptions appear in logs (think that they are related):

{"@t":"2020-04-23T12:45:43.8069949Z","@mt":"App is shutting down so index batch operation is ignored","@l":"Error","SourceContext":"Umbraco.Examine.UmbracoContentIndex","ProcessId":6220,"ProcessName":"w3wp","ThreadId":242,"AppDomainId":67,"AppDomainAppId":"LMW3SVC20377917ROOT","MachineName":"RD00155D0A79A5","Log4NetLevel":"ERROR"}
{"@t":"2020-04-23T12:45:43.8069949Z","@mt":"Exception","@l":"Error","@x":"System.ObjectDisposedException: The CancellationTokenSource has been disposed.\r\n   at System.Threading.CancellationTokenSource.ThrowObjectDisposedException()\r\n   at Examine.LuceneEngine.Providers.LuceneIndex.SafelyProcessQueueItems(Action`1 onComplete) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\LuceneIndex.cs:line 783\r\n   at Examine.LuceneEngine.Providers.LuceneIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\LuceneIndex.cs:line 302\r\n   at Umbraco.Examine.UmbracoContentIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete) in D:\\a\\1\\s\\src\\Umbraco.Examine\\UmbracoContentIndex.cs:line 102\r\n   at Examine.Providers.BaseIndexProvider.IndexItems(IEnumerable`1 values) in C:\\projects\\examine-qvx04\\src\\Examine\\Providers\\BaseIndexProvider.cs:line 76\r\n   at Our.Umbraco.FullTextSearch.Components.PerformCacheTasks.PerformRun()","SourceContext":"Our.Umbraco.FullTextSearch.Components.PerformCacheTasks","ProcessId":6220,"ProcessName":"w3wp","ThreadId":242,"AppDomainId":67,"AppDomainAppId":"LMW3SVC20377917ROOT","MachineName":"RD00155D0A79A5","Log4NetLevel":"ERROR"}

When SEARCHING using top-right search button, the following exception is shown:

Unhandled controller exception occurred for request '{RequestUrl}'
Error: Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed
   at Lucene.Net.Index.IndexReader.EnsureOpen() in d:\\Lucene.Net\\FullRepo\\trunk\\src\\core\\Index\\IndexReader.cs:line 204
   at Lucene.Net.Index.DirectoryReader.GetFieldNames(FieldOption fieldNames) in d:\\Lucene.Net\\FullRepo\\trunk\\src\\core\\Index\\DirectoryReader.cs:line 1055
   at Examine.LuceneEngine.Providers.LuceneSearcher.GetAllIndexedFields() in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\LuceneSearcher.cs:line 101
   at Examine.LuceneEngine.Providers.BaseLuceneSearcher.CreateQuery(String category, BooleanOperation defaultOperation, Analyzer luceneAnalyzer, LuceneSearchOptions searchOptions) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\BaseLuceneSearcher.cs:line 64
   at Examine.LuceneEngine.Providers.BaseLuceneSearcher.CreateQuery(String category, BooleanOperation defaultOperation) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\BaseLuceneSearcher.cs:line 49
   at Umbraco.Web.Search.UmbracoTreeSearcher.ExamineSearch(String query, UmbracoEntityTypes entityType, Int32 pageSize, Int64 pageIndex, Int64& totalFound, String searchFrom, Boolean ignoreUserStartNodes) in D:\\a\\1\\s\\src\\Umbraco.Web\\Search\\UmbracoTreeSearcher.cs:line 128
   at Umbraco.Web.Trees.ContentTreeController.Search(String query, Int32 pageSize, Int64 pageIndex, Int64& totalFound, String searchFrom) in D:\\a\\1\\s\\src\\Umbraco.Web\\Trees\\ContentTreeController.cs:line 331
   at Umbraco.Web.Editors.EntityController.SearchAll(String query) in D:\\a\\1\\s\\src\\Umbraco.Web\\Editors\\EntityController.cs:line 152
   at lambda_method(Closure , Object , Object[] )
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.<>c__DisplayClass6_2.<GetExecutor>b__2(Object instance, Object[] methodParameters)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.Execute(Object instance, Object[] arguments)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ExecuteAsync(HttpControllerContext controllerContext, IDictionary`2 arguments, CancellationToken cancellationToken)
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__1.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__3.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__3.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__3.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Dispatcher.HttpControllerDispatcher.<SendAsync>d__15.MoveNext()
RequestUrl":"https://beta.magicbreaks.co.uk/umbraco/backoffice/UmbracoApi/Entity/SearchAll?query=te"

When OPEN Settings -> Examine management:

Error: The 'ObjectContent`1' type failed to serialize the response body for content type 'application/json; charset=utf-8'.
Exception Details
System.InvalidOperationException: The 'ObjectContent`1' type failed to serialize the response body for content type 'application/json; charset=utf-8'.
Inner Exception
Lucene.Net.Store.AlreadyClosedException: this IndexWriter is closed
at Lucene.Net.Index.IndexWriter.EnsureOpen(Boolean includePendingClose) in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 852
   at Lucene.Net.Index.IndexWriter.get_Directory() in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 2164
   at Umbraco.Examine.LuceneIndexDiagnostics.get_Metadata() in D:\a\1\s\src\Umbraco.Examine\LuceneIndexDiagnostics.cs:line 64
   at Umbraco.Examine.UmbracoExamineIndexDiagnostics.get_Metadata() in D:\a\1\s\src\Umbraco.Examine\UmbracoExamineIndexDiagnostics.cs:line 24
   at Umbraco.Examine.UmbracoExamineIndex.get_Metadata() in D:\a\1\s\src\Umbraco.Examine\UmbracoExamineIndex.cs:line 196
   at Umbraco.Web.Editors.ExamineManagementController.CreateModel(IIndex index) in D:\a\1\s\src\Umbraco.Web\Editors\ExamineManagementController.cs:line 193
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeList(JsonWriter writer, IEnumerable values, JsonArrayContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalWriter.cs:line 677
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeValue(JsonWriter writer, Object value, JsonContract valueContract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerProperty) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalWriter.cs:line 179
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalWriter.cs:line 95
   at Newtonsoft.Json.JsonSerializer.SerializeInternal(JsonWriter jsonWriter, Object value, Type objectType) in /_/Src/Newtonsoft.Json/JsonSerializer.cs:line 1149
   at System.Net.Http.Formatting.BaseJsonMediaTypeFormatter.WriteToStream(Type type, Object value, Stream writeStream, Encoding effectiveEncoding)
   at System.Net.Http.Formatting.JsonMediaTypeFormatter.WriteToStream(Type type, Object value, Stream writeStream, Encoding effectiveEncoding)
   at System.Net.Http.Formatting.BaseJsonMediaTypeFormatter.WriteToStream(Type type, Object value, Stream writeStream, HttpContent content)
   at System.Net.Http.Formatting.BaseJsonMediaTypeFormatter.WriteToStreamAsync(Type type, Object value, Stream writeStream, HttpContent content, TransportContext transportContext, CancellationToken cancellationToken)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Umbraco.Web.WebApi.AngularJsonMediaTypeFormatter.<WriteToStreamAsync>d__1.MoveNext() in D:\a\1\s\src\Umbraco.Web\WebApi\AngularJsonMediaTypeFormatter.cs:line 52
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.WebHost.HttpControllerHandler.<WriteBufferedResponseContentAsync>d__22.MoveNext()

Specifics

Exceptions are shown in Umbraco backoffice Umbraco: 8.6.0 Browser: chrome, tried in other browsers Full exceptions above, screenshots: When saving: save-error When searching: search-error When accessing examine settings: examine-settings-error

Steps to reproduce

To repro exception nr 1:

Log in into Umbraco
Go to any document
Click Save and publish
the exception above appears

To repro exception nr 2:

Log in into Umbraco
Go to search (top-right search button)
type anything in the searchbox
the exception above appears

To repro exception nr 3:

Log in into Umbraco
Go to Settings -> Examine management
the exception above appears

Expected result

Exceptions are shown in Umbraco in a popup window

Actual result

There should be no exception

ghost commented 4 years ago

Same here, also with 2 projects. One small one, one big one. The issue only occurs on Azure, on Windows hosting with a similar big project it doesn't have the issue right now.

nul800sebastiaan commented 4 years ago

Have both of you applied the recommended Azure settings as documented here?

https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/azure-web-apps

Note that Umbraco.Core.MainDom.Lock is new for 8.6+. While I don't think that one affects Examine specifically, it is the recommended setting along with both Umbraco.Core.LocalTempStorage and Umbraco.Examine.LuceneDirectoryFactory which do affect Examine directly.

Chris-N1 commented 4 years ago

We are also getting this issue on a newly migrated site (in our case we have upgraded to 8.6.1 from 7.15.4); the examine settings are as described in that article, and we also have custom IServerRegistrars as described here;

https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/flexible-advanced#explicit-master-scheduling-server

To explicitly set the master/replica servers

The issues were so severe that we've had to roll back to version 7 temporarily - it is a large site with a lot of content, but it was working fine in version 7.15.4

Note - we do have some custom indexing going on in the TransformingIndexValues event via a component, but this does not appear to be the source of the issues - removing the registration for all our custom components still resulted in the same issues.

Not sure if this is important, but on the master (back office) server I checked God Mode and Is Main Dom is showing as "False" - it's instead showing as "True" on one of the replica servers.

Should the new Umbraco.Core.MainDom.Lock setting be on all servers or only the master server?

We have a Dev server also on 8.6.1, also in Azure, and this seems to be working fine for the same site...

Chris-N1 commented 4 years ago

I've done some investigation because I hate a mystery; it appears that the last server (in a multi-instance environment) to restart gets the "is main dom" status (I checked using God Mode), regardless of whether it's designated as a replica or master server.

However, only the server with Main Dom status appears to be able to read/write indexes - so if the Main Dom server is the back office, it will work, but front office servers will not be able to use Examine. (I'm not actually sure what "Is Main Dom" designates, but I've verified this by recycling the app pool on our 3 app services and it's clear that only the last service to restart will have the "Is Main Dom" status, and that this will also be the only server that can read/write indexes) - if this happens to be the back office server then publishing will work but searches or anything that requires an index on the front end server(s) will fail.

I'd be interested to know if @TRES-Internet and @MihaiPotcoava are also using multi instance set ups in Azure to confirm this?

Edit - some experimentation with the master/replica set up suggests that in a multi-instance environment the new SqlMainDomLock setting should be set only on the master server - when I configured it like this (without this setting applied on the replica servers) everything appeared to work as expected.

Shazwazza commented 4 years ago

Hi All,

Yes I think you've stumbled onto an issue with load balancing and this new setting. For now:

Only enable <add key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" /> on your master server
For your front-end servers, you can use IgnoreLocalDb option listed here https://github.com/umbraco/Umbraco-CMS/issues/5035#issuecomment-566470249

MainDom controls overlapping AppDomains since .NET allows more than one app domain to run concurrently which occurs during shutdown/restart and only a single AppDomain can write to files at once. The default MainDom lock uses system-wide semaphores which don't work on Azure so the SqlMainDomLock uses a db row as a distributed lock but the implementation currently assumes non-load balancing so that will need to be fixed.

I will update the title of this thread to reflect what the problem is.

Shazwazza commented 4 years ago

Though will want to hear from @MihaiPotcoava and others if they are also load balancing.

MihaiPotcoava commented 4 years ago

Hi All,

thanks for looking into this issue, a few additional details about our setup:

1) we followed the instructions from: https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/azure-web-apps so in our web.config we have: `

<add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine" />`

2) we have set IgnoreLocalDb = true like per https://github.com/umbraco/Umbraco-CMS/issues/5035#issuecomment-566470249

3) we're NOT running a multi-instance setup/load balancing, it's just a single instance of the website. We, however, use deployment slots for our beta/release/live environments. The deployment slots are never swapped so I don't think this is the issue, from my knowledge they run in different app pools so they shouldn't be affected by each other

4) the errors above appear mostly on the beta environment (I've seen it on the live instance too). On beta we have the app insights disabled (live enabled) and on both environments we have WEBSITE_DISABLE_OVERLAPPED_RECYCLING set to 1

5) not sure if it matters but

we have enabled multiple languages
we're using Articulate 4.1.1 package
we don't have any custom examine indexes, we have just a FullTextSearch implementation on the frontend (with Our.Umbraco.FullTextSearch 0.2.0 package)

I'm not sure if this is a lock issue or the examine indexes get corrupted?

If you need any additional details, or If I can put some logging in place, custom events, etc, please let me know, I'll be happy to help.

Thanks again!

Shazwazza commented 4 years ago

@MihaiPotcoava I'm unsure if Our.Umbraco.FullTextSearch does anything to the indexes so I can't comment on it but you can check out their source code to see if there's anything interesting going on like background tasks, etc... https://github.com/skttl/Our.Umbraco.FullTextSearch

There's a few exceptions/messages here which to me all point to a lot of appdomain restarts occuring (like a restart loop) and/or multiple processes trying to read/write to your umbraco indexes at once.

App is shutting down so index batch operation is ignored - this is because your app is restarting
Lucene.Net.Store.AlreadyClosedException
The CancellationTokenSource has been disposed as listed https://github.com/Shazwazza/Examine/issues/161

The other's are most likely because the indexes are trying to be accessed by something during a shutdown.

Do you have a lot of restarts in your logs (restart loop)? Are you auto-scaling?

You can try TempEnvDirectoryFactory instead of SyncTempEnvDirectoryFactory. Note that you shouldn't need IgnoreLocalDb = true with SqlMainDomLock (on your master server, but it can't hurt)

Shazwazza commented 4 years ago

I can see that Our.Umbraco.FullTextSearch does do background indexing https://github.com/skttl/Our.Umbraco.FullTextSearch/blob/master/src/Our.Umbraco.FullTextSearch/Components/PerformCacheTasks.cs and there isn't any checks in the loop to check if the background task has been canceled so there is a good chance if there are queued items that it will keep trying to index even though the app has been notified of shutting down. Perhaps you can disable this plugin to verify if it has any impact.

MihaiPotcoava commented 4 years ago

@Shazwazza not sure how to check how many restarts I have in logs, on beta we have serilog:minimum-level: Warning. I guess the app restart event is an information. I'll switch to Information the log level and see afterward. However, see below how many "App is shutting down so index batch operation is ignored" I have:

We don't have autoscaling and the app should not restart only when we're deploying (2-3 times/day on beta env). Just a reminder that on app restart/deploy everything works fine for a few hours and then the same issue occurs again.

Summary:

I'll switch the error log level to Information to check the app restarts
I'll remove the FullTextSearch package and try to see if the issue is resolved
If removing FullTextSearch doesn't work I'll try to switch from SyncTempEnvDirectoryFactory to TempEnvDirectoryFactory
I'll leave IgnoreLocalDb = true as it is if it can't hurt.

Thanks for your help, I'll let you know how it goes.

Shazwazza commented 4 years ago

Thanks for the info. Those logs indicate you have some issues with restart loops. When the app is stopping and starting concurrently a lot like that you'll get lots of issues so need to figure out why this is happening. Unfortunately that's not the easiest thing to diagnose. Let us know if you info logs showing app startup/shutdown match those errors.

Chris-N1 commented 4 years ago

Hi @Shazwazza ,

Thanks for the feedback - I can't add my logs here because of client confidentiality but Umbraco support do have them.

I've set it up with the SqlMainDomLock only on the Master server now and all seems well (is there some way the documentation for the new setting can be updated so other people can avoid this?)

We're looking to go live (again) next weekend now - I'm going to get the users to do some testing on the version 8 site this week.

Can you confirm there aren't likely to be any adverse effects with the new setting only on the Master server?

Thanks,

Chris.

Shazwazza commented 4 years ago

@Chris-N1 yep i'm working with @nul800sebastiaan regarding docs so we can get those changed in the interim until we can fix that. The only 'adverse' affects of running IgnoreLocalDb = true on your replica servers is that your site startup times can be slower because it will need to re-build all of the in-memory caches from the DB instead of from the static nucache file. However, since you are load balancing I'd assume it's on more than one server and maybe you are just using horizontal scaling, in which case if you add a site warmup config using application initialization than azure won't actually route anything to that node until it is ready so you won't notice any difference. There's lots of docs and how-tos online about this, one is here https://blog.baslijten.com/warmup-your-application-on-azure-app-service-when-scaling-up-and-swapping-slots-using-application-initialization/, you can find other Umbraco specific ones if you google something like "azure warmup application initialization umbraco", but if you haven't set this up before there's a few 'gotchas' that you'll discover as you read through articles, there's no silver bullet unfortunately but would be great to have some official docs on that if anyone could help write them.

Chris-N1 commented 4 years ago

Thanks Shannon - we did have application initialization set up for the v7 site and I was planning to replicate it in version 8 - I'd be happy to take a look at helping to write up some documents once this migration is out of the way.

We're not actually using scaling/load balancing at all (at the moment); we have multiple instances because we need a site close to China to keep response times relatively low for users there (we're using Traffic Manager to route requests) as it's an important market for the client; because of the size of the site we've had issues when new instances start up on Autoscale as they have to build their own copy of the indexes, so we have turned it off for now until the Examine package for storing indexes in azure blob storage appears for version 8.

Thanks again for the help! :)

OwainJ commented 4 years ago

Just adding to this that we're also seeing this on our project after having updated to 8.6.1 and added the new web configs. We've only updated our dev environment so far, which is hosted on Azure (F1). We also see the issue if we run the project locally whilst connected to our dev db on Azure. If you need any of our logs, please let me know.

Shazwazza commented 4 years ago

@OwainJ there are multiple issues listed here. Can you please be specific about the issue you are having and your configuration? load balancing? etc... btw if multiple apps are accessing the same database, you are in fact load balancing even if its multiple devs.

OwainJ commented 4 years ago

Hi @Shazwazza, The issues that we're seeing in our umbraco log are:

"App is shutting down so index batch operation is ignored"
"Lucene.Net.Store.AlreadyClosedException: this IndexWriter is closed"
"System.ObjectDisposedException: The CancellationTokenSource has been disposed."

The CancellationTokenSource and IndexWriter errors occur when we attempt to save, publish or unpublish content nodes. (although it's not happening all of the time)

Each local instance, that our devs are running on their machines, are set to "master" and they all connect to our dev database that's hosted in azure. We also auto-deploy our development branch to its own web app "F1", which also connects to the same database. I guess having all of those instances set to master probably isn't helping the situation. Although that hasn't caused any issues before, after reading the above, I can see how it would.

Not sure if it's relevant, but our project is also bilingual, each content node has a culture variant.

Shazwazza commented 4 years ago

@OwainJ yes just be warned that multiple devs using the same database is load balancing with multiple masters. Might work sometimes, might not work others since we don't support multiple master servers. In any case, those logs indicate the app is shutting down but something is trying to index after it's been notified to terminate. Do you, like @MihaiPotcoava, have multiple concurrent restarts in your logs? Because if you do then that's a big problem and will need to be resolved. Your app should not be restarting lots of times and definitely not lots of times in very short amount of times. Do you also have Our.Umbraco.FullTextSearch installed? or some other package that might be indexing on a background thread?

OwainJ commented 4 years ago

@Shazwazza Yea I'm getting multiple restarts in my local instance log:

My teammates are also getting the "App is shutting down" logs too.

Here's one of my teammate's logs from yesterday:

No we don't have that package installed, but we do have a custom Azure indexer that's indexing both the Umbraco content and content from external databases. Although to my knowledge, it isn't touching the Umbraco index, as we're hosting our own index on Azure.

Shazwazza commented 4 years ago

@OwainJ I sort of meant on your azure server, you could very well be restarting your site tons on your local machine since everytime you build you will restart. What I'm wondering specifically is if your website (not dev machine) is in a constant restart loop. Perhaps your own dev sites are in a restart loop too, not sure. And although that log entry indicates that, i'm looking for the "info" log that says the app is shutting down. In any case it looks like an index batch operation is running when your local app is restarting. That can be common if you rebuild an index and then restart your site. Hopefully you don't have some script to keep rebuilding your indexes or something.

Chris-N1 commented 4 years ago

Here are some logs from our Master server while the Umbraco 8 site was live - this is just after an app pool recycle and the app seems to think it's shutting down again - we have full information logs on at the moment, although I'll be adjusting that before going live again - and Umbraco support has a full copy of our logs from the back office server from the weekend:

Details on the "App is shutting down" error:

This seemed to happen every time indexes were built on another server with the SqlMainDomLock specified - at this point I'd turned off all of our custom indexing (we were only using the TransformingIndexValues event) .

OwainJ commented 4 years ago

@Shazwazza Ah apologies, yes it also appears to be happening on our azure dev deployment:

The above error logs appeared after I tried to save a content node (I tried 4 times). Even though it says that the "App is shutting down" the site doesn't "feel" like it's restarting if that makes sense? There's no perceivable downtime/longer page loads. There also doesn't seem to be an "info" log about the app shutting down, only the error logs?

I believe our indexer only indexes on save events, unfortunately, my teammate who wrote it is off work today, so I'm not 100% sure.

"App is shutting down" error log details: The "PostSave" error log details:

OwainJ commented 4 years ago

Just to add: even though we're getting a "The CancellationTokenSource has been disposed.", exception when clicking Save on content nodes, the content does seem to actually save properly.

Shazwazza commented 4 years ago

Hi @Chris-N1 thanks,

Just so I'm clear on this, the logs you've posted and this description:

This seemed to happen every time indexes were built on another server with the SqlMainDomLock specified

Is when you had multiple servers with SqlMainDomLock configured prior to using the work around i mentioned above https://github.com/umbraco/Umbraco-CMS/issues/8006#issuecomment-620328022 ? (just making sure there's not another issue)

@OwainJ I feel like this is the same issue, do you have SqlMainDomLock config applied? and to where?

Chris-N1 commented 4 years ago

@Shazwazza

Hi @Chris-N1 thanks,

Just so I'm clear on this, the logs you've posted and this description:

This seemed to happen every time indexes were built on another server with the SqlMainDomLock specified

Is when you had multiple servers with SqlMainDomLock configured prior to using the work around i mentioned above #8006 (comment) ? (just making sure there's not another issue)

@OwainJ I feel like this is the same issue, do you have SqlMainDomLock config applied? and to where?

Yes that's correct, the logs are when we had the SqlMainDomLock setting on all servers.

With SqlMainDomLock specified only on the master server it seems to be working much better.

I did try setting IgnoreLocalDb to True on the slave servers but that seemed to make things worse (possibly because the site is extremely large - there are around 25,000 pages and 85,000 images) - it seems to be behaving itself when SqlMainDomLock is specified only on the backoffice server.

OwainJ commented 4 years ago

@Shazwazza yup, we have all three of the recommended appSettings configs in our sites web.config

We also still have the disable overlapped recycling config, although we can probably safely remove that now thanks to the NuCache locking fix in 8.6 (yay!)

IgnoreLocalDb is also set to true:

EDIT: @Shazwazza I've just heard back from my teammate, our custom azure indexer only indexes on these events:

ContentService.Saved
ContentService.Published
ContentService.Trashed
ContentService.Deleted
MediaService.Saved
MediaService.Trashed
MediaService.Deleted
MemberService.Saved
MemberService.Deleted

MihaiPotcoava commented 4 years ago

Hi All,

we have the same setup as @OwainJ

we have a Dev database hosted in azure
the local dev machines (2-3) are connecting to the dev database during development

If I understand well, this is like a load balancing scenario with multiple master instances which is not working right now with SqlMainDomLock for all instances.

We didn't use load balancing before and anyway I'm not sure the documentation is up-to-date regarding this issue but maybe we can summarize the settings needed in this scenario?

Thanks!

PS: We've removed the FullTextSearch completely and the same issues appear so I don't think this package has any influence in this

Chris-N1 commented 4 years ago

Shannon has updated the documents here:

https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps

The settings we have and which appear to be working are:

Master/Replica Servers explicitly specified as per the docs here:

https://our.umbraco.com/documentation/getting-started/setup/server-setup/load-balancing/flexible-advanced#explicit-master-scheduling-server

(We have explicitly set the back office URL and the umbracoApplicationUrl is explicitly set in the umbracoSettings.config file

SqlMainDomLock is set only on the Master server

IgnoreLocalDb is not being set; I did try setting it to true (on the front office servers only), but the site is very large and when the back office was restarting at the same time as the front end servers this seemed caused problems, presumably because the front end server would be reading the database for a while.

Note - for development I am using a local copy of the db; because the site is fairly mature there aren't many document type updates and currently I'm the only one working on it, so concurrent development isn't a problem.

Shazwazza commented 4 years ago

Thanks @Chris-N1

@OwainJ Since you're devs are all connecting to a single database, this is load balancing. Also note that your are load balancing with multiple master servers which isn't supported. As mentioned, this can work if there's not a lot of concurrency but there can be problems when there is so just be aware of that. In any case, currently you must ensure that SqlMainDomLock is only specified on your true master server, not your dev machines.

And yes, please see updated docs https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps

MihaiPotcoava commented 4 years ago

All clear, thanks, I'm sure our problems were because we were connecting to the same dev database from the development machines without properly setting up a load balancing environment.

I'll let you know if we still have issues but all ok so far.

fbenyah commented 4 years ago

Hi All

I see I am not the only one with this problem that creeps up from time to time. We built a few sites on v8 this year various sizes but the most stable point before 8.6+ for us was 8.3

Our setup for now does not include load balancing and is not multi-lingual. We do have one website which is multi-lingual still in development with an azure db and multiple devs connecting to it locally as well as an azure web app also connecting to it. This particular site does give issues with the lock but that has already been addressed here so won't go into that.

We have followed the docs on our for the azure setup won't get into that as there is enough references to that here.

Our setup is as follows for almost all azure sites key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine"

This setting we have on the staging environment in the azure app settings and not in the web.config.

WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1

Also this setting is not in prod. The reason for this is because by the time we were ready for prod 8.6 was available so we did an upgrade and it seemed all was well after following the docs. I say seemed because a few hours later we saw locks a lot more frequently even with sql main dom option set. On this project we used umbraco mostly as a content repository so I could just go back to basics install a clean v8.6 make a db copy and point the new clean 8.6 to our db which was recently upgraded. This solved the problem for us with just setting

Umbraco.Core.LocalTempStorage, Umbraco.Examine.LuceneDirectoryFactory and Umbraco.Core.MainDom.Lock

as per the docs on our.

Our locks usually occured during save, save and publish, bulk operations on content. It was almost impossible to update any existing document types. You could if you gave a 3 to 5 minute window between saves and hooking into any of the umbraco events to do anything will result in a lock. Worst case scenario restart the app after waiting a while will get things going for a bit. Example events: ContentService.Published, ContentService.Trashed etc

I was busy investigating the below error this morning when I stumbled on this thread

I went to look in the logs and I see we also have the application restarting quiet a few times so will need to investigate that.

I know a while back umbraco used to go to sleep if there were long periods of inactivity on this particular site that happens a lot as it is a content hub only. Do you guys know if that is still the case with inactivity?

Thanks

Chris-N1 commented 4 years ago

@fbenyah

I think with the "going to sleep" thing the solution is to make sure your web app is set to "always on", but this is on by default I believe (this is a test system that I set up so it's not turned on):

With regard to what you've seen above with locks when there are a lot of updates - we also experienced a lot of index locks this morning on our Master server (although not getting the "app is shutting down" messages).

We have processes that run overnight that update some documents automatically by importing XML files, and this would happen every few seconds, so it seems that this might be an issue with index locking - I'm going to do some more investigation today on this.

One of the imports only saves the document rather than publishing it - the other one publishes as well and goes much faster as the data is far simpler.

fbenyah commented 4 years ago

@Chris-N1 Thanks will check on azure to see that the setting is indeed there. I didn't initially setup the project so I didn't actually check it but good point.

We also have documents being updated by a couple processes though in our case it happens as needed as all the content is real time.

When the payload hits umbraco the data determines whether it is just a save or a save and publish after creating the document. Luckily for us we have seen very few locks if any since moving to 8.6.0 then eventually 8.6.1 and this might just be because we haven't looked at scaling out as yet.

I will also do some investigation into why we have so many restarts and share here might become helpful.

EDIT @Chris-N1 Yep turns out that was off on the problem sites. Thanks for that. Will monitor it a bit more and see how that goes

Shazwazza commented 4 years ago

There is so much information in this thread that I think much of it is getting lost. I'll recap some key things:

The main issue is using SqlMainDomLock in a load balancing scenario which should work but it doesn't. The fix is in the upcoming 8.6.2 release. NOTE: You are load balancing anytime there is more than one process accessing the same database at once, even if this is just your developers.
Azure/Load balancing docs have been updated to cater for this known issue: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps
WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1 is required without SqlMainDomLock
The App is shutting down so index batch operation is ignore log does not actually indicate the app is shutting down, this is due to the SqlMainDomLock bug. This log indicates that Examine is shutting down because it was told to by the current IMainDom... this is the bug we are talking about here. You will need to check your info logs to see the actual shutdown and startup logs to know when your app domain is actually shutting down and starting up. With the SqlMainDomLock fix, then the App is shutting down so index batch operation is ignore log will only show up when the site is actually shutting down.

afinn-tech commented 4 years ago

@Shazwazza Any temp fix for this while we wait for 8.6.2 , we are also experiencing same issues above? I do have a question on we have public scale able web app in azure for a dev env , but only a single instance , so should the Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory or Examine.LuceneEngine.Directories.TempEnvDirectoryFactory ?

Shazwazza commented 4 years ago

@afinn-tech This is all covered in the docs https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps

idontbyte commented 4 years ago

We are also experiencing this issue as a result of scheduled restarts to multiple systems in non-load-balanced web apps. Config as follows: Version: 8.6.1 Recommended Azure Configuration followed. We are using SqlMainDomLock and we have disabled overlapped recycling (though I'm not convinced that Azure is respecting this application setting)

fbenyah commented 4 years ago

That is interesting that you report this. We just had 2 of our sites in azure which have been fine since 8.6.1 also having this issue resurface about 45 minutes ago. Different azure subscriptions, also 8.6.1, config followed. Have been fine since my last comment here 20 days ago.

Maybe something has changed in azure environment?

idontbyte commented 4 years ago

Exact same timing as us, interesting !!

Chris-N1 commented 4 years ago

We've also had this reoccur this morning, despite nothing changing with our configuration - so it sounds like it could well be linked to something that's change in Azure. In our case recycling the app pool seems to have fixed it, at least for now....

fbenyah commented 4 years ago

Yeah we did the same stopped app service for 5 minutes and restarted it and all seems to be back to normal for now. So I think since we are all starting to experience this again it can only mean a change on azure

Rizen-Kris commented 4 years ago

Same Here

idontbyte commented 4 years ago

Had the same issue this morning, anyone else?

fbenyah commented 4 years ago

I have just resorted to making sure I restart the app pool early in the morning and monitoring for now

afinn-tech commented 4 years ago

same here , had to restart

afinn-tech commented 4 years ago

could it be that all settings are not required : key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine" And IgnoreLocalDb is true also WEBSITE_DISABLE_OVERLAPPED_RECYCLING
could we have too many settings set ? ( although i dont have WEBSITE_DISABLE_OVERLAPPED_RECYCLING set)

teeto commented 4 years ago

We have the same issue. A normal azure web app running on S2 level, not load balanced. Umbraco v8.6.1. Same app for public access and backoffice. This is my web.config:

<add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" />
    <add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.TempEnvDirectoryFactory, Examine" />
    <add key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" />  
    <add key="WEBSITE_DISABLE_OVERLAPPED_RECYCLING" value="1" />

And this is the logged exception:

System.AggregateException: Exceptions were thrown by listed actions. ---> System.ObjectDisposedException: The CancellationTokenSource has been disposed.
   at System.Threading.CancellationTokenSource.ThrowObjectDisposedException()
   at Examine.LuceneEngine.Providers.LuceneIndex.SafelyProcessQueueItems(Action`1 onComplete)
   at Examine.LuceneEngine.Providers.LuceneIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete)
   at Umbraco.Examine.UmbracoContentIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete)
   at Examine.Providers.BaseIndexProvider.IndexItems(IEnumerable`1 values)
   at Umbraco.Web.Search.ExamineComponent.DeferedReIndexForContent.Execute(ExamineComponent examineComponent, IContent content, Boolean isPublished)
   at Umbraco.Web.Search.ExamineComponent.DeferedReIndexForContent.Execute()
   at Umbraco.Web.Search.ExamineComponent.DeferedActions.Execute()
   at Umbraco.Web.Search.ExamineComponent.DeferedActions.<>c.<Get>b__1_1(Boolean completed, DeferedActions actions)
   at Umbraco.Core.Scoping.ScopeContext.EnlistedObject`1.Execute(Boolean completed)
   at Umbraco.Core.Scoping.ScopeContext.ScopeExit(Boolean completed)
   --- End of inner exception stack trace ---

Chris-N1 commented 4 years ago

@teeto - you can't set the WEBSITE_DISABLE_OVERLAPPED_RECYCLING in the web.config - it has to be set in the azure app service settings.

We haven't had any further issues in the back office since recycling the app pool.

teeto commented 4 years ago

Thanks @Chris-N1 , but i am not sure if in my situation with SqlMainDomLock set and no load balancing it is needed WEBSITE_DISABLE_OVERLAPPED_RECYCLING setting...

Shazwazza commented 4 years ago

WEBSITE_DISABLE_OVERLAPPED_RECYCLING doesn't matter about load balancing or not

There's a fix coming in 8.6.2 and then all that is required is what is in the docs https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/azure-web-apps#recommended-configuration

for further information see https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps

In case you didn't know, Lucene files are stored in the 'local fast' drive on Azure websites because they must be stored there because Lucene does not work when files are stored on a network drive which is how Azure websites stores its files. Since Azure is out of our control and they have been known to change settings and all sorts of things, i suspect since a lot of you noticed a change all at once that perhaps the 'local fast drive' got somehow disconnected or reset on the web workers that were running your sites. This is just a guess but seems plausible.

umbraco / Umbraco-CMS