umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License

Examine index issues in Umbraco 8.6.0 Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed #8006

Closed MihaiPotcoava closed 4 years ago

MihaiPotcoava commented 4 years ago

We've recently upgraded to Umbraco 8.6.0 since we've experienced Examine index locking issues, described here: https://github.com/umbraco/Umbraco-CMS/issues/5035 The recommendation from Shazwazza was to upgrade to 8.6.0, see his comment: https://github.com/umbraco/Umbraco-CMS/issues/5035#issuecomment-599546750

We've upgraded, followed the instructions from https://our.umbraco.com/documentation/getting-started/setup/server-setup/azure-web-apps

However, once every few hours we're getting Examine lock-related errors (details below). When we touch the web.config everything runs OK for a few hours, then the problems reappear.

Umbraco version

I am seeing this issue on Umbraco version: 8.6.0, Examine v1.0.3, Lucene.Net v3.0.3

Reproduction

Bug summary

We're running in Azure. We've recently upgraded from 7 to 8.5.3, then to v8.6.0. With version 7 everything was working OK, but we wanted to benefit from the latest version's improvements. With version 8.5.3, the website was working pretty well but we encountered the issues described here: https://github.com/Shazwazza/Examine/issues/161 so we upgraded.

Now all seems to work fine except some Examine-related exceptions described below:

1) When SAVING a document, the following exceptions appear in the logs (we think they are related):


{"@t":"2020-04-23T12:45:43.8069949Z","@mt":"App is shutting down so index batch operation is ignored","@l":"Error","SourceContext":"Umbraco.Examine.UmbracoContentIndex","ProcessId":6220,"ProcessName":"w3wp","ThreadId":242,"AppDomainId":67,"AppDomainAppId":"LMW3SVC20377917ROOT","MachineName":"RD00155D0A79A5","Log4NetLevel":"ERROR"}
{"@t":"2020-04-23T12:45:43.8069949Z","@mt":"Exception","@l":"Error","@x":"System.ObjectDisposedException: The CancellationTokenSource has been disposed.\r\n   at System.Threading.CancellationTokenSource.ThrowObjectDisposedException()\r\n   at Examine.LuceneEngine.Providers.LuceneIndex.SafelyProcessQueueItems(Action`1 onComplete) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\LuceneIndex.cs:line 783\r\n   at Examine.LuceneEngine.Providers.LuceneIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\LuceneIndex.cs:line 302\r\n   at Umbraco.Examine.UmbracoContentIndex.PerformIndexItems(IEnumerable`1 values, Action`1 onComplete) in D:\\a\\1\\s\\src\\Umbraco.Examine\\UmbracoContentIndex.cs:line 102\r\n   at Examine.Providers.BaseIndexProvider.IndexItems(IEnumerable`1 values) in C:\\projects\\examine-qvx04\\src\\Examine\\Providers\\BaseIndexProvider.cs:line 76\r\n   at Our.Umbraco.FullTextSearch.Components.PerformCacheTasks.PerformRun()","SourceContext":"Our.Umbraco.FullTextSearch.Components.PerformCacheTasks","ProcessId":6220,"ProcessName":"w3wp","ThreadId":242,"AppDomainId":67,"AppDomainAppId":"LMW3SVC20377917ROOT","MachineName":"RD00155D0A79A5","Log4NetLevel":"ERROR"}

2) When SEARCHING using the top-right search button, the following exception is shown:


Unhandled controller exception occurred for request '{RequestUrl}'
Error: Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed
   at Lucene.Net.Index.IndexReader.EnsureOpen() in d:\\Lucene.Net\\FullRepo\\trunk\\src\\core\\Index\\IndexReader.cs:line 204
   at Lucene.Net.Index.DirectoryReader.GetFieldNames(FieldOption fieldNames) in d:\\Lucene.Net\\FullRepo\\trunk\\src\\core\\Index\\DirectoryReader.cs:line 1055
   at Examine.LuceneEngine.Providers.LuceneSearcher.GetAllIndexedFields() in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\LuceneSearcher.cs:line 101
   at Examine.LuceneEngine.Providers.BaseLuceneSearcher.CreateQuery(String category, BooleanOperation defaultOperation, Analyzer luceneAnalyzer, LuceneSearchOptions searchOptions) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\BaseLuceneSearcher.cs:line 64
   at Examine.LuceneEngine.Providers.BaseLuceneSearcher.CreateQuery(String category, BooleanOperation defaultOperation) in C:\\projects\\examine-qvx04\\src\\Examine\\LuceneEngine\\Providers\\BaseLuceneSearcher.cs:line 49
   at Umbraco.Web.Search.UmbracoTreeSearcher.ExamineSearch(String query, UmbracoEntityTypes entityType, Int32 pageSize, Int64 pageIndex, Int64& totalFound, String searchFrom, Boolean ignoreUserStartNodes) in D:\\a\\1\\s\\src\\Umbraco.Web\\Search\\UmbracoTreeSearcher.cs:line 128
   at Umbraco.Web.Trees.ContentTreeController.Search(String query, Int32 pageSize, Int64 pageIndex, Int64& totalFound, String searchFrom) in D:\\a\\1\\s\\src\\Umbraco.Web\\Trees\\ContentTreeController.cs:line 331
   at Umbraco.Web.Editors.EntityController.SearchAll(String query) in D:\\a\\1\\s\\src\\Umbraco.Web\\Editors\\EntityController.cs:line 152
   at lambda_method(Closure , Object , Object[] )
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.<>c__DisplayClass6_2.<GetExecutor>b__2(Object instance, Object[] methodParameters)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ActionExecutor.Execute(Object instance, Object[] arguments)
   at System.Web.Http.Controllers.ReflectedHttpActionDescriptor.ExecuteAsync(HttpControllerContext controllerContext, IDictionary`2 arguments, CancellationToken cancellationToken)
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__1.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__5.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__3.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__3.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__3.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__6.MoveNext()
   --- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.Dispatcher.HttpControllerDispatcher.<SendAsync>d__15.MoveNext()
RequestUrl":"https://beta.magicbreaks.co.uk/umbraco/backoffice/UmbracoApi/Entity/SearchAll?query=te"

3) When OPENING Settings -> Examine management:


Error: The 'ObjectContent`1' type failed to serialize the response body for content type 'application/json; charset=utf-8'.
Exception Details
System.InvalidOperationException: The 'ObjectContent`1' type failed to serialize the response body for content type 'application/json; charset=utf-8'.
Inner Exception
Lucene.Net.Store.AlreadyClosedException: this IndexWriter is closed
at Lucene.Net.Index.IndexWriter.EnsureOpen(Boolean includePendingClose) in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 852
   at Lucene.Net.Index.IndexWriter.get_Directory() in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 2164
   at Umbraco.Examine.LuceneIndexDiagnostics.get_Metadata() in D:\a\1\s\src\Umbraco.Examine\LuceneIndexDiagnostics.cs:line 64
   at Umbraco.Examine.UmbracoExamineIndexDiagnostics.get_Metadata() in D:\a\1\s\src\Umbraco.Examine\UmbracoExamineIndexDiagnostics.cs:line 24
   at Umbraco.Examine.UmbracoExamineIndex.get_Metadata() in D:\a\1\s\src\Umbraco.Examine\UmbracoExamineIndex.cs:line 196
   at Umbraco.Web.Editors.ExamineManagementController.CreateModel(IIndex index) in D:\a\1\s\src\Umbraco.Web\Editors\ExamineManagementController.cs:line 193
   at System.Linq.Enumerable.WhereSelectEnumerableIterator`2.MoveNext()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeList(JsonWriter writer, IEnumerable values, JsonArrayContract contract, JsonProperty member, JsonContainerContract collectionContract, JsonProperty containerProperty) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalWriter.cs:line 677
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.SerializeValue(JsonWriter writer, Object value, JsonContract valueContract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerProperty) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalWriter.cs:line 179
   at Newtonsoft.Json.Serialization.JsonSerializerInternalWriter.Serialize(JsonWriter jsonWriter, Object value, Type objectType) in /_/Src/Newtonsoft.Json/Serialization/JsonSerializerInternalWriter.cs:line 95
   at Newtonsoft.Json.JsonSerializer.SerializeInternal(JsonWriter jsonWriter, Object value, Type objectType) in /_/Src/Newtonsoft.Json/JsonSerializer.cs:line 1149
   at System.Net.Http.Formatting.BaseJsonMediaTypeFormatter.WriteToStream(Type type, Object value, Stream writeStream, Encoding effectiveEncoding)
   at System.Net.Http.Formatting.JsonMediaTypeFormatter.WriteToStream(Type type, Object value, Stream writeStream, Encoding effectiveEncoding)
   at System.Net.Http.Formatting.BaseJsonMediaTypeFormatter.WriteToStream(Type type, Object value, Stream writeStream, HttpContent content)
   at System.Net.Http.Formatting.BaseJsonMediaTypeFormatter.WriteToStreamAsync(Type type, Object value, Stream writeStream, HttpContent content, TransportContext transportContext, CancellationToken cancellationToken)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Umbraco.Web.WebApi.AngularJsonMediaTypeFormatter.<WriteToStreamAsync>d__1.MoveNext() in D:\a\1\s\src\Umbraco.Web\WebApi\AngularJsonMediaTypeFormatter.cs:line 52
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Web.Http.WebHost.HttpControllerHandler.<WriteBufferedResponseContentAsync>d__22.MoveNext()

Specifics

Exceptions are shown in the Umbraco backoffice.
Umbraco: 8.6.0
Browser: Chrome (also tried in other browsers)
Full exceptions above. Screenshots: when saving (save-error), when searching (search-error), when accessing Examine settings (examine-settings-error).

Steps to reproduce

To repro exception nr 1:

  1. Log in to Umbraco
  2. Go to any document
  3. Click Save and publish
  4. the exception above appears

To repro exception nr 2:

  1. Log in to Umbraco
  2. Go to search (top-right search button)
  3. type anything in the searchbox
  4. the exception above appears

To repro exception nr 3:

  1. Log in to Umbraco
  2. Go to Settings -> Examine management
  3. the exception above appears

Expected result

There should be no exception

Actual result

Exceptions are shown in Umbraco in a popup window

teeto commented 4 years ago

So we must always set WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1 on Azure Web Apps?

vaags commented 4 years ago

Am I correct in assuming that, for now, you have to set WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1 - even on version 8.6.1?

We just had a site running 8.6.1 go down (all app settings as described in the docs).

Shazwazza commented 4 years ago

It cannot hurt to have WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1. If you are not using SqlMainDom then this is absolutely required. If you are, it shouldn't be required, but we have a bug where you cannot use SqlMainDom when you are load balancing (as per the docs: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps)

vaags commented 4 years ago

Thanks for the speedy reply :)

We are not load balancing, and have the SqlMainDom setting.

The error we got was this:

Umbraco.Core.Exceptions.BootFailedException: Boot failed.

System.UnauthorizedAccessException: Access to the path 'D:\home\site\wwwroot\App_Data\TEMP\ExamineIndexes\Internal\write.lock' is denied.

at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.File.InternalDelete(String path, Boolean checkHost)
   at System.IO.File.Delete(String path)
   at Lucene.Net.Store.SimpleFSLock.Release() in d:\Lucene.Net\FullRepo\trunk\src\core\Store\SimpleFSLockFactory.cs:line 203
   at Examine.LuceneEngine.Directories.MultiIndexLock.Release() in C:\projects\examine-qvx04\src\Examine\LuceneEngine\Directories\MultiIndexLock.cs:line 68
   at Lucene.Net.Index.IndexWriter.Unlock(Directory directory) in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 5776
   at Umbraco.Examine.ExamineExtensions.ConfigureLuceneIndexes(IExamineManager examineManager, ILogger logger, Boolean disableExamineIndexing) in D:\a\1\s\src\Umbraco.Examine\ExamineExtensions.cs:line 146
   at Umbraco.Examine.ExamineExtensions.<>c__DisplayClass4_0.<ConfigureIndexes>b__0() in D:\a\1\s\src\Umbraco.Examine\ExamineExtensions.cs:line 50
   at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)
   at Umbraco.Examine.ExamineExtensions.ConfigureIndexes(IExamineManager examineManager, IMainDom mainDom, ILogger logger) in D:\a\1\s\src\Umbraco.Examine\ExamineExtensions.cs:line 44
   at Umbraco.Web.Search.ExamineFinalComponent.Initialize() in D:\a\1\s\src\Umbraco.Web\Search\ExamineFinalComponent.cs:line 32
   at Umbraco.Core.Composing.ComponentCollection.Initialize() in D:\a\1\s\src\Umbraco.Core\Composing\ComponentCollection.cs:line 32
   at Umbraco.Core.Runtime.CoreRuntime.Boot(IRegister register, DisposableTimer timer) in D:\a\1\s\src\Umbraco.Core\Runtime\CoreRuntime.cs:line 188
   at Umbraco.Core.Exceptions.BootFailedException.Rethrow(BootFailedException bootFailedException) in D:\a\1\s\src\Umbraco.Core\Exceptions\BootFailedException.cs:line 80
   at Umbraco.Web.UmbracoInjectedModule.<>c.<Init>b__18_0(Object sender, EventArgs args) in D:\a\1\s\src\Umbraco.Web\UmbracoInjectedModule.cs:line 369
   at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Shazwazza commented 4 years ago

@vaags did you follow the docs? I don't know what the rest of your settings are: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/azure-web-apps#recommended-configuration. That path is not the local fast drive, that's the main drive. When SyncTempEnvDirectoryFactory is there it will write to both locations. I don't know why this location says it's locked. That would only be the case if multiple processes/appdomains are using it, or maybe it's now 'stuck' if a process terminated and didn't release the Windows lock. You can try the TempEnvDirectoryFactory instead; please read the docs about the difference: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/file-system-replication#examine-directory-factory-options

JJ-DOOM commented 4 years ago

@Shazwazza thanks for an update. So if I understand correctly there are two separate issues here: 1) The bug related to SqlMainDom with "load balancing" 2) The Lucene index becoming locked or corrupted due to Azure's moving/changing stuff about

In our case we have two separate cases that sort of contradict themselves, both running 8.6 with recommended settings as per the (new) docs. Recently we got the case where one of the sites refused to restart (the common fix for the lucene index dying). After some debugging, it seemed the staging instance (sharing db with prod) somehow was blocking production's restart. Stopping the slot immediately freed up the reboot to continue. This I assume would be bug 1), although I don't quite understand why staging (receiving little to no traffic) would ever cause the db lock as it was running nicely already. As for dev/test envs in the same case, both with their own dbs, one had a working lock file, the other did not (with practically 0 interaction). I assume this to be bug 2).

The weird part is that in the other case we have the exact same setup, but absolutely no issues. So, I understand that it's difficult since Azure is out of your control. It is, however, an extremely common hosting platform. These issues affect a lot of solutions and people. We need to be able to host Umbraco on Azure, it's that simple (in customers' eyes it's even simpler). I saw your ExamineX idea and I guess that makes sense for solving that part, but how much time is remaining? Umbraco solutions have been unstable on Azure for years now.

vaags commented 4 years ago

Thanks for the helpful suggestions, @Shazwazza

Our AppSettings should be exactly as described in the docs:

<add key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" />
<add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" />
<add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine" />

For now I've resorted to setting WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1, but will investigate the TempEnvDirectoryFactory setting you mentioned.
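
For reference, a sketch of what the TempEnvDirectoryFactory variant of the settings above might look like. It assumes the factory lives in the same namespace and assembly as SyncTempEnvDirectoryFactory; check the docs linked earlier in the thread for the exact type name and for the trade-offs between the two before switching:

```xml
<!-- Sketch only: same appSettings as above, with the directory factory swapped.
     The TempEnvDirectoryFactory type name is assumed by analogy with
     SyncTempEnvDirectoryFactory and should be verified against the docs. -->
<add key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" />
<add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" />
<add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.TempEnvDirectoryFactory, Examine" />
```

As noted earlier in the thread, SyncTempEnvDirectoryFactory writes to both the main drive and the local temp location, which is why the write.lock path in the error points at the main drive.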

Shazwazza commented 4 years ago

@JJ-DOOM yes, I know everyone hosts on Azure, and Umbraco does work there; thousands of people host with it without any issues. Generally speaking, most issues with this stuff come down to settings. Unfortunately, with sometimes hidden changes in Azure, we have to keep creating new workarounds. Lucene unfortunately requires quite a lot of workarounds but it can work.

For you, if you are 'sharing a db' with more than one process, regardless of whether it's a slot swap, you are load balancing. Slot swapping does not happen synchronously; it will have multiple processes and appdomains trying to read and write to the same resources at once while it does its thing. This is bug 1 you mention, which is probably a cause of your issues. This is because we have something called MainDom, which synchronizes access to file-based things like Lucene and the cache files so that multiple appdomains/processes do not try writing at the same time to locked resources.

There are other threads just as long as this one about similar issues, but to recap the MainDom problem: Azure doesn't respect system-wide semaphores, which is how the default MainDom works. We were the only ones to discover this and get confirmation from the Azure team that this doesn't work, which is why we have to invent new ways of working around Azure's secrets, which are undocumented and can only be discovered by trial and error.

Shazwazza commented 4 years ago

@JJ-DOOM PS... ExamineX is pretty much ready. You can send an email to software @ sdkits.com if you want to test it out (it is a licensed product).

vaags commented 4 years ago

@Shazwazza About your comment about a possible "stuck" process that perhaps did not release the lock: I found another error in the logs. This one occurred before the boot failure:

System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
   at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
   at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
   at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
   at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
   at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at System.Data.SqlClient.SqlInternalTransaction.Commit()
   at System.Data.SqlClient.SqlTransaction.Commit()
   at StackExchange.Profiling.Data.ProfiledDbTransaction.Commit() in C:\projects\dotnet\src\MiniProfiler.Shared\Data\ProfiledDbTransaction.cs:line 45
   at NPoco.Database.CompleteTransaction()
   at Umbraco.Core.Scoping.Scope.DisposeLastScope() in D:\a\1\s\src\Umbraco.Core\Scoping\Scope.cs:line 388
   at Umbraco.Core.Scoping.Scope.Dispose() in D:\a\1\s\src\Umbraco.Core\Scoping\Scope.cs:line 363
   at Umbraco.Core.Services.Implement.ServerRegistrationService.TouchServer(String serverAddress, String serverIdentity, TimeSpan staleTimeout) in D:\a\1\s\src\Umbraco.Core\Services\Implement\ServerRegistrationService.cs:line 89
   at Umbraco.Web.Compose.DatabaseServerRegistrarAndMessengerComponent.TouchServerTask.PerformRun() in D:\a\1\s\src\Umbraco.Web\Compose\DatabaseServerRegistrarAndMessengerComponent.cs:line 259

I don't know if this is related in any way.

teeto commented 4 years ago

@Shazwazza I try to follow the docs, but I am not sure what to do in our situation:

We have the production site and another slot for dev. On the dev slot we have the connection string set to the prod database. On our local dev Umbraco installs we sometimes set that prod connection string too.

So, following your comments, that is the same as load balancing, isn't it? If I understand correctly, they don't access the same Examine folder, but maybe they can lock the prod database?

If it is considered load balancing, I am not sure about the documentation here: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps It talks about two sites, one for public and one for backoffice. We have the same site for both.

In our case I would like to know the configuration needed for the production site, for the dev slot, and for the localhost dev machines. Am I correct in assuming our prod slot is the master and the others are the "Replica"?

Shazwazza commented 4 years ago

@teeto In the future, these are great questions for the forum: our.umbraco.org.

For now, if you have more than one process accessing your database at once, you are load balancing. There are a lot of things happening with load balancing; I cannot explain it all here, but it is explained in the docs and there is a lot of information there to learn. Having your development machines connected to your prod site is not a supported practice (and also will not work with the current SqlMainDom bug if you have that configured on your dev machines). This means there are multiple master services. This could work for some cases but it's not 'safe', so be warned about that. However, because each of your devs is on their own machine, they maintain their own Lucene/cache files (this is part of how load balancing works), so there's nothing to configure on your dev machines. For your Azure install, if you are slot swapping, due to the current SqlMainDom bug, you cannot use that setting. For now you absolutely must use WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1 or the workaround described previously in this thread (https://github.com/umbraco/Umbraco-CMS/issues/8006#issuecomment-620328022), and you need to configure the normal Azure settings: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/azure-web-apps#recommended-configuration

This thread has turned into a forum/support thread but the underlying issues have been solved in:

Closing this issue now.