Closed MihaiPotcoava closed 4 years ago
So we must always set WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1 on azure web apps?
Am I correct in assuming that, for now, you have to set WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1, even on version 8.6.1?
We just had a site running 8.6.1 go down (all app settings as described in the docs).
It cannot hurt to have WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1. If you are not using SqlMainDom then this is absolutely required. If you are, it shouldn't be required, but we have a bug where you cannot use SqlMainDom when you are load balancing (as per the docs: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps)
Thanks for the speedy reply :)
We are not load balancing, and have the SqlMainDom setting.
The error we got was this:
Umbraco.Core.Exceptions.BootFailedException: Boot failed.
System.UnauthorizedAccessException: Access to the path 'D:\home\site\wwwroot\App_Data\TEMP\ExamineIndexes\Internal\write.lock' is denied.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.File.InternalDelete(String path, Boolean checkHost)
at System.IO.File.Delete(String path)
at Lucene.Net.Store.SimpleFSLock.Release() in d:\Lucene.Net\FullRepo\trunk\src\core\Store\SimpleFSLockFactory.cs:line 203
at Examine.LuceneEngine.Directories.MultiIndexLock.Release() in C:\projects\examine-qvx04\src\Examine\LuceneEngine\Directories\MultiIndexLock.cs:line 68
at Lucene.Net.Index.IndexWriter.Unlock(Directory directory) in d:\Lucene.Net\FullRepo\trunk\src\core\Index\IndexWriter.cs:line 5776
at Umbraco.Examine.ExamineExtensions.ConfigureLuceneIndexes(IExamineManager examineManager, ILogger logger, Boolean disableExamineIndexing) in D:\a\1\s\src\Umbraco.Examine\ExamineExtensions.cs:line 146
at Umbraco.Examine.ExamineExtensions.<>c__DisplayClass4_0.<ConfigureIndexes>b__0() in D:\a\1\s\src\Umbraco.Examine\ExamineExtensions.cs:line 50
at System.Threading.LazyInitializer.EnsureInitializedCore[T](T& target, Boolean& initialized, Object& syncLock, Func`1 valueFactory)
at Umbraco.Examine.ExamineExtensions.ConfigureIndexes(IExamineManager examineManager, IMainDom mainDom, ILogger logger) in D:\a\1\s\src\Umbraco.Examine\ExamineExtensions.cs:line 44
at Umbraco.Web.Search.ExamineFinalComponent.Initialize() in D:\a\1\s\src\Umbraco.Web\Search\ExamineFinalComponent.cs:line 32
at Umbraco.Core.Composing.ComponentCollection.Initialize() in D:\a\1\s\src\Umbraco.Core\Composing\ComponentCollection.cs:line 32
at Umbraco.Core.Runtime.CoreRuntime.Boot(IRegister register, DisposableTimer timer) in D:\a\1\s\src\Umbraco.Core\Runtime\CoreRuntime.cs:line 188
at Umbraco.Core.Exceptions.BootFailedException.Rethrow(BootFailedException bootFailedException) in D:\a\1\s\src\Umbraco.Core\Exceptions\BootFailedException.cs:line 80
at Umbraco.Web.UmbracoInjectedModule.<>c.<Init>b__18_0(Object sender, EventArgs args) in D:\a\1\s\src\Umbraco.Web\UmbracoInjectedModule.cs:line 369
at System.Web.HttpApplication.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
at System.Web.HttpApplication.ExecuteStepImpl(IExecutionStep step)
at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)
@vaags did you follow the docs? I don't know what the rest of your settings are: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/azure-web-apps#recommended-configuration. That path is not the local fast drive, that's the main drive. When SyncTempEnvDirectoryFactory is there it will write to both locations. I don't know why this location says it's locked; that would only be the case if multiple processes/appdomains are using it, or maybe it's now 'stuck' because a process terminated and didn't release the Windows lock. You can try the TempEnvDirectoryFactory instead; please read the docs about the difference: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/file-system-replication#examine-directory-factory-options
@Shazwazza thanks for the update. So if I understand correctly there are two separate issues here: 1) The bug related to SqlMainDom with "load balancing" 2) The Lucene index becoming locked or corrupted due to Azure moving/changing things around
In our case we have two separate cases that sort of contradict each other, both running 8.6 with recommended settings as per the (new) docs. Recently we got the case where one of the sites refused to restart (the common fix for the Lucene index dying). After some debugging, it seemed the staging instance (sharing a db with prod) somehow was blocking production's restart. Stopping the slot immediately freed up the reboot to continue. This I assume would be bug 1), although I don't quite understand why staging (receiving little to no traffic) would ever cause the db lock, as it was already running nicely. As for dev/test envs in the same case, both with their own dbs, one had a working lock file, the other did not (with practically 0 interaction). I assume this to be bug 2).
The weird part is that in the other case, we have the exact same setup, but absolutely no issues. So, I understand that it's difficult since Azure is out of your control, but it is an extremely common hosting platform. These issues affect a lot of solutions and people. We need to be able to host Umbraco on Azure, it's that simple (in customers' eyes it's even simpler..). I saw your ExamineX idea and I guess that makes sense for solving that part, but how long until it is available? Umbraco solutions have been unstable on Azure for years now.
Thanks for the helpful suggestions, @Shazwazza
Our AppSettings should be exactly as described in the docs:
<add key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" />
<add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" />
<add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine" />
For now I've resorted to setting WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1, but will investigate the TempEnvDirectoryFactory setting you mentioned.
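For reference, trying TempEnvDirectoryFactory would presumably mean changing only the last of the three keys quoted above. This is a minimal sketch, not a confirmed configuration: the type name below assumes TempEnvDirectoryFactory lives in the same namespace and assembly as SyncTempEnvDirectoryFactory, so check it against the file-system-replication docs linked earlier before using it.

```xml
<appSettings>
  <!-- Unchanged from the settings quoted above -->
  <add key="Umbraco.Core.MainDom.Lock" value="SqlMainDomLock" />
  <add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" />
  <!-- Assumption: same namespace and assembly as SyncTempEnvDirectoryFactory;
       only the class name differs. Verify against the linked docs. -->
  <add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.TempEnvDirectoryFactory, Examine" />
</appSettings>
```

The trade-off between the two factories is described in the linked docs; as noted earlier in the thread, the Sync variant writes to both the main drive and the local fast drive.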
@JJ-DOOM yes, I know everyone hosts on Azure, and Umbraco does work there; thousands of people host there without any issues. Generally speaking, most issues with this stuff come down to settings. Unfortunately, with sometimes-hidden changes in Azure, we have to keep creating new workarounds. Lucene unfortunately requires quite a lot of workarounds, but it can work.
For you, if you are 'sharing a db' between more than one process, regardless of whether it's a slot swap, you are load balancing. Slot swapping does not happen synchronously; it will have multiple processes and appdomains trying to read and write the same resources at once while it does its thing. This is the bug 1 you mention, which is probably a cause of your issues. This is because we have something called MainDom, which synchronizes access to file-based things like Lucene and the cache files so that multiple appdomains/processes do not try to write to locked resources at the same time.
There are other threads just as long as this one about similar issues, but to recap the MainDom problem: Azure doesn't respect system-wide semaphores, which is how the default MainDom works. We were the only ones to discover this and get confirmation from the Azure team that it doesn't work, which is why we have to invent new ways of working around Azure's secrets, which are undocumented and can only be discovered by trial and error.
@JJ-DOOM PS... ExamineX is pretty much ready; you can send an email to software @ sdkits.com if you want to test it out (it is a licensed product).
@Shazwazza About your comment about a possible "stuck" process that perhaps did not release the lock: I found another error in the logs. This one occurred before the boot failure:
System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)
at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()
at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value)
at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
at System.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
at System.Data.SqlClient.SqlInternalTransaction.Commit()
at System.Data.SqlClient.SqlTransaction.Commit()
at StackExchange.Profiling.Data.ProfiledDbTransaction.Commit() in C:\projects\dotnet\src\MiniProfiler.Shared\Data\ProfiledDbTransaction.cs:line 45
at NPoco.Database.CompleteTransaction()
at Umbraco.Core.Scoping.Scope.DisposeLastScope() in D:\a\1\s\src\Umbraco.Core\Scoping\Scope.cs:line 388
at Umbraco.Core.Scoping.Scope.Dispose() in D:\a\1\s\src\Umbraco.Core\Scoping\Scope.cs:line 363
at Umbraco.Core.Services.Implement.ServerRegistrationService.TouchServer(String serverAddress, String serverIdentity, TimeSpan staleTimeout) in D:\a\1\s\src\Umbraco.Core\Services\Implement\ServerRegistrationService.cs:line 89
at Umbraco.Web.Compose.DatabaseServerRegistrarAndMessengerComponent.TouchServerTask.PerformRun() in D:\a\1\s\src\Umbraco.Web\Compose\DatabaseServerRegistrarAndMessengerComponent.cs:line 259
I don't know if this is related in any way.
@Shazwazza I try to follow the docs, but I am not sure what to do in our situation:
We have the production site and another slot for dev. On the dev slot we have the connection string set to the prod database. On our local dev Umbraco installs we sometimes set that prod connection string too.
So, following your comments, that is the same as load balancing, isn't it? If I understand correctly, they don't access the same Examine folder, but maybe they can lock the prod database?
If it is considered load balancing, I am not sure about the documentation here: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/Load-Balancing/azure-web-apps It talks about two sites, one for public and one for backoffice. We have the same site for both.
In our case I would like to know the configuration needed for the production site, for the dev slot, and for the localhost dev machines. Am I correct in assuming our prod slot is the master and the others are the "Replica"?
@teeto In the future, these are great questions for the forum: our.umbraco.org.
For now, if you have more than one process accessing your database at once, you are load balancing. There are a lot of things happening with load balancing; I cannot explain it all here, but it is explained in the docs and there is a lot of information there to learn. Having your development machines connected to your prod site is not a supported practice (and also will not work with the current SqlMainDom bug if you have that configured on your dev machines). This means there are multiple master services. This could work for some cases but it's not 'safe', so be warned about that. However, because each of your devs has their own machine, they maintain their own Lucene/cache files (this is part of how load balancing works), so there's nothing to configure on your dev machines. For your Azure install, if you are slot swapping, due to the current SqlMainDom bug, you cannot use that setting. For now you absolutely must use WEBSITE_DISABLE_OVERLAPPED_RECYCLING = 1 or the workaround described previously in this thread (https://github.com/umbraco/Umbraco-CMS/issues/8006#issuecomment-620328022), and you need to configure the normal Azure settings: https://our.umbraco.com/documentation/Getting-Started/Setup/Server-Setup/azure-web-apps#recommended-configuration
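Put together for a slot-swapping site, the web.config side of that advice might look roughly like the sketch below. It is an illustration only: it takes the settings quoted earlier in this thread and drops the Umbraco.Core.MainDom.Lock key, on the assumption that omitting the key falls back to the default MainDom lock. WEBSITE_DISABLE_OVERLAPPED_RECYCLING is not shown because it is an Azure App Service application setting, set on the web app itself and not in web.config.

```xml
<appSettings>
  <!-- Umbraco.Core.MainDom.Lock is intentionally omitted: per the comment above,
       SqlMainDomLock cannot be used while slot swapping (treated as load balancing).
       Assumption: leaving the key out falls back to the default MainDom lock. -->
  <add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" />
  <add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine" />
</appSettings>
```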
This thread has turned into a forum/support thread but the underlying issues have been solved in:
Closing this issue now.
We recently upgraded to Umbraco 8.6.0 because we were experiencing the Examine index locking issues described here: https://github.com/umbraco/Umbraco-CMS/issues/5035 The recommendation from Shazwazza was to upgrade to 8.6.0; see his comment: https://github.com/umbraco/Umbraco-CMS/issues/5035#issuecomment-599546750
We upgraded and followed the instructions from https://our.umbraco.com/documentation/getting-started/setup/server-setup/azure-web-apps
However, every few hours we get Examine lock-related errors (details above). When we touch the web.config everything runs OK for a few hours, then the problems reappear.
Umbraco version
I am seeing this issue on Umbraco version: 8.6.0, Examine v1.0.3, Lucene.Net v3.0.3
Reproduction
Bug summary
We're running in Azure and recently upgraded from 7 to 8.5.3, then to v8.6.0. With version 7 everything was working OK, but we wanted to benefit from the latest version's improvements. With version 8.5.3 the website was working pretty well, but we encountered the issues described here: https://github.com/Shazwazza/Examine/issues/161 so we upgraded.
Now all seems to work fine except some Examine-related exceptions described below:
1) When SAVING a document, the following exceptions appear in the logs (I think they are related):
2) When SEARCHING using the top-right search button, the following exception is shown:
3) When OPENING Settings -> Examine management:
Specifics
Exceptions are shown in the Umbraco backoffice.
Umbraco: 8.6.0
Browser: Chrome (also tried in other browsers)
Full exceptions above. Screenshots: when saving, when searching, when accessing Examine settings.
Steps to reproduce
To repro exception nr 1:
To repro exception nr 2:
To repro exception nr 3:
Expected result
There should be no exception
Actual result
Exceptions are shown in Umbraco in a popup window