umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.49k stars 2.69k forks source link

The process cannot access the file NuCache.Content.db because it is being used by another process #5035

Closed robertjf closed 2 years ago

robertjf commented 5 years ago

PR: https://github.com/umbraco/Umbraco-CMS/pull/5924

Something is happening in Azure WebApps where the NuCache.Content.db file is locked causing the site to hang. I've attached the log file for reference.

The exception is as follows:

Umbraco.Core.Exceptions.BootFailedException: Boot failed: Umbraco cannot run. See Umbraco's log file for more details.

-> Umbraco.Core.Exceptions.BootFailedException: Boot failed.

-> System.IO.IOException: The process cannot access the file 'D:\home\site\wwwroot\App_Data\TEMP\NuCache\NuCache.Content.db' because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
   at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at CSharpTest.Net.IO.TransactedCompoundFile..ctor(Options options)
   at CSharpTest.Net.Storage.BTreeFileStoreV2..ctor(Options options)
   at CSharpTest.Net.Collections.BPlusTree`2.OptionsV2.CreateStorage()
   at CSharpTest.Net.Collections.BPlusTree`2.NodeCacheBase..ctor(BPlusTreeOptions`2 options)
   at CSharpTest.Net.Collections.BPlusTree`2.NodeCacheNormal..ctor(BPlusTreeOptions`2 options)
   at CSharpTest.Net.Collections.BPlusTree`2..ctor(BPlusTreeOptions`2 ioptions)
   at Umbraco.Web.PublishedCache.NuCache.DataSource.BTree.GetTree(String filepath, Boolean exists)
   at Umbraco.Web.PublishedCache.NuCache.PublishedSnapshotService..ctor(Options options, IMainDom mainDom, IRuntimeState runtime, ServiceContext serviceContext, IPublishedContentTypeFactory publishedContentTypeFactory, IdkMap idkMap, IPublishedSnapshotAccessor publishedSnapshotAccessor, IVariationContextAccessor variationContextAccessor, IUmbracoContextAccessor umbracoContextAccessor, ILogger logger, IScopeProvider scopeProvider, IDocumentRepository documentRepository, IMediaRepository mediaRepository, IMemberRepository memberRepository, IDefaultCultureAccessor defaultCultureAccessor, IDataSource dataSource, IGlobalSettings globalSettings, ISiteDomainHelper siteDomainHelper, IEntityXmlSerializer entitySerializer, IPublishedModelFactory publishedModelFactory, UrlSegmentProviderCollection urlSegmentProviders)

...

This is on Umbraco 8.0.1 which was upgraded from Umbraco 8.0.0. (attachment will need to be changed from .txt to.json)

UmbracoTraceLog.RD2818786B7D96.20190319.txt

zpqrtbnk commented 5 years ago

Is this on Umbraco Cloud, or on pure Azure? Where is D:\home\site\wwwroot\App_Data\TEMP - is it on a shared drive, or on a local drive? Is it possible that another instance of the site, running on another server, is trying to access these files? These files can only be accessed by 1 instance at a time - we have mechanism in place to ensure this on one server but not accross servers.

robertjf commented 5 years ago

This is in standard Azure on a basic plan with single instance only, there’s no special configuration at all, pretty vanilla Umbraco 8 with hacked up starter kit

Shazwazza commented 5 years ago

The same config principles apply to v8 as v7 for running on Azure. We defo need to get the docs updated (PRs are welcome!) The current azure docs are here https://our.umbraco.com/Documentation/Getting-Started/Setup/Server-Setup/azure-web-apps

for v8 you will need these in appSettings:

robertjf commented 5 years ago

interestingly enough, there is no examine config files in this website - I hadn't noticed that before... is that normal? Out of the box shouldn't this just work? Do I need to add the files?

Shazwazza commented 5 years ago

There is no examine config files in v8. OOTB, just like v7, you need to adjust some config to make umbraco work with azure. So the above 2 config values are needed. These are equivalent to the v7 appSetting umbracoLocalTempStorage and the examine config directoryFactory switch.

Shazwazza commented 5 years ago

The 2 config values are just in your web.config, these are just appSetting key values in v8

robertjf commented 5 years ago

ah, right - good to know :)

robertjf commented 5 years ago

so is it right to assume these are going to be automatically added in future, or are they only in certain situations?

Shazwazza commented 5 years ago

No you cannot assume that, Just like deploying v7 to Azure you need to setup specific config for that, it's no different in v8.

If however, you create an Umbraco website from the Azure portal, then yes these should be pre-configured for you but I don't think we have a v8 build on the Azure portal yet.

JoseMarcenaro commented 5 years ago

Thanks for explaining the required config settings.

But in v8, even with the above settings, the exact error happens when you enable Slot Swap in the Azure App Service: before the swap both slots are working ok, but after the Swap the new "Production" slot throws the error every time. A full App Service restart is needed to make things working again.

If you work with a single slot everything is Ok,

Shazwazza commented 5 years ago

That's interesting, will assume the same problem exists in v7 too since it's the same paradigm.

I think the only way around such behavior is to have an option to not have a persisted cache file, or name the cache file based on the AppDomainAppId + MachineName (should be unique among processes).

@zpqrtbnk what do you think here?

JoseMarcenaro commented 5 years ago

Note: We have a site on Umbraco 7.13.2 running on Azure with slot swapping (and the "LocalTempStorage" ="EnvironmentTemp" app setting) and we never run into this problem with the XML cache file. Maybe because it's not open all the time?

zpqrtbnk commented 5 years ago

NuCache stores its file in Path.Combine(_globalSettings.LocalTempPath, "NuCache") and therefore respects the <add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" /> setting. The temp path combines the application id and the site name, but not the machine name, as the local temp is supposed to be local, ie per-server.

I don't know much about "slot swaps" in Azure. But, if they are running on the same machine, same application id, same site name, same temp dir then... there might be a collision? That would require some troubleshooting to get it right. But then, as you mention, it's not only NuCache but other things too.

Now indeed, the difference is that v8 locks the NuCache files for as long as it's running where v7 was "using" the Xml file from time to time = the two sites may cohabit (but that was a bad idea).

So... to make it short, I'd like to hear more about Slot Swaps and, on all slots, get the value of HostingEnvironment.SiteName, HostingEnvironment.ApplicationID, and %temp%.

JoseMarcenaro commented 5 years ago

@zpqrtbnk thanks for looking into this. I will post today the requested information about the Slot Swaps and those values.

JoseMarcenaro commented 5 years ago

This is what I got on each slot, before and after the swap:

<!-- STAGE slot, before swap

    HostingEnvironment.SiteName = 'terniumcomdev__e951'
    HostingEnvironment.ApplicationID = '/LM/W3SVC/1464357220/ROOT'
    %temp% = 'D:\local\Temp'
    -->
<!-- PRODUCTION slot, before swap
    HostingEnvironment.SiteName = 'terniumcomdev__dd91'
    HostingEnvironment.ApplicationID = '/LM/W3SVC/1742369212/ROOT'
    %temp% = 'D:\local\Temp'
    -->

<!-- STAGE slot, after swap
    HostingEnvironment.SiteName = 'terniumcomdev__dd91'
    HostingEnvironment.ApplicationID = '/LM/W3SVC/1742369212/ROOT'
    %temp% = 'D:\local\Temp'
    -->

<!-- PRODUCTION slot, after swap 
    HostingEnvironment.SiteName = 'terniumcomdev__e951'
    HostingEnvironment.ApplicationID = '/LM/W3SVC/1464357220/ROOT'
    %temp% = 'D:\local\Temp'
    -->

And this time the error was not thrown. In a previous execution (with different slots) I got the following:

System.IO.IOException: The process cannot access the file 'D:\local\Temp\UmbracoData\854d45b396372eb551323cc722f33136\NuCache\NuCache.Content.db' because it is being used by another process.

I am positively sure that this setting was in effect when the error was thrown:

<add key="Umbraco.Core.LocalTempStorage" value="EnvironmentTemp" />

but this one was not (may this be the issue?)

<add key="Umbraco.Examine.LuceneDirectoryFactory" value="Examine.LuceneEngine.Directories.SyncTempEnvDirectoryFactory, Examine" />

In my latest test (no error) both settings were in effect.

zpqrtbnk commented 5 years ago

The Lucene setting has no impact on NuCache. The fact that you are seeing an error such as The process cannot access the file 'D:\local\Temp\UmbracoData\854...136\NuCache\NuCache.Content.db' indicates that you are indeed using the proper local temp storage setting.

NuCache uses the built-in MainDom mechanism to ensure that only one app domain at a time can own the cache (and the associated files). MainDom uses a machine-wide named lock; the name is built by combining the application id and the application physical path.

(so obviously I should also have asked you the HostingEnvironment.ApplicationPhysicalPath values...)

A site cannot, in theory, even try to access the NuCache file until it has aquired the MainDom lock. Therefore, for the error to happen... the site must own the machine-wide lock on (app.id, app.path) and yet someone else must lock the files at (app.id, site.name), meaning

Any chance you can get the ApplicationPhysicalPath values?

And... I don't know enough about "slots" to figure out the physical server thing. Ideas?

JoseMarcenaro commented 5 years ago

Just tested it. The ApplicationPhysicalPath value is always D:\home\site\wwwroot\ in both slots, before and after the swap.

Regarding the physical server... it is not fully documented, but many posts assume it is the same because all slots share the same App Service Plan = the same set of resources (i.e. 3.75 Gb RAM).

zpqrtbnk commented 5 years ago

So... assuming that all sites run on the same physical server (for now), that leaves us with app.path.

D:\home\site\wwwroot\ is a kinda virtual path that Azure uses, so basically all apps are always under that path, and that can be annoying in some cases... but in our case, it should indeed make things even more safer.

Did not realize you posted a log file with the original issue - now looking at that file.

JoseMarcenaro commented 5 years ago

@zpqrtbnk - the log file is not mine, so it does not apply to the slot issue. Thanks.

JoseMarcenaro commented 5 years ago

In any case, for some mysterious reason the "process cannot access the file NuCache.Content.db" error has not appeared lately - in all slots swaps during the last three days. I will post more information - including a log - if it does happen again. Thanks for your help!

zpqrtbnk commented 5 years ago

Thanks for the update.

Even though it's not your log... the significant lines are:

2019-03-19 03:16:49,695 [P9488/D3/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting Umbraco 8.0.0 on RD2818786B7D96. [Timing e8bc0a7]
2019-03-19 03:16:50,759 [P9488/D3/T1] INFO   Umbraco.Core.MainDom - Acquired.
2019-03-19 03:17:39,322 [P9488/D3/T17] INFO   Umbraco.Core.MainDom - Released (environment)
2019-03-19 03:25:35,501 [P9488/D4/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting Umbraco 8.0.0 on RD2818786B7D96. [Timing 761ae5c]
2019-03-19 03:25:36,074 [P9488/D4/T1] INFO   Umbraco.Core.MainDom - Acquired.
2019-03-19 03:27:45,813 [P9488/D4/T16] INFO   Umbraco.Core.MainDom - Released (environment)
2019-03-19 03:27:49,388 [P9488/D5/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting Umbraco 8.0.0 on RD2818786B7D96. [Timing 9ce9254]
2019-03-19 03:27:50,970 [P9488/D5/T1] INFO   Umbraco.Core.MainDom - Acquired.
2019-03-19 03:28:24,123 [P9488/D5/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booted. (34755ms) [Timing 9ce9254]
2019-03-19 03:28:59,434 [P9472/D2/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting Umbraco 8.0.0 on RD2818786B7D96. [Timing 56405ae]
2019-03-19 03:29:06,148 [P9472/D2/T1] INFO   Umbraco.Core.MainDom - Acquired.
2019-03-19 03:29:31,320 [P9472/D2/T1] ERROR  Umbraco.Core.Runtime.CoreRuntime - Boot failed. (32637ms) [Timing 56405ae]
Umbraco.Core.Exceptions.BootFailedException: Boot failed. ---> System.IO.IOException: ...
2019-03-19 03:29:46,191 [P9488/D5/T13] INFO   Umbraco.Core.MainDom - Released (environment)

Where we see that a new process (9472) starts while the previous one (9488) is running, and acquires the MainDom lock before the previous process releases it, thus hitting an exception when trying to read the cache. That should not be possible...

If anything happens again, thanks for reporting. Meanwhile, I've made a few changes so that in 8.0.2 we log all the important infos (app.path, app.id, etc) when the site boots.

JoseMarcenaro commented 5 years ago

Hi @zpqrtbnk Some more info (the exception is being thrown again in my Azure environment). A log is attached, the same behavior you described happens in the last lines of the log.

Note that in this case:

UmbracoTraceLog.RD2818784FF824.20190409.zip

I'm running v8.0.1 installed via the nuget.org package.

Hope it helps.

JoseMarcenaro commented 5 years ago

@zpqrtbnk - I could further isolate the problem.

I performed the following steps:

So it looks like this is related to the Azure Slot swap - although it is the same machine, for some reason the process starts over without having shut down the previous one.

UmbracoTraceLog.RD2818784FF824.20190409.zip

JoseMarcenaro commented 5 years ago

One last comment about the slot swap process: the "swapped" Web app remains the same, in the same machine .. but the URLs that hit it change (i.e. from contoso-stage.azurewebsites.net to contoso.azurewebsites.net). I don't know if immediately after the switch the Azure infrastructure sends some kind of wakeup call.

zpqrtbnk commented 5 years ago

Hey - thanks for the details - just FYI I am away at the Barcelona meetup, with little time for this issue - will resume work on it at the end of the week (so don't feel bad if I don't reply).

zpqrtbnk commented 5 years ago

~Have looked at your logfile and... here is the thing: the new process (P564) is starting for a different application (since you swapped). So P8108 was for the other application. It does not have to release its lock for P564 to acquire it... OTOH the lock has to be released... but that should happen on the other slot. And so... I would love to see the other log. Is this possible?~

~Note that it still does not explain what is going on with the cache. I am currently running some experiments with Azure AppServices to try to get a better understanding of it all.~

Or maybe not - further experimenting.

zpqrtbnk commented 5 years ago

I have created an Azure App Service hosting a simple MVC web app, with two slots. Had to have one per-slot appSetting in order to force slot-swaps to trigger a restart. Then, have our MainDom system and some temp file lock to the app... and I can swap slots, and see the MainDom lock being properly released, and swaps happening without issues.

I have to accept that something is wrong, considering your log, but for now I am running out of idea.

During the swap, on my test, the old process terminates and then a new process starts. In your log, the new process starts before the old process terminates, but that should not be a problem: this is precisely why we have the MainDom lock.

Is this happening every time you swap, or only from time to time? Is this on a production / live site? To which extend would it be possible for you to run some custom DLLs which would include more tracing / logging?

JoseMarcenaro commented 5 years ago

@zpqrtbnk , The error is no longer happening. By the time I reported this and send you the log, it was happening on every swap.

I did not change anything in Umbraco since then - the same package 8.0.1 is used.

I'm almost sure the difference is that, while trying to workaround the error, I completely disabled Application Insights on Azure - it was enabled by default when I first created the App Service, and I did not opt out of it as I usually do.

To verify if this was the cause I enabled again Application Insights in both slots and something strange happened:

So looking in retrospect, now my thoughts are:

Current situation:

Thanks for your help!

zpqrtbnk commented 5 years ago

Thanks for keeping up with the detective work ;-) I have tried to experiment with enabling/disabling AppliCation Insights on my test app, but still cannot reproduce the out-of-order lock management that you see.

Happy that it works for you, but it annoys me.

Will lower the priority of this issue, but still, will try to send you a DLL to test, that would log way more details about what is going on. Stay tuned!

zpqrtbnk commented 5 years ago

This: Debug.zip contains a patched DLL that just logs infos when acquiring the main lock (process id, but also user name, app id and maindom hash) - if you have a moment to try it...

JoseMarcenaro commented 5 years ago

Done. I could reproduce the error. No slot swapping involved, I just enabled Application Insights.

Initial state:

Clean restart to clean up logs

Enable Application Insights

Let me know if I can perform any additional tests

zpqrtbnk commented 5 years ago

From the log:

2019-04-15 21:07:33,806 [P9404/D2/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting Umbraco 8.0.1. [Timing f0400f6]
2019-04-15 21:07:33,933 [P9404/D2/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting 
    site 'terniumcomqa__482f', 
    app '/LM/W3SVC/653304847/ROOT', 
    path 'D:\home\site\wwwroot\', 
    server 'RD2818785491E5'.
2019-04-15 21:07:34,696 [P9404/D2/T1] INFO   Umbraco.Core.MainDom - DEBUG
    server   RD2818785491E5
    site     terniumcomqa__482f
    process  9404
    user     IIS APPPOOL\terniumcomqa__482f
    domain   2
    appId    /LM/W3SVC/653304847/ROOT
    appPath  D:\home\site\wwwroot\
    temp     D:\local\Temp
    local    D:\local\Temp\UmbracoData\935c1c6dfba987b17a7c67a60ce950cb
    mdhash   d04962ddd4ea323ea555b5a47c5349c8f613ab9e
2019-04-15 21:07:34,714 [P9404/D2/T1] INFO   Umbraco.Core.MainDom - Acquiring.
2019-04-15 21:07:34,714 [P9404/D2/T1] INFO   Umbraco.Core.MainDom - Acquired.
[...]
2019-04-15 21:09:54,491 [P9404/D2/T15] INFO   Umbraco.Core.MainDom - Stopping (environment)
2019-04-15 21:09:54,647 [P9404/D2/T15] INFO   Umbraco.Core.MainDom - Released (environment)
[...]
2019-04-15 21:10:02,321 [P9404/D3/T24] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting Umbraco 8.0.1. [Timing c4c5f6d]
2019-04-15 21:10:02,446 [P9404/D3/T24] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting 
    site 'terniumcomqa__482f', 
    app '/LM/W3SVC/653304847/ROOT', 
    path 'D:\home\site\wwwroot\', 
    server 'RD2818785491E5'.
2019-04-15 21:10:02,758 [P9404/D3/T24] INFO   Umbraco.Core.MainDom - DEBUG
    server   RD2818785491E5
    site     terniumcomqa__482f
    process  9404
    user     IIS APPPOOL\terniumcomqa__482f
    domain   3
    appId    /LM/W3SVC/653304847/ROOT
    appPath  D:\home\site\wwwroot\
    temp     D:\local\Temp
    local    D:\local\Temp\UmbracoData\935c1c6dfba987b17a7c67a60ce950cb
    mdhash   d04962ddd4ea323ea555b5a47c5349c8f613ab9e
2019-04-15 21:10:02,778 [P9404/D3/T24] INFO   Umbraco.Core.MainDom - Acquiring.
2019-04-15 21:10:02,778 [P9404/D3/T24] INFO   Umbraco.Core.MainDom - Acquired.
[...]
2019-04-15 21:10:12,305 [P9404/D3/T24] INFO   Umbraco.Core.Runtime.CoreRuntime - Booted. (10011ms) [Timing c4c5f6d]
[...]
2019-04-15 21:10:16,369 [P1480/D2/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting Umbraco 8.0.1. [Timing aa57fab]
2019-04-15 21:10:16,727 [P1480/D2/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Booting 
    site 'terniumcomqa__482f', 
    app '/LM/W3SVC/653304847/ROOT', 
    path 'D:\home\site\wwwroot\', 
    server 'RD2818785491E5'.
2019-04-15 21:10:18,852 [P1480/D2/T1] INFO   Umbraco.Core.MainDom - DEBUG
    server   RD2818785491E5
    site     terniumcomqa__482f
    process  1480
    user     IIS APPPOOL\terniumcomqa__482f
    domain   2
    appId    /LM/W3SVC/653304847/ROOT
    appPath  D:\home\site\wwwroot\
    temp     D:\local\Temp
    local    D:\local\Temp\UmbracoData\935c1c6dfba987b17a7c67a60ce950cb
    mdhash   d04962ddd4ea323ea555b5a47c5349c8f613ab9e
2019-04-15 21:10:18,852 [P1480/D2/T1] INFO   Umbraco.Core.MainDom - Acquiring.
2019-04-15 21:10:18,962 [P1480/D2/T1] INFO   Umbraco.Core.MainDom - Acquired.
[...]
2019-04-15 21:10:24,822 [P9404/D2/T20] INFO   Umbraco.Web.UmbracoApplicationBase - Application shutdown. Details: ConfigurationChange
[...]
2019-04-15 21:10:31,793 [P1480/D2/T1] INFO   Umbraco.Core.Runtime.CoreRuntime - Resolved. (10046ms) [Timing 0ba631e]
2019-04-15 21:10:38,462 [P1480/D2/T1] ERROR  Umbraco.Core.Runtime.CoreRuntime - Boot failed. (22127ms) [Timing aa57fab]
Umbraco.Core.Exceptions.BootFailedException: Boot failed. ---> System.IO.IOException: The process cannot access the file 
    'D:\local\Temp\UmbracoData\935c1c6dfba987b17a7c67a60ce950cb\NuCache\NuCache.Content.db' 
    because it is being used by another process.
   at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
[...]
2019-04-15 21:10:58,311 [P9404/D3/T19] INFO   Umbraco.Core.MainDom - Stopping (environment)
2019-04-15 21:10:58,686 [P9404/D3/T19] INFO   Umbraco.Core.MainDom - Released (environment)

We see P9404/D2 booting, acquiring the MainDom lock, running, then releasing the MainDom lock, because it has been notified by the environment that it had to shut down. And then P9404/D3 boots, acquires the MainDom lock, and runs.

And then, P1480/D2 starts and... acquires the MainDom lock, even though P9404/D3 has not been notified of its shutdown and has not released the lock (it does, 20 or 30 seconds later).

The important line here is mdhash d04962ddd4ea323ea555b5a47c5349c8f613ab9e - that hash is directly used as a name for the named Semaphore we use to handle the MainDom lock - and these semaphores are supposed to be system-wide. Since the same name is used everywhere... I fail to understand how P1480/D2 can acquire a lock that has not been released.

There has to be something I am not seeing here. Thanks for reporting... will think about it all and try to have an idea!

zpqrtbnk commented 5 years ago

Note to self: have run some tests of the MainDom class on local machine, with two console apps, and it all works as expected. Note that the app that releases the lock is Signaled (signal) whereas in the above log, it's always (environment) meaning... P9404/D3 is not even signaled to stop?!

zpqrtbnk commented 5 years ago

If I can bother you with another Debug.zip... MANY THANKS for your time here!

Jeavon commented 5 years ago

@JoseMarcenaro just wondering, do you have any applicationInitialization configuration?

JoseMarcenaro commented 5 years ago

Done. Same secuence as before, same error result. error2-log.zip

JoseMarcenaro commented 5 years ago

@Jeavon you are right. I do have applicationInitialization configuration.

zpqrtbnk commented 5 years ago

Thanks for the log. What it contains is:

At that point... P7468/D2 just gets the lock immediately. It signals the current lock holder, P3488/D8, but P3488/D8 does not report this. It just reports, later on, that it is signaled by the hosting environment, and releases the lock.

So

And then again, this all seems to happen on the very same hosting environment / server. Shared semaphores and events are at system-level so... they should work here. Just can't figure out what is going on :(

Jeavon commented 5 years ago

@JoseMarcenaro I think that might be a clue for @zpqrtbnk I only asked as I only saw the error once I had added applicationInitialization, could be coincidence but with applicationInitialization IIS is internally within the module starting up a new site

zpqrtbnk commented 5 years ago

but with applicationInitialization IIS is internally within the module starting up a new site

afraid I don't understand ;-) can you be more... something?

Jeavon commented 5 years ago

And it's starting internally before shutting down the currently running one, once it gets the 200 then it makes it public to serve requests

Jeavon commented 5 years ago

@zpqrtbnk to enable locally add within system.webServer element of web.config

    <applicationInitialization doAppInitAfterRestart="true">
      <add initializationPage="/"/>
    </applicationInitialization>
Jeavon commented 5 years ago

Trying to find out how it works exactly, this seems to be the most info https://docs.microsoft.com/en-us/iis/get-started/whats-new-in-iis-8/iis-80-application-initialization

JoseMarcenaro commented 5 years ago

I thought ApplicationInitialization configuration was used by the Azure infrastructure only when launching a swap, to warm up the stage slot before performing the swap and starting to send requests to it. But that "doAppInitAfterRestart" attribute shows me it might not be the case. In my app, I do not have the attribute set.

    <applicationInitialization>
      <add initializationPage="/es/index" hostName="www.txdev.ternium.com"/>
    </applicationInitialization>
Jeavon commented 5 years ago

@JoseMarcenaro it is used whenever IIS needs to start/restart. For Web Apps it's correct you are using it. It will be used for things such as Slot Swapping, Scale (cold boots) and server transition (when your site is moved to a new server) and app recycle. Without it, you will not get continuous operation.

JoseMarcenaro commented 5 years ago

From https://docs.microsoft.com/en-us/iis/configuration/system.webserver/applicationinitialization/: doAppInitAfterRestart

Optional Boolean attribute.Specifies that the initialization process is initiated automatically whenever an application restart occurs. Note that this is different than the preLoadEnabled attribute in the application element, which specifies that the initialization process is started after a restart of the application pool. The default value is false.

Jeavon commented 5 years ago

@JoseMarcenaro have you tried setting it to true?

zpqrtbnk commented 5 years ago

So... the reason why the new domain starts before the previous one is down, would be because IIS is warming up the new domain, then swaps them, then terminates the old domain. Makes sense.

Nevertheless, in theory, the new domain should signal the old domain, and then wait for the old domain to release the lock, before acquiring it. Whereas... it all looks like the new domain is running in isolation and not seeing the rest of the world...

Going to try and repro by enabling app initialization...

JoseMarcenaro commented 5 years ago

@Jeavon no, but I don't think it will add any useful information. I can run my site without these errors - it now happens only when setting on Application Insights and then everything goes right.

So avoiding the error it is not the priority. Finding more information about how it happens is.

zpqrtbnk commented 5 years ago

Odd, on my test app it always shuts an app pool before starting another - either I fail at configuring app initialization, or ...