projectkudu / kudu

Kudu is the engine behind git/hg deployments, WebJobs, and various other features in Azure Web Sites. It can also run outside of Azure.
Apache License 2.0

Zip deploy started failing earlier today #2849

Closed xt0rted closed 6 months ago

xt0rted commented 6 years ago

I've been using Zip Deploy from AppVeyor for some time now with successful deployments to my site earlier in the day. The last two deployments have failed and none of the files have shown up in my staging slot.

Checking the deployment at /ZipDeploy shows the following error:

2018-08-31T04:28:57.9374336Z : Copying file: 'EquineExchange.Homepage.Views.dll'
2018-08-31T04:28:57.9374336Z : Failed exitCode=1, command="kudusync" -v 50 -f "D:\local\Temp\zipdeploy\extracted" -t "D:\home\site\wwwroot" -n "D:\home\site\deployments\c9327d30db1f44b79e15fd050558f866\manifest" -p "D:\home\site\deployments\36a4e08deb894cd9a0e12f943e16956c\manifest" -i ".git;.hg;.deployment;deploy.cmd"
2018-08-31T04:28:57.9374336Z : An error has occurred during web site deployment.
2018-08-31T04:28:57.9530456Z : Error: Failed to change file that is currently being used "D:\home\site\wwwroot\EquineExchange.Homepage.Views.dll"\r\nD:\Program Files (x86)\SiteExtensions\Kudu\77.10830.3542\bin\Scripts\starter.cmd "D:\home\site\deployments\tools\deploy.cmd"

Deployment Id: c9327d30db1f44b79e15fd050558f866

The site in question is on the account with eei-debug and has the word public in the name. I'm deploying to the staging slot with auto swap enabled.

I'm not sure if it's related, but I think this issue started after .NET Core 2.1.3 landed on my VM.

davidebbo commented 6 years ago

@xt0rted is this zipdeploy with a site set up to run from a zip package, or without?

davidebbo commented 6 years ago

Ok, so I looked and I see it doesn't have WEBSITE_RUN_FROM_ZIP. After that happens, have you checked the state of the file that it couldn't copy? If the Core runtime somehow has it hopelessly locked, then it would not be able to deploy it.

I would highly suggest switching to running from a package, which solves a lot of these locking issues.

xt0rted commented 6 years ago

@davidebbo it's without the package. I can't use the run from package version yet because of the Let's Encrypt extension.

davidebbo commented 6 years ago

Crap, we really should get that fixed. I see it's tracked on https://github.com/sjkp/letsencrypt-siteextension/issues/239

Please do check the locked state of the file after getting that error. Specifically:

An alternate solution is to drop an App_Offline.htm during the publishing to stop the app. .NET Core should be honoring that.

davidebbo commented 6 years ago

Similar case came up on https://github.com/Azure/azure-functions-host/issues/3367

@natemcmaster, any chance that the file locking behavior changed in 2.1.3?

Though still, generally it's best to either rely on App_Offline.htm or running from a zip package to avoid those locking issues.

xt0rted commented 6 years ago

#2825 looks similar too

When this issue came up I tried to delete the file from the debug console and wasn't able to. I had to edit the web.config and then deleted all the files before manually uploading a copy of the zip file. It was the same process for both deployments that failed.

The deploy is coming from AppVeyor so I don't have much control over the process. Is there an option/way to tell the deploy to first take the site offline? I know Web Deploy has a setting for this, and since I'm deploying to a staging slot it doesn't matter if the site comes down before the new version is copied over.

davidebbo commented 6 years ago

Unfortunately, Kudu has no direct support for this, though that has come up. One suggestion in https://github.com/projectkudu/kudu/issues/2788 is to do it via custom deployment script, though it's unproven.

By far, my preference is to move people to running from zip package. Maybe try the letsencrypt workaround mentioned in https://github.com/sjkp/letsencrypt-siteextension/issues/239?

xt0rted commented 6 years ago

@davidebbo after looking through my code I realized I actually have a custom Let's Encrypt setup because of asp.net core's static file handling and not realizing that the extension could change where the file is saved.

The extension saves the challenge files to D:\home\site\wwwroot\.well-known but with asp.net core it needs to be D:\home\site\wwwroot\wwwroot\.well-known.

My workaround for this at the time was the following:

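using System.IO;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.FileProviders;

// Startup.Configure (fragment)
public void Configure(IApplicationBuilder app, IHostingEnvironment env)
{
    // ...
    app.UseStaticFiles();            // default static files from wwwroot\wwwroot
    app.UseWellKnownStaticFolder();  // challenge files from wwwroot\.well-known

    app.UseMvc();
}

public static class ApplicationBuilderExtensions
{
    public static void UseWellKnownStaticFolder(this IApplicationBuilder application)
    {
        // The extension writes challenge files next to the app, not under its wwwroot folder.
        var wellknownFolder = Path.Combine(Directory.GetCurrentDirectory(), ".well-known");

        application.UseStaticFiles(new StaticFileOptions
        {
            FileProvider = new PhysicalFileProvider(wellknownFolder),
            RequestPath = new PathString("/.well-known"),
            // Challenge files have no extension, so unknown types must be served.
            ServeUnknownFileTypes = true,
        });
    }
}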

It looks like I can set the letsencrypt:WebRootPath to something like D:\home\site\LetsEncrypt and then modify the middleware to pull from there instead of Directory.GetCurrentDirectory().

natemcmaster commented 6 years ago

I'm not aware of any changes to file locking in .NET Core in 2.1.3. Take a look at the release notes and see if there were any relevant changes. https://github.com/dotnet/core/blob/master/release-notes/2.1/2.1.3/2.1.3-commits.md

xt0rted commented 6 years ago

@davidebbo since app services supports virtual directories I should be able to remove my code and then map /.well-known to a folder like D:\home\site\LetsEncrypt.

For now I'm not storing anything under .well-known so it's not a big deal to map the whole thing. If I needed to I could always set the virtual directory to a different name and then use a url rewrite rule to map just /.well-known/acme-challenge.

A quick test shows this seems to work so now it's just a matter of trying it out when running from a zip file.

kgoderis commented 6 years ago

@davidebbo FYI and full disclosure, when upgrading to azure func v2.0.12050 using brew, I flagged https://github.com/Azure/homebrew-functions/issues/10 as the formula for azure-functions-core-tools (bump to v.36) was forgotten, so, I ended up being lazy and I manually changed only the symlink to the func binary on my system, and I did not touch anything else. Could it be that the partial fix at my side is the source of the problem?

kgoderis commented 6 years ago

What is the status of this issue? It is quite critical as it is no longer possible to publish functions to Azure. If there would be a temporary manual workaround for this I would like to learn about that.

davidebbo commented 6 years ago

> @davidebbo FYI and full disclosure, when upgrading to azure func v2.0.12050 using brew, I flagged Azure/homebrew-functions#10 as the formula for azure-functions-core-tools (bump to v.36) was forgotten, so, I ended up being lazy and I manually changed only the symlink to the func binary on my system, and I did not touch anything else. Could it be that the partial fix at my side is the source of the problem?

@kgoderis I don't really follow. When it comes to zipdeploy, there are two main factors you need to look at:

  1. The zip file that you are deploying
  2. The state of the web app you are deploying to

Nothing else related to client tools or symlinks should be of any relevance here. So I suggest you try to isolate by directly using the zipdeploy API and leaving all client concerns out of the equation. You can do that by going to Kudu's Tools / Zip Push Deploy UI, and drag/dropping your zip file onto it.
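
For anyone who prefers a script over the drag/drop UI, the same isolation can be done by posting the zip straight to the zipdeploy endpoint. The following is only a minimal Python sketch, assuming the usual `https://<site>.scm.azurewebsites.net/api/zipdeploy` URL and basic auth with deployment credentials. The helper name `build_zipdeploy_request` and all argument values are hypothetical placeholders, not part of Kudu or any client tool.

```python
import base64
import urllib.request


def build_zipdeploy_request(site_name, user, password, zip_bytes):
    """Build (but do not send) a POST request for the Kudu zipdeploy endpoint.

    site_name, user and password are placeholders for your own app name and
    deployment credentials; zip_bytes is the raw content of the zip file.
    """
    url = f"https://{site_name}.scm.azurewebsites.net/api/zipdeploy"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=zip_bytes, method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

To actually deploy, read the zip with `open(path, "rb").read()`, build the request, and pass it to `urllib.request.urlopen`. For a deployment slot the host name is, as far as I know, normally `<site>-<slot>` rather than `<site>`, so check the slot's own SCM URL first.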

xt0rted commented 6 years ago

I was able to fully test the virtual directory setup with run from zip and the let's encrypt extension requesting a certificate. It's a bit of a hack but it does work and means you don't have to modify your site to try serving up the challenge files.

davidebbo commented 6 years ago

@xt0rted Great! Are the steps documented somewhere if someone else needs to do that?

xt0rted commented 6 years ago

@davidebbo not yet, I'll be moving our asp.net core site over later tonight or tomorrow and plan on documenting it

xt0rted commented 6 years ago

@davidebbo Based on my understanding of https://github.com/Azure/app-service-announcements/issues/137 my original issue should be resolved by the app_offline.htm file that now gets added.

I just had a deploy go out that shows the creation of this file (no sign of it on disk though) and failed exactly as my original issue on the compiled views dll. I never got a chance to make the Let's Encrypt changes to this site so it's not yet using run from package.

For now the only way around this seems to be restarting the site, deleting the files, manually uploading the zip, and then a manual swap.

The full log for the failed deploy is:

2018-10-01T23:55:32.8695935Z : Command: "D:\home\site\deployments\tools\deploy.cmd"
2018-10-01T23:55:34.3384093Z : Handling Basic Web Site deployment.
2018-10-01T23:55:39.8699309Z : Creating app_offline.htm
2018-10-01T23:55:39.8699309Z : Error: Failed to change file that is currently being used "D:\home\site\wwwroot\EquineExchange.Homepage.Views.dll"
2018-10-01T23:55:39.8699309Z : KuduSync.NET from: 'D:\local\Temp\zipdeploy\extracted' to: 'D:\home\site\wwwroot'
2018-10-01T23:55:39.8855302Z : Copying file: 'appsettings.Development.json'
2018-10-01T23:55:39.8855302Z : Copying file: 'appsettings.json'
2018-10-01T23:55:39.8855302Z : Copying file: 'appsettings.Staging.json'
2018-10-01T23:55:39.9169781Z : Copying file: 'EquineExchange.Homepage.deps.json'
2018-10-01T23:55:39.9324317Z : Copying file: 'EquineExchange.Homepage.dll'
2018-10-01T23:55:39.9324317Z : Copying file: 'EquineExchange.Homepage.pdb'
2018-10-01T23:55:39.9324317Z : Copying file: 'EquineExchange.Homepage.runtimeconfig.json'
2018-10-01T23:55:39.948039Z : Copying file: 'EquineExchange.Homepage.Views.dll'
2018-10-01T23:55:39.948039Z : Failed exitCode=1, command="kudusync" -v 50 -f "D:\local\Temp\zipdeploy\extracted" -t "D:\home\site\wwwroot" -n "D:\home\site\deployments\870abcf5c15641b18c8ee1d0e40d8106\manifest" -p "D:\home\site\deployments\6d42c1e90f9143ddbb1bb8ec5ac1d2bc\manifest" -i ".git;.hg;.deployment;deploy.cmd"
2018-10-01T23:55:39.948039Z : An error has occurred during web site deployment.
2018-10-01T23:55:39.9949474Z : Error: Failed to change file that is currently being used "D:\home\site\wwwroot\EquineExchange.Homepage.Views.dll"\r\nD:\Program Files (x86)\SiteExtensions\Kudu\78.10925.3575\bin\Scripts\starter.cmd "D:\home\site\deployments\tools\deploy.cmd"

davidebbo commented 6 years ago

@xt0rted as a test, what happens if you:

If that still doesn't work, then it could be a sign that the runtime is not correctly shutting down when app_offline appears.

xt0rted commented 6 years ago

@davidebbo once the site loads I'm unable to delete the view dll because it's locked. I then manually created an app_offline.htm and tried again, this time I was able to delete the views dll.

As a follow-up test I tried doing the same thing from the console with the following command, and it fails with the same error:

set-content -path app_offline.htm -value '<html><body>offline</body></html>'; remove-item EquineExchange.Homepage.Views.dll

What does look to work is waiting a few seconds between each command. This version does work for me:

set-content -path app_offline.htm -value '<html><body>offline</body></html>'; [System.Threading.Thread]::Sleep(2000); remove-item EquineExchange.Homepage.Views.dll

I tried with 1 second but that didn't work, 2 seconds or longer does.
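
The same idea can be written so that it polls instead of relying on a fixed sleep. This is only a rough Python sketch of the one-liner above, not anything Kudu does: `take_offline_and_delete`, the 10 second timeout, and the 0.5 second poll interval are all made-up illustration values, and the 2 second figure was observed on one site only.

```python
import os
import time


def take_offline_and_delete(site_root, file_name, timeout_s=10.0, poll_s=0.5):
    """Drop app_offline.htm, then keep trying to delete file_name.

    Returns the number of delete attempts that were needed. Raises
    TimeoutError if the file is still locked after timeout_s seconds.
    """
    offline_path = os.path.join(site_root, "app_offline.htm")
    with open(offline_path, "w", encoding="utf-8") as handle:
        handle.write("<html><body>offline</body></html>")

    target = os.path.join(site_root, file_name)
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            os.remove(target)
            return attempts
        except PermissionError:
            # On Windows, a file still held open by the app raises
            # PermissionError; give the runtime time to shut down.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{file_name} is still locked after {timeout_s}s")
            time.sleep(poll_s)
```

Polling until the delete succeeds avoids guessing how long the runtime needs to release the file after app_offline.htm appears.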

davidebbo commented 6 years ago

Thanks @xt0rted, that makes sense.

@ankitkumarr you had mentioned that kudusync already had a retry policy, so I expected that it would have ended up retrying/waiting until app_offline actually takes effect. Can you double check that logic?

ankitkumarr commented 6 years ago

@davidebbo, the current retry policy in kudusync tries to perform any operation about 3 times before it deems the operation as failed. However, the wait time between each retry is only about 250 ms. So, I guess we could increase the number of retries and/or the delay between two retries.

I think it might even make sense to increase the retry to 5 or more and the time between them to be 500ms (or we could incrementally increase the time between retries too). It shouldn't impact the overall deployment time much.

davidebbo commented 6 years ago

@ankitkumarr Yep, it's most likely the problem that we just don't wait enough, given that Brian said he found 1 second to not be enough. I'd keep the 250 ms delay between tries, but just have more tries, like up to 10.
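
To put numbers on the options discussed above, here is a small Python sketch of a fixed-delay retry plus the worst-case wait it allows. It is not kudusync's actual code; `retry` and `max_wait_s` are hypothetical names, and the "3 tries, 250 ms" figures are the approximate ones quoted in the comment above.

```python
import time


def retry(operation, tries, delay_s, sleep=time.sleep):
    """Call operation() up to `tries` times, sleeping delay_s after each failure.

    Returns operation()'s result, or re-raises the OSError from the last try.
    """
    for attempt in range(1, tries + 1):
        try:
            return operation()
        except OSError:
            if attempt == tries:
                raise
            sleep(delay_s)


def max_wait_s(tries, delay_s):
    """Total time spent sleeping before the final attempt is made."""
    return (tries - 1) * delay_s
```

Under these assumptions, `max_wait_s(3, 0.25)` is 0.5 seconds, `max_wait_s(5, 0.5)` is 2.0 seconds, and `max_wait_s(10, 0.25)` is 2.25 seconds. Only the last one clearly exceeds the 2 seconds that was reported as sufficient, although each failed attempt adds a little time of its own.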

xt0rted commented 6 years ago

@davidebbo I have about 10 deploys queued up for this site so I moved forward with switching to the run from package setup. So far there's been no deployment issues.

To get the Let's Encrypt extension to work I've done the following:

  1. Create a folder called LetsEncrypt under D:\home\site
  2. Create a folder called .well-known under D:\home\site\LetsEncrypt
  3. In the site's Application settings create a new virtual path called /.well-known which points to site\LetsEncrypt\.well-known
  4. Add an App Setting called letsencrypt:WebRootPath with a value of D:\home\site\LetsEncrypt
  5. Once the site's using run from package, uninstall and reinstall the Let's Encrypt site extension; this looks to force the webjob to be set up outside of the wwwroot folder

My site doesn't use the .well-known folder right now, but on my test site I did and had that working by:

  1. Creating a virtual path called /.well-known which points to site\wwwroot\wwwroot\.well-known (or wherever your real folder is stored)
  2. Creating a second virtual path called /.well-known/acme-challenge which points to site\LetsEncrypt\.well-known\acme-challenge
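
The two-path setup above relies on the more specific virtual path taking precedence. The Python sketch below only models that mapping, assuming the most specific prefix wins and that `/` maps to `site\wwwroot` as on a default app. `resolve` and `VIRTUAL_PATHS` are illustrative names, and nothing here reflects how IIS or App Service implements the lookup.

```python
from pathlib import PureWindowsPath

HOME = PureWindowsPath(r"D:\home")

# Virtual path -> physical path relative to D:\home, as in the two steps above.
VIRTUAL_PATHS = {
    "/": r"site\wwwroot",
    "/.well-known": r"site\wwwroot\wwwroot\.well-known",
    "/.well-known/acme-challenge": r"site\LetsEncrypt\.well-known\acme-challenge",
}


def resolve(url_path, virtual_paths=VIRTUAL_PATHS):
    """Map a request path to a physical path, trying the longest prefix first."""
    for prefix in sorted(virtual_paths, key=len, reverse=True):
        base = prefix.rstrip("/")
        if url_path == base or url_path.startswith(base + "/"):
            remainder = url_path[len(base):].strip("/")
            physical = HOME / virtual_paths[prefix]
            return physical / remainder if remainder else physical
    raise LookupError(url_path)
```

With this mapping, a request under `/.well-known/acme-challenge/` lands in the LetsEncrypt folder, while everything else under `/.well-known/` still comes from the site's own folder.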

My cert is due to renew any day now, so instead of forcing this to renew I'm just letting it happen automatically. If any issues come up I'll be sure to post the changes here. Otherwise this seems like a viable workaround for the time being.

Update: my cert auto renewed without any issues

davidebbo commented 6 years ago

Thanks @xt0rted! Maybe put that info on https://github.com/sjkp/letsencrypt-siteextension/issues/239 as well?

romanryasne2 commented 4 years ago

Any updates? I am experiencing this issue too.

jvano commented 6 months ago

Hi

If the problem persists and is related to running it on Azure App Service, please open a support incident in Azure: https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request

This way we can better track and assist you on this case

Thanks,

Joaquin Vano Azure App Service

GioviQ commented 2 months ago

same problem still present

Command: "C:\home\site\deployments\tools\deploy.cmd"
Handling Basic Web Site deployment.
Creating app_offline.htm
Error: Failed to change file that is currently being used "C:\home\site\wwwroot\logs\log-20240811.log"
KuduSync.NET from: 'C:\local\Temp\zipdeploy\extracted' to: 'C:\home\site\wwwroot'
Deleting file: 'Keys\key-ca138594-7202-4e7e-87ca-ed88cb6c3aae.xml'
Deleting directory: 'Keys'
Deleting file: 'logs\log-20240811.log'
Failed exitCode=1, command="kudusync" -v 50 -x -f "C:\local\Temp\zipdeploy\extracted" -t "C:\home\site\wwwroot" -n "C:\local\Temp\tmp80CC.tmp" -p "C:\local\Temp\tmp80CC.tmp" -i ".git;.hg;.deployment;deploy.cmd"
An error has occurred during web site deployment.
Error: Failed to change file that is currently being used "C:\home\site\wwwroot\logs\log-20240811.log"\r\nC:\Program Files (x86)\SiteExtensions\Kudu\102.10502.001\bin\Scripts\starter.cmd "C:\home\site\deployments\tools\deploy.cmd"