sjkp / letsencrypt-siteextension

Azure Web App Site Extension for easy installation and configuration of Let's Encrypt issued SSL certificates for custom domain names.

WebJobs disappear #71

Open MrDesjardins opened 7 years ago

MrDesjardins commented 7 years ago

Context: I deploy my website automatically with Azure whenever code is committed, using an Azure deployment slot. Once the website is warm, I swap staging into production. The deployed website has continuous webjobs built with Microsoft.Azure.WebJobs, with many static methods that have timer triggers. The first time I used the site extension, following the steps in https://gooroo.io/GoorooTHINK/Article/16420/Lets-Encrypt-Azure-Web-Apps-the-Free-and-Easy-Way/21872#.V-yO8vArIuU I saw the Let's Encrypt webjob next to my own WebJob entry in the Azure Portal (as well as in Kudu).

Problem: A few months passed, and I received a notification that my certificate was about to expire. I checked and the webjob wasn't there. In fact, the extension was not even installed anymore. I reinstalled the extension.

Today, I went back to see if the webjob was still up: no webjob for LetsEncrypt. Nevertheless, the extension is still installed.

I'll try to pin down a clear pattern, but it seems that something is removing the webjob.

sjkp commented 7 years ago

My guess is that the swap from staging removes it, since the job doesn't exist in the staging slot, and swapping is effectively a disk swap of the two slots.

I'm working on a better approach where the web job lives outside of the actual web site, in an Azure Function. Hopefully, when I'm done with that, the solution will be more reliable.

MrDesjardins commented 7 years ago

I can confirm that the job appears and disappears depending on which slot is the primary one. So, in short, the Let's Encrypt renewal job is lost whenever the active slot is not the one that was primary when the extension was installed. Small detail: my staging slot has the application setting WEBJOBS_STOPPED set to 1 because I do not want duplicate jobs to run. Maybe for people who don't use that setting the extension still works after swapping, since the job runs in their staging slot.

sjkp commented 7 years ago

The extension now supports deployment slots, which should fix the issue, but you have to make sure that you install the extension in your deployment slot as well (you don't have to actually set it up in the deployment slot if you don't need SSL there, as long as the web job is there so it gets transferred when you swap).

Do make sure that the letsencrypt: app settings are fixed to the production slot. If you don't set up the site extension in any deployment slot, you will have to do this manually.

MrDesjardins commented 7 years ago

Thank you @sjkp for the follow-up. I really appreciate that you took the time to fix this. I'm not sure what you mean by "fixing" the app settings, though.

To make it crystal clear, here are the full steps.

  1. Go to https://xxxx.scm.azurewebsites.net/SiteExtensions/#installed where xxxx is your website.
  2. You should see the existing extension, which has 4 blue buttons. The third one, with an arrow that points up, is the update button. Click it. In my case, the version went from 0.4.15 to 0.5.0.
  3. Repeat step 1 for your staging slot: go to https://xxx-staging.scm.azurewebsites.net/SiteExtensions/#installed . "Staging" here was the name of my slot.
  4. Repeat step 2 there.
  5. Ensure that both the website and the slot share the same Let's Encrypt settings (they do not have the checkboxes checked in the configuration).

sjkp commented 7 years ago

I'm not sure what you mean by step 5. The important thing is that you go to your production slot in portal.azure.com and make sure you have checked the checkboxes next to slot settings for all the letsencrypt: settings, so they are fixed to the production slot.

image

Otherwise, when you swap, you will get the app settings from the staging slot, and if you haven't set up letsencrypt there, your app settings in production will be gone, which will cause the web job to fail the next time it runs. If you have set up letsencrypt in your staging slot, then the extension should have placed the check marks for you and you should be safe to swap :)
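The effect of those checkboxes can be illustrated with a toy model in Python. This is only a sketch of the behaviour described above, not Azure's implementation: `swap_slots` and the setting name `letsencrypt:ExampleSetting` are made up for illustration, while WEBJOBS_STOPPED is the setting mentioned earlier in the thread. The assumption is that checked ("slot") settings stay with their slot and unchecked ones travel with the swap.

```python
from typing import Dict, Set, Tuple

Settings = Dict[str, str]


def swap_slots(
    production: Settings, staging: Settings, sticky: Set[str]
) -> Tuple[Settings, Settings]:
    """Toy model of a slot swap; returns (new_production, new_staging)."""

    def after_swap(own: Settings, incoming: Settings) -> Settings:
        # Settings marked as slot settings stay with the slot they belong to.
        stays = {k: v for k, v in own.items() if k in sticky}
        # All other settings travel with the content that is swapped in.
        arrives = {k: v for k, v in incoming.items() if k not in sticky}
        return {**arrives, **stays}

    return after_swap(production, staging), after_swap(staging, production)


if __name__ == "__main__":
    # "letsencrypt:ExampleSetting" is a hypothetical name standing in for
    # the real letsencrypt: settings; WEBJOBS_STOPPED comes from the thread.
    production = {"letsencrypt:ExampleSetting": "configured"}
    staging = {"WEBJOBS_STOPPED": "1"}

    unchecked, _ = swap_slots(production, staging, sticky={"WEBJOBS_STOPPED"})
    print("boxes unchecked:", unchecked)

    checked, _ = swap_slots(
        production,
        staging,
        sticky={"WEBJOBS_STOPPED", "letsencrypt:ExampleSetting"},
    )
    print("boxes checked:", checked)
```

In this model, leaving the box unchecked means production ends up without the letsencrypt: setting after the swap, which matches the failure described above.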

MrDesjardins commented 7 years ago

Hmm, my step 5 is the direct opposite of what you are asking me to do: the settings are not slot settings (unchecked). However, I have the Let's Encrypt settings set up in both places, so it will work too. I just looked, and the extension is now running on both (production and staging), and it didn't place the check marks for me. That said, everything seems to be correct.

We can mark this as completed. The ultimate test will be in a few weeks, when the certificate expires.

InteXX commented 6 years ago

@sjkp

I'm working on a better approach where the web job lives outside of the actual web site, in an Azure Function. Hopefully, when I'm done with that, the solution will be more reliable.

Have you had any luck with this? I accidentally deleted my WebJob just this very afternoon, so this would come in very handy.

Tsaukpaetra commented 5 years ago

I would very much like a hook endpoint that simply checks for the existence of the webjob and reinstalls it if not. Years upon years and this is still a problem? Why?

InteXX commented 5 years ago

@Tsaukpaetra

Easy now hoss... :-)

He's doing some good work for us on this one, and he's not charging us a dime. He needs our support and gratitude, not our condemnation.

Tsaukpaetra commented 5 years ago

@Tsaukpaetra

Easy now hoss... :-)

I'd submit a pull request myself if I were confident I could do it without building it; my success rate at getting NuGet and such to compile is zero.

InteXX commented 5 years ago

@Tsaukpaetra

I believe you've just set your path before you ;-)

sjkp commented 5 years ago

If you don't want the job to co-exist with your web app, you can always use the API provided by the site extension that comes without web jobs, and then call that API from somewhere else, e.g. an Azure Function or another orchestration platform. I have a write-up on my web site for that approach: https://wp.sjkp.dk/lets-encrypt-on-azure-web-apps-using-a-function-app-for-automated-renewal/

@Tsaukpaetra originally I didn't envision that people would constantly delete their web jobs; if I had known, I would never have built it around web jobs. What you suggest is just a patch to what I believe is fundamentally a bad architecture, so it is not something that I'm going to spend time implementing.

I would however love to build an easier installation process where the jobs run outside of the web app that they are renewing certificates for; I just haven't found the time for it. Maybe one day I will win the lottery and can work on open source full time :)

bistok commented 5 years ago

@sjkp

I would however love to build an easier installation process where the jobs run outside of the web app ...

I have been working with your .NET Core implementation, and I think it can run as a function with DNS challenges and work well. I'm trying to find the time to do it, and if I can, I will send a pull request.

sjkp commented 5 years ago

@bistok sounds great. I'm also going to migrate the HTTP challenge to the .NET Core version, and then there should be a good solution for all scenarios. I will do some work on it in the coming weeks, but we will see what I decide to prioritize.

@InteXX - Tbh, I think it is only understandable that people make wishes (or demands); that is part of the point of open source, and I don't expect everyone to just do PRs, which is a lot of work too. I'm aware that the product has its flaws, but it is also quite workable, which is why I'm not stressing out about adding new features.

InteXX commented 5 years ago

@sjkp

OK, sounds good.

xt0rted commented 5 years ago

When using the new Run From Package deployment model, extension webjobs can't be saved to the site's App_Data folder since it's read-only. The App Insights extension now saves its webjob under D:\home\site\jobs\Continuous\ApplicationInsightsProfiler2 because of this.

If you set your site up using Run From Package and then install the Let's Encrypt extension, the webjob is saved under D:\home\site\jobs\Continuous\letsencrypt(letsencrypt.siteextension.job) automatically.

The D:\home\site\jobs folder is the new location extensions should save their webjobs to so they're separate from the site content. If you change the installation path to point to this location then this issue should go away while also making it a bit easier for people who migrate to Run From Package (currently you need to uninstall and then reinstall the extension). https://github.com/sjkp/letsencrypt-siteextension/blob/94ec784e208e6e9e96cc878b4f43803b974f8331/LetsEncrypt-SiteExtension/install.cmd#L1

Update: to install the webjob to the new location the installation path would become:

%HOME%\site\jobs\Continuous\letsencrypt.siteextension.job

Tsaukpaetra commented 5 years ago

If you don't want the job to co-exist with your web app, you can always use the API provided by the site extension that comes without web jobs, and then call that API from somewhere else, e.g. an Azure Function or another orchestration platform. I have a write-up on my web site for that approach: https://wp.sjkp.dk/lets-encrypt-on-azure-web-apps-using-a-function-app-for-automated-renewal/

I might just do that. Even though it's only five minutes per site every three months, saving the annoyance from having to do so would be great!

@Tsaukpaetra originally I didn't envision that people would constantly delete their web jobs; if I had known, I would never have built it around web jobs. What you suggest is just a patch to what I believe is fundamentally a bad architecture, so it is not something that I'm going to spend time implementing.

I would however love to build an easier installation process where the jobs run outside of the web app that they are renewing certificates for; I just haven't found the time for it. Maybe one day I will win the lottery and can work on open source full time :)

Yes, I blame Microsoft for having you put the web jobs intermixed with site data in the first place. Not only that, but the default deployment method naturally (and justifiably) erases foreign files, thereby breaking this external add-on.

IMO external tooling should not have to rely on the internal state of the app it's operating outside of to function.

@xt0rted sounds like the right direction, is there a way to do that manually now? If I just move the folder using the console and restart the site, would that work?

xt0rted commented 5 years ago

@Tsaukpaetra I haven't tried moving this job manually but I see no reason why it wouldn't work.

Tsaukpaetra commented 5 years ago

Yes!

Command run in the console:

xcopy /s App_Data\jobs\continuous\letsencrypt(letsencrypt.siteextension.job)\*.* ..\jobs\continuous\letsencrypt.siteextension.job\

It looks like the site picked up the job files just fine, and now after deploying the default job has indeed gone missing while the other copy of it remains. 👍
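For anyone who would rather script the copy, here is a rough Python equivalent of the xcopy command. It is only a sketch under a few assumptions: that the console starts in site\wwwroot (so App_Data\... and ..\jobs\... resolve as below), that Python 3.8+ is available, and that `relocate_webjob` is a made-up helper name. Unlike xcopy /s, shutil.copytree also copies empty subfolders.

```python
import shutil
from pathlib import Path

# Folder names are taken from the xcopy command above; they are relative
# to %HOME% on the assumption that the console's working directory is
# %HOME%\site\wwwroot.
OLD_JOB_DIR = Path(
    "site/wwwroot/App_Data/jobs/continuous/"
    "letsencrypt(letsencrypt.siteextension.job)"
)
NEW_JOB_DIR = Path("site/jobs/continuous/letsencrypt.siteextension.job")


def relocate_webjob(home: Path) -> Path:
    """Copy the webjob out of the site content and return the new folder."""
    src = home / OLD_JOB_DIR
    dst = home / NEW_JOB_DIR
    if not src.is_dir():
        raise FileNotFoundError(f"webjob folder not found: {src}")
    # dirs_exist_ok lets the copy be repeated without failing (Python 3.8+).
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

Calling `relocate_webjob(Path(os.environ["HOME"]))` (after `import os`) would mirror the manual step; the original folder is left in place, just as with xcopy.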

ohadschn commented 5 years ago

Shameless plug for my take on running the renewal outside the target web app... https://github.com/ohadschn/letsencrypt-webapp-renewer

sjkp commented 5 years ago

@xt0rted I wasn't aware of this new path; it sounds like a simple solution. Maybe the extension just works without change: I don't copy it to the App_Data folder myself, the Kudu installation of the site extension does. It could be that I have to configure something differently. I will investigate.

Tsaukpaetra commented 5 years ago

Sometimes the simple solutions are the best! I'm really hoping the alternate path does the trick, so far everything seems normal after moving it.

InteXX commented 5 years ago

I'm not sure I'm understanding this correctly.

Does this mean that we can configure the extension to run correctly even after its files are deleted from the web app folders?

If so, I'm not following how to set it up.

Tsaukpaetra commented 5 years ago

Does this mean that we can configure the extension to run correctly even after its files are deleted from the web app folders?

Yes, this moves the job executables out of your site's own webjobs folder so they don't get erased on publish.

If so, I'm not following how to set it up.

In your App Service in Azure, there's an option on the left near the bottom named "Console". If you click that, and then paste in the command I gave above, you should see the web job get copied, and that's basically it.
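To check which of the two locations currently holds the job, a small script along these lines could help. It is a sketch only: the two folders are the ones mentioned in this thread (relative to %HOME%, assuming the site content lives under site\wwwroot), `find_letsencrypt_jobs` is a made-up name, and matching on "letsencrypt" in the folder name is an assumption rather than anything the extension guarantees.

```python
from pathlib import Path
from typing import List

# Both candidate folders come from the thread: the old location inside the
# site content and the newer one outside of it. Windows paths are
# case-insensitive, so "continuous" also matches "Continuous" there.
CANDIDATE_DIRS = [
    Path("site/wwwroot/App_Data/jobs/continuous"),
    Path("site/jobs/continuous"),
]


def find_letsencrypt_jobs(home: Path) -> List[Path]:
    """Return every continuous-webjob folder whose name mentions letsencrypt."""
    found: List[Path] = []
    for base in CANDIDATE_DIRS:
        root = home / base
        if not root.is_dir():
            continue  # this location does not exist on the site
        for child in sorted(root.iterdir()):
            if child.is_dir() and "letsencrypt" in child.name.lower():
                found.append(child)
    return found
```

An empty result would mean the job is in neither place, in which case reinstalling the extension is probably the first step before copying anything.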

InteXX commented 5 years ago

Hm, I've got nothing in App_Data except some Elmah stuff:

image

I'm pretty certain I accidentally deleted the job during a recent WebDeploy action. I set about reinstalling the extension, but as I recall the installation procedure had become more complicated since my first round—I didn't get very far.

InteXX commented 5 years ago

OK, got it. I removed and reinstalled the extension to get the jobs folder back.

The Azure UI has changed since the documentation was written, and so we have to 'translate' things a little bit. That's what threw me off earlier.

This seems to have done the trick, good job. After I copied from and deleted App_Data\jobs I get this:

image

All appears to be well in certificate land.

ohadschn commented 5 years ago

Another way to make sure the webjob never gets deleted, regardless of what you do to your web app, is to run it externally (not from within the web app whose cert is to be renewed). The wiki has an explanation of how to achieve it using Azure Functions, and I've written a WebJob which you install on an external, dedicated web app: https://github.com/ohadschn/letsencrypt-webapp-renewer

InteXX commented 5 years ago

@Tsaukpaetra

My webjobs have started failing, with this error message. Do you suppose it might be related to this change?

Tsaukpaetra commented 5 years ago

I find it more likely that the principal you're using no longer has the role required to access that resource group. Being in a different folder doesn't change a principal's permissions. Check that first.

InteXX commented 5 years ago

@Tsaukpaetra

Check that first

Sounds good. I've been digging around for a half hour, trying to figure out how to do that.

Do you have any leads?

InteXX commented 5 years ago

@Tsaukpaetra

Check that first

I've spent a full workday wading through documentation and blog posts, trying to figure out how to check on this. Another workday gone is looming on the horizon. I'm coming up empty.

Microsoft documentation is absolutely horrid. This seems like it should be a simple task, but it's buried under mounds of explanations of complex and unrelated rabbit holes that lead to nowhere.

If you were tasked with this, how would you do it in the portal?