umbraco / Umbraco-CMS

Umbraco is a free and open source .NET content management system helping you deliver delightful digital experiences.
https://umbraco.com
MIT License
4.4k stars 2.66k forks source link

Backoffice anti-forgery token can't be decrypted after deployments #16107

Open enkelmedia opened 4 months ago

enkelmedia commented 4 months ago

Which Umbraco version are you using? (Please write the exact version, example: 10.1.0)

v11-v13 (latest)

Bug summary

We've noticed lately that there's some kind of issue with the backoffice anti-forgery token in some scenarios like app pool recycles and app pool switches causing the backoffice to not work and the server to return HTTP Status: 417 The anti-forgery token could not be decrypted.

This happens for us during deployments and has surfaced because customers have worked on a page that was loaded before our deployment. Then, due to the issue, they could not save their work. The only error indication is the X in the save button (no notification) so sometimes editors think that they have saved.

For the record: We do store the key ring on disk (using AddDataProtection()) and the issue does not affect anti-forgery tokens for the front end at all, only the backoffice. This also works for the Auth-cookie so we do not have to log in again in any of the scenarios.

On v12-v13 we consistently get "HTTP Status: 417 The anti-forgery token could not be decrypted" after app pool recycle on the production machines configured with default App Pool settings.

Specifics

The symptoms indicate that there is something special about how the key ring for the backoffice anti-forgery token is handled, almost as if it does not use the storage configured by ´AddDataProtection()` but creates something that seems to be dependent on the App Pool / App Pool Identity.

The anti forgery token on the front end works in all scenarios below, for both app pool recycle and app pool switches. I've also tested a vanilla MVC site with a form+anti forgery token without being able to reproduce the problem, this indicates that the backoffice token is handled in a different way causing the issue.

There are two types of actions that we've managed to find that would trigger this issue:

Also, to be clear this does not happen:

We've tested on some of our production environments to identify if there were some changes in the CMS that might give us insights. This is the results:

Machine Recycle App Pool Switch Umb-version Runtime App-Pool Identity
Windows Server 2019 #1 Works 9.3.1 6.0.4 ApplicationPoolIdentity
Windows Server 2022 #1 Works 10.3.2 6.0.10 ApplicationPoolIdentity
Windows Server 2022 #2 Works Error 11.2.1 7.0.11 ApplicationPoolIdentity
Windows Server 2019 #2 Works 11.2.2 7.0.14 ApplicationPoolIdentity
Windows 2019 #1 Works Error 11.3.1 7.0.5 ApplicationPoolIdentity
Windows 2019 #1 Works 12.0.1 7.0.5 ApplicationPoolIdentity
Windows Server 2022 #2 Error 12.3.5 7.0.11 ApplicationPoolIdentity
Windows Server 2022 #2 Error 12.3.5 7.0.11 ApplicationPoolIdentity
Windows Server 2022 #2 Error 13.11.1 8.0.4 ApplicationPoolIdentity
Windows Server 2019 #2 Error 13.2.2 8.0.0 ApplicationPoolIdentity

Noticing that recycle worked in our tests up until 12.3.5 which might indicate that changes introduced to facilitate the content delivery API impacted this?

The "app pool switch" seems to have been an issue at least back to v 11.2.1.

Here is a screen share demonstrating the issue:

umb13-recycle

Steps to reproduce

I've created a zip for the project that I use for testing: https://www.dropbox.com/scl/fi/bd6wcnm3v19v6f6js4b1v/share-Umb-417-issue-MyProject.zip?rlkey=u1e2ezbzgqsmar30tow1vhx5f&dl=0

Here are the basic steps:

1. Create a new Umbraco website

dotnet new install Umbraco.Templates::13.2.2 --force && dotnet new sln --name "MySolution" && dotnet new umbraco --force -n "MyProject" --friendly-name "Administrator" --email "admin@example.com" --password "1234567890" --development-database-type SQLite && dotnet sln add "MyProject" && dotnet add "MyProject" package clean && dotnet run --project "MyProject"

2. Build a standard "MVC" form using a surface controller

This is to test the difference between the front-end anti-forgery token and the one used in the backoffice.

3. Publish the project

In the same folder as the .csproj, publish the project

dotnet publish

4. Setup IIS

In IIS:

(Notice: In some scenarios, the recycle error is triggered when running the App Pool as ApplicationPoolIdentity and sometimes we've had to change the App Pool identity to a local user to trigger the problem).

5.1. Replicate "App Pool Recycle error"

5.2. Replicate "App Pool Switch error"

6 Front end

Perform step 5.1 and 5.2 with the form on the front-end to notice that the error is not present here.

Expected result / actual result

The backoffice anti-forgery token should be decrypted even after app pool recycle or app pool switch, without having to reload the backoffice page.

github-actions[bot] commented 4 months ago

Hi there @enkelmedia!

Firstly, a big thank you for raising this issue. Every piece of feedback we receive helps us to make Umbraco better.

We really appreciate your patience while we wait for our team to have a look at this but we wanted to let you know that we see this and share with you the plan for what comes next.

We wish we could work with everyone directly and assess your issue immediately but we're in the fortunate position of having lots of contributions to work with and only a few humans who are able to do it. We are making progress though and in the meantime, we will keep you in the loop and let you know when we have any questions.

Thanks, from your friendly Umbraco GitHub bot :robot: :slightly_smiling_face:

enkelmedia commented 4 months ago

Okey, I think I've figured out the problem.

The Umbraco.Cms.Web.BackOffice.Security.BackOfficeAntiforgery class contains some interesting hacks =D

A custom ServiceCollection is created to get around some issues with creating a custom AntiForgerty service, the fact that this DI container only exists inside this class means that any configuration made during startup will be ignored. This actually resonates with my initial findings that the backoffice anti forgery token does not use the key store configured with AddDataProtection.

I figured that we could probably pass on the global configuration into the class-isolated DI container to be able to respect the data protection configuration.

I've created a simple work around that can be applied to any site to replace the current implementation:

public class PatchedBackofficeAntiforgery : IBackOfficeAntiforgery
{
    private readonly IAntiforgery _internalAntiForgery;
    private readonly CookieBuilder _angularCookieBuilder;

    public PatchedBackofficeAntiforgery(
        IOptionsMonitor<GlobalSettings> globalSettings,
        ILoggerFactory loggerFactory,
        IServiceProvider serviceProvider)
    {
        CookieSecurePolicy cookieSecurePolicy = globalSettings.CurrentValue.UseHttps ? CookieSecurePolicy.Always : CookieSecurePolicy.SameAsRequest;

        // PATCH: Getting the IDataProtectionProvider from the "real" DI container to pass on to below.
        var dpProvider = serviceProvider.GetRequiredService<IDataProtectionProvider>();

        // NOTE: This is the only way to create a separate IAntiForgery service :(
        // Everything in netcore is internal. I have logged an issue here https://github.com/dotnet/aspnetcore/issues/22217
        // but it will not be handled so we have to revert to this.
        _internalAntiForgery = new ServiceCollection()
            .AddSingleton(loggerFactory)
            .AddSingleton<IDataProtectionProvider>((f) => dpProvider)
            .AddAntiforgery(x =>
            {
                x.HeaderName = Constants.Web.AngularHeadername;
                x.Cookie.Name = Constants.Web.CsrfValidationCookieName;
                x.Cookie.SecurePolicy = cookieSecurePolicy;
            })
            .BuildServiceProvider()
            .GetRequiredService<IAntiforgery>();

        // Configure cookie builder using defaults from antiforgery options
        _angularCookieBuilder = new AntiforgeryOptions().Cookie;
        _angularCookieBuilder.HttpOnly = false; // Needs to be accessed from JavaScript
        _angularCookieBuilder.SecurePolicy = cookieSecurePolicy;
    }

    /// <inheritdoc />
    public async Task<Attempt<string?>> ValidateRequestAsync(HttpContext httpContext)
    {
        try
        {
            await _internalAntiForgery.ValidateRequestAsync(httpContext);
            return Attempt<string?>.Succeed();
        }
        catch (Exception ex)
        {
            return Attempt.Fail(ex.Message);
        }
    }

    /// <inheritdoc />
    public void GetAndStoreTokens(HttpContext httpContext)
    {
        AntiforgeryTokenSet set = _internalAntiForgery.GetAndStoreTokens(httpContext);

        if (set.RequestToken == null)
        {
            throw new InvalidOperationException("Could not resolve a request token.");
        }

        // We need to set 2 cookies:
        // The cookie value that angular will use to set a header value on each request - we need to manually set this here
        // The validation cookie value generated by the anti-forgery helper that we validate the header token against - set above in GetAndStoreTokens
        httpContext.Response.Cookies.Append(Constants.Web.AngularCookieName, set.RequestToken, _angularCookieBuilder.Build(httpContext));
    }
}

Then during service registration, after AddUmbraco():

services.AddUnique<IBackOfficeAntiforgery,PatchedBackofficeAntiforgery>();

This implementation will have to add a new ctor (to get the "real" DI container) so I guess it would be considered a breaking change, maybe if someone from HQ want to have a look at my proposed implementation (that solves the problems for us) and let me know if you would like to have a PR =D

Migaroez commented 4 months ago

Hey Markus, thank you so much for the detailed report #h5yr!

This is going to take a bit more work to verify and investigate, but at first glance it indeed seems wrong that we do not supply a way to configure DataProtection on the isolated servicescollection for the backoffice AF.

Since the backoffice AF is clearly separated from the normal AF, it would make sense to supply a mechanism where you could do services.AddBackofficeDataProtection() so you would have full control over the isolated AF like you do with with the global one instead of us just copying over the same settings by depending on the (globally) configured IDataProtectionProvider.

I will have a chat with the team and hopefully get back to you on this next week.

enkelmedia commented 4 months ago

Hi!

Thanks for looking at this!

I've been running my workaround in production for a while and it works fine, no problems since that update.

Let's establish a common "vocabulary" for the different parts here =D

In the current state of the backoffice, the backoffice auth cookies are encrypted using the global configuration and only the anti forgery token uses this local IOC that basically ignores any configuration made during startup.

Personally I would just find it confusing to have to configure data protection two times. I'm guessing that the vast majority of sites would use the same data protection configuration for the frontend and the backoffice. However if it would be implemented in a way so that it reuses the "global configuration" but also make it possible to set backoffice-specific settings if needed that sounds like a good solution to me.

There are plenty of examples out there about how to configure data protection so it would probably be good one could just follow them to "make it work" and then if someone needs to fiddle with different settings for the backoffice they could dive into the Umbraco docs to find out how.

Cheers!

robertjf commented 3 months ago

We are seeing this issue in Umbraco 13.3.2 and I've applied the above patch like so:

var umbracoBuilder = builder.CreateUmbracoBuilder()
    .AddBackOffice()
    .AddWebsite()
    .AddDeliveryApi()
    .AddComposers();

umbracoBuilder.Services.AddUnique<IBackOfficeAntiforgery, PatchedBackofficeAntiforgery>();

umbracoBuilder.Build();

I can get a breakpoint in the PatchedBackofficeAntiforgery code and it's being hit, but I'm still getting the following in the logs:

[11:15:12 WRN] Error unprotecting the session cookie.
System.Security.Cryptography.CryptographicException: The payload was invalid. For more information go to https://aka.ms/aspnet/dataprotectionwarning
   at Microsoft.AspNetCore.DataProtection.Managed.ManagedAuthenticatedEncryptor.Decrypt(ArraySegment`1 protectedPayload, ArraySegment`1 additionalAuthenticatedData)
   at Microsoft.AspNetCore.DataProtection.KeyManagement.KeyRingBasedDataProtector.UnprotectCore(Byte[] protectedData, Boolean allowOperationsOnRevokedKeys, UnprotectStatus& status)
   at Microsoft.AspNetCore.DataProtection.KeyManagement.KeyRingBasedDataProtector.Unprotect(Byte[] protectedData)
   at Microsoft.AspNetCore.Session.CookieProtection.Unprotect(IDataProtector protector, String protectedText, ILogger logger)

Even if I inject the new service further up the chain (directly after AddBackOffice()) it doesn't resolve the issue.

enkelmedia commented 3 months ago

@robertjf I'm not sure that my hack fixes all kinds of scenarios where this is an issue. It worked for us in our very specific scenario.

Please not that you would probably have to delete any existing cookies for this to work, did you try that?

ctolkien commented 1 month ago

Noting that we are seeing this when storing our data protection keys in blob storage, protected via keyvault (we're running in containers).

seanrockster commented 6 days ago

How does this apply to front-end (member) login? Not to concerned about backoffice logins however all our customers get logged out on slot swap.

enkelmedia commented 6 days ago

@seanrockster the exact issue that I have should not impact the members on the front end. From your description, it sounds more like you are getting new keys for the data protection on each new release?

Some more info here: https://learn.microsoft.com/en-us/aspnet/core/security/data-protection/configuration/overview?view=aspnetcore-8.0

How are you storing the keys?

seanrockster commented 6 days ago

@enkelmedia Hi, we're not doing anything specific. We haven't configured any data protection other than what comes out of the box with umbraco.

enkelmedia commented 6 days ago

@seanrockster The data protection stuff is not a ”Umbraco-thing”, its a .NET Core concept.

The keys used to encrypt/decrypt the cookies will be stored on the local machine by default, when you do a slot wrap there is a new machine, hence new keys that can’t decrypt your members cookies.

You need to persist the keys somewhere so that the new machine can use the same keys.

Have a look at the link from my previous answer.

Cheers!