ory / kratos

The most scalable and customizable identity server on the market. Replace your Homegrown, Auth0, Okta, Firebase with better UX and DX. Has all the tablestakes: Passkeys, Social Sign In, Multi-Factor Auth, SMS, SAML, TOTP, and more. Written in Go, cloud native, headless, API-first. Available as a service on Ory Network and for self-hosters.
https://www.ory.sh/?utm_source=github&utm_medium=banner&utm_campaign=kratos
Apache License 2.0
11.24k stars 963 forks source link

Admin API identity update invalidates recovery link #2433

Closed martynas-openg closed 2 years ago

martynas-openg commented 2 years ago

Preflight checklist

Describe the bug

Hey,

I encountered a problem: when an identity gets updated (via admin endpoint) during account recovery, the recovery flow get invalidated. Is this working as expected?

Reproducing the bug

  1. Create an identity via admin endpoint
  2. Initiate self service recovery flow so an email with recovery link gets sent out
  3. Update same identity via admin endpoint with no changes (same traits, state == active, same schema)
  4. Follow the recovery link received in email
  5. Get a flow response that the recovery token is invalid

Relevant log output

No response

Relevant configuration

identity schema:

{
  "$id": "identity0",
  "title": "Test",
  "type": "object",
  "properties": {
    "traits": {
      "type": "object",
      "properties": {
        "email": {
          "type": "string",
          "format": "email",
          "title": "E-Mail",
          "ory.sh/kratos": {
            "credentials": {
              "password": {
                "identifier": true
              }
            },
            "recovery": {
              "via": "email"
            },
            "verification": {
              "via": "email"
            }
          }
        }
      },
      "required": [
        "email"
      ],
      "additionalProperties": false
    }
  }
}

Version

v0.9.0-alpha.3

On which operating system are you observing this issue?

Linux

In which environment are you deploying?

Docker

Additional Context

If there is no identity update via admin endpoint between the recovery link getting sent out and clicked, the whole recovery and password reset flow works correctly.

If this is a problem because the admin endpoint updates the whole identity, maybe the recovery link could stay valid if there are no changes in the update.

splaunov commented 2 years ago

The same issue with verification tokens. They are being deleted by db cascades on identity updates:

https://github.com/ory/kratos/blob/731b3c7ba48271e2fb6bbd53b0281d5269012332/persistence/sql/persister_identity.go#L328

So, verification link sent by post-registration hook will not work if identity have been updated before user clicks the link.

thcyron commented 2 years ago

From my understanding, that’s what’s happening: A RecoveryToken references a RecoveryAddress which in turn references an Identity. On the database level, all the references have a ON DELETE CASCADE. As pointed out already by Sergey, when an Identity is updated, all of its recovery addresses are deleted, and thus also all its recovery tokens (which are used in link recoveries).

My suggestion for fixing this would be to make UpdateIdentity() a bit more sophisticated. Instead of deleting and re-creating all of the Identity’s recovery addresses, perform a diff between the current and desired set of recovery addresses so you know which ones you can ignore because they have not changed, which ones to add, and which ones to delete.

What do you think, is this a viable solution?

Benehiko commented 2 years ago

Yeah I think the only option is to do a diff comparison or be explicit about changing the recovery / verification addresses in a different API or through the current update API body / parameters.

I guess sometimes you wouldn't really want to change the Identity recovery / verification addresses at all, but just a trait, such as name or language etc. And if you do change the Identity recovery / verification addresses you might only change that?

I'm just thinking a bit about performance implications when we do a diff check on the whole identity every time it is updated. What do you think?

thcyron commented 2 years ago

Performing a diff will definitely have a performance penalty as we need to fetch all the associated entities from the database, and we need to do that while holding a row lock on the Identity to make sure that the Identity and its recovery addresses, etc. don’t get updated while we determine the difference. I can’t tell if Kratos can afford that performance hit or not, but in general I would say that correctness is more important than performance.

Ideally, the Pop framework would support eager updating and do all the heavy lifting, but unfortunately it does not (yet).

With that being said, I however think that the actual problem lies somewhere else and that this issue just surfaced it. The Identity is an aggregate as per DDD and it is comprised of the Credentials, VerifiableAddress, and RecoveryAddress entities. In an aggregate, other code should not reference entities other than the root directly via their IDs — it should always go via the root, i.e. the Identity, and some stable identifier. We see this being violated in Kratos with the RecoveryToken directly referencing the RecoveryAddress UUID — and when the UUID changes, it breaks. But resolving this would be a bigger task I think.

hperl commented 2 years ago

Hi,

I agree that if the recovery tokens remain tied to the recovery address, then the address should not be recreated every time the identity is updated. Maybe this can be done with an UPSERT, returning the IDs of the recovery addresses, and then a batch DELETE for all IDs that are not in this set (scoped to the identitiy ID)?

However, I think we can sidestep this issue by just breaking the association in RecoveryToken to RecoveryAddress https://github.com/ory/kratos/blob/db00e85e65c31b2bc497f0f4b4a28684b9f8bb9a/selfservice/strategy/link/token_recovery.go#L34 to just be a UUID. In UseRecoveryToken we check that there is a recovery address for the token anyways (and this is the correct place and time to check, anyways):

https://github.com/ory/kratos/blob/29d6376e22e4de617ec63ca0a5dcb4dbf34c7c37/persistence/sql/persister_recovery.go#L90-L96

This way, we have the semantics intact: The recovery token just proves ownership of a recovery address, and when you use it (once!), this email address is looked up and the flow continues.

Tangentially, regarding the token itself, I found it surprising that the token did not embed the information about the email address. I think of the recovery token as a proof that you own an email address, so the token should be a HMAC(<email address>, secret), when in Kratos it is HMAC(<random>, secret). In this case I wonder why not just write the <random> to the db.

aeneasr commented 2 years ago

We just had this problem in prod where a customer was performing an update step for the user through the admin api, and then the verification token was invalidated due to this bug.

@hperl do you have some time to look into this issue and potentially fix this?

fyi @Benehiko @zepatrik