Closed martynas-openg closed 2 years ago
The same issue with verification tokens. They are being deleted by db cascades on identity updates:
So, verification link sent by post-registration hook will not work if identity have been updated before user clicks the link.
From my understanding, that’s what’s happening: A RecoveryToken references a RecoveryAddress which in turn references an Identity. On the database level, all the references have a ON DELETE CASCADE
. As pointed out already by Sergey, when an Identity is updated, all of its recovery addresses are deleted, and thus also all its recovery tokens (which are used in link recoveries).
My suggestion for fixing this would be to make UpdateIdentity() a bit more sophisticated. Instead of deleting and re-creating all of the Identity’s recovery addresses, perform a diff between the current and desired set of recovery addresses so you know which ones you can ignore because they have not changed, which ones to add, and which ones to delete.
What do you think, is this a viable solution?
Yeah I think the only option is to do a diff comparison or be explicit about changing the recovery / verification addresses in a different API or through the current update
API body / parameters.
I guess sometimes you wouldn't really want to change the Identity recovery / verification addresses at all, but just a trait, such as name or language etc. And if you do change the Identity recovery / verification addresses you might only change that?
I'm just thinking a bit about performance implications when we do a diff check on the whole identity every time it is updated. What do you think?
Performing a diff will definitely have a performance penalty as we need to fetch all the associated entities from the database, and we need to do that while holding a row lock on the Identity to make sure that the Identity and its recovery addresses, etc. don’t get updated while we determine the difference. I can’t tell if Kratos can afford that performance hit or not, but in general I would say that correctness is more important than performance.
Ideally, the Pop framework would support eager updating and do all the heavy lifting, but unfortunately it does not (yet).
With that being said, I however think that the actual problem lies somewhere else and that this issue just surfaced it. The Identity is an aggregate as per DDD and it is comprised of the Credentials, VerifiableAddress, and RecoveryAddress entities. In an aggregate, other code should not reference entities other than the root directly via their IDs — it should always go via the root, i.e. the Identity, and some stable identifier. We see this being violated in Kratos with the RecoveryToken directly referencing the RecoveryAddress UUID — and when the UUID changes, it breaks. But resolving this would be a bigger task I think.
Hi,
I agree that if the recovery tokens remain tied to the recovery address, then the address should not be recreated every time the identity is updated. Maybe this can be done with an UPSERT
, returning the IDs of the recovery addresses, and then a batch DELETE
for all IDs that are not in this set (scoped to the identitiy ID)?
However, I think we can sidestep this issue by just breaking the association in RecoveryToken
to RecoveryAddress
https://github.com/ory/kratos/blob/db00e85e65c31b2bc497f0f4b4a28684b9f8bb9a/selfservice/strategy/link/token_recovery.go#L34
to just be a UUID. In UseRecoveryToken
we check that there is a recovery address for the token anyways (and this is the correct place and time to check, anyways):
This way, we have the semantics intact: The recovery token just proves ownership of a recovery address, and when you use it (once!), this email address is looked up and the flow continues.
Tangentially, regarding the token itself, I found it surprising that the token did not embed the information about the email address. I think of the recovery token as a proof that you own an email address, so the token should be a HMAC(<email address>, secret)
, when in Kratos it is HMAC(<random>, secret)
. In this case I wonder why not just write the <random>
to the db.
We just had this problem in prod where a customer was performing an update step for the user through the admin api, and then the verification token was invalidated due to this bug.
@hperl do you have some time to look into this issue and potentially fix this?
fyi @Benehiko @zepatrik
Preflight checklist
Describe the bug
Hey,
I encountered a problem: when an identity gets updated (via admin endpoint) during account recovery, the recovery flow get invalidated. Is this working as expected?
Reproducing the bug
Relevant log output
No response
Relevant configuration
Version
v0.9.0-alpha.3
On which operating system are you observing this issue?
Linux
In which environment are you deploying?
Docker
Additional Context
If there is no identity update via admin endpoint between the recovery link getting sent out and clicked, the whole recovery and password reset flow works correctly.
If this is a problem because the admin endpoint updates the whole identity, maybe the recovery link could stay valid if there are no changes in the update.