serlo / infrastructure-modules-shared

Apache License 2.0

Kratos webhook fails a lot #43

Closed hugotiburtino closed 1 year ago

hugotiburtino commented 1 year ago

After we upgraded our kratos image on 9 April to commit 6d83dc98 of kratos, the registration webhook almost never worked. As a result, many users didn't get a legacy_id. On 23 April we updated to the official image v0.13.0; the webhook now works most of the time but still fails sometimes. This is a clear contrast to registrations before 9 April, when it NEVER failed.

Total users affected so far: 122

The failures from 9 to 23 April are probably due to https://github.com/ory/kratos/pull/3111. The ones from 23 to 26 April are due to https://github.com/ory/kratos/pull/3200, which introduced a timeout.

hugotiburtino commented 1 year ago

Issue closed only because we have fixed the broken accounts for now. This is not a final solution, since the problem will likely occur again.
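
Finding the broken accounts comes down to listing the identities that never received a legacy_id. A minimal sketch in Python, assuming the identities have already been exported as dictionaries and that legacy_id is stored under a `metadata_public` key (both the field location and the helper name `identities_missing_legacy_id` are assumptions for illustration, not something this thread confirms):

```python
from typing import Any, Iterable


def identities_missing_legacy_id(identities: Iterable[dict[str, Any]]) -> list[str]:
    """Return the ids of identities that have no legacy_id.

    Assumption: legacy_id lives under "metadata_public"; adjust the lookup
    if it is stored somewhere else.
    """
    missing = []
    for identity in identities:
        # metadata_public may be absent or null for affected users
        metadata = identity.get("metadata_public") or {}
        if metadata.get("legacy_id") is None:
            missing.append(identity["id"])
    return missing


# Hypothetical sample data, only to show the shape of the input.
sample = [
    {"id": "a", "metadata_public": {"legacy_id": 101}},
    {"id": "b", "metadata_public": None},
    {"id": "c", "metadata_public": {}},
]
print(identities_missing_legacy_id(sample))
```

Running such a check periodically would show how many users are affected without waiting for them to report a problem.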

hugotiburtino commented 1 year ago

Giving more resources to kratos made the webhook fail a lot again (!). For example, on 03 May there were 14 failures out of 20 attempts (70%). We need some measures:

In the long run, we have to deprecate the legacy user table

hugotiburtino commented 1 year ago

There is a memory leak in the newest version of kratos but, after we gave it more power, the webhook stopped failing.

hugotiburtino commented 7 months ago

Correction here: the webhook has failed again, but less often