Closed rtyley closed 6 months ago
To be clear about the cause of the issue: was the CSRFTokenSigner
failing to validate old keys, failing to rotate them to new ones, or both?
The second one - CSRFTokenSigner was never updating the secret it was using to sign or verify tokens. The secret it used was entirely dependent on when that EC2 instance started up.
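To illustrate the difference, here is a minimal sketch, not Play's or play-secret-rotation's actual code. The names SignerSketch, StartupSecretSigner and CurrentSecretSigner are hypothetical, and the AtomicReference is only a stand-in for wherever the rotating secret is kept. One signer reads the secret once when it is constructed; the other looks it up on every call.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.concurrent.atomic.AtomicReference
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical illustration only: these are not Play's real signer classes.
object SignerSketch {
  // HMAC-SHA256 of the message under the given secret, hex-encoded.
  private def hmacHex(secret: String, message: String): String = {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(new SecretKeySpec(secret.getBytes(UTF_8), "HmacSHA256"))
    mac.doFinal(message.getBytes(UTF_8)).map("%02x".format(_)).mkString
  }

  // Reads the secret once, at construction (i.e. when the server starts up).
  final class StartupSecretSigner(store: AtomicReference[String]) {
    private val secret = store.get()
    def sign(message: String): String = hmacHex(secret, message)
  }

  // Reads the secret on every call, so it follows later rotations.
  final class CurrentSecretSigner(store: AtomicReference[String]) {
    def sign(message: String): String = hmacHex(store.get(), message)
  }

  def main(args: Array[String]): Unit = {
    val store = new AtomicReference("secret-before-rotation")
    val startup = new StartupSecretSigner(store)
    val current = new CurrentSecretSigner(store)
    val before = startup.sign("token")

    store.set("secret-after-rotation") // simulate a secret rotation

    println(startup.sign("token") == before) // true: still signing with the start-up secret
    println(current.sign("token") == before) // false: now signing with the rotated secret
  }
}
```

In this sketch, StartupSecretSigner behaves like the problem described above: its output depends only on the secret that was current when it was created.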
Investigating https://github.com/guardian/ophan/issues/5970, which was seeing many of these errors:
...it's become clear that play-secret-rotation, which overrides the crucial RequestFactory component to adopt secret-rotating behaviour with RotatingSecretComponents:
https://github.com/guardian/play-secret-rotation/blob/4c96b9294e30f231713a79a0cb449668c81c7eba/play/play-v27/RotatingSecretComponents.scala#L22-L23
...doesn't also override CSRFTokenSigner.
Consequently, if app servers in a cluster have started at different times, before and after a secret rotation, as in this case with the Ophan Dashboard:
...then different servers within a Play app server cluster will be using different secrets for signing the CSRF token - and will consequently reject each other's tokens. If a user is making a form-POST-ing request, it's a random chance as to whether their POST will be received by a Play server instance using the same secret as the form was created with - a random chance whether it will fail.
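The failure mode can be reproduced in a small sketch. It assumes an HMAC-signed token of the form signature-value, which is a simplification and not Play's real token format. Server and CsrfMismatchDemo are hypothetical names. The third server, which also accepts the previous secret, is only one conceivable mitigation and is not claimed to be what play-secret-rotation does.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical model of one app server; not Play's real CSRF implementation.
final class Server(signingSecret: String, acceptedSecrets: Seq[String]) {
  private def hmacHex(secret: String, value: String): String = {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(new SecretKeySpec(secret.getBytes(UTF_8), "HmacSHA256"))
    mac.doFinal(value.getBytes(UTF_8)).map("%02x".format(_)).mkString
  }

  // Token is "<hex signature>-<value>", signed with this server's signing secret.
  def issueToken(value: String): String = s"${hmacHex(signingSecret, value)}-$value"

  // Accept the token if any accepted secret reproduces its signature.
  def accepts(token: String): Boolean = token.split("-", 2) match {
    case Array(signature, value) =>
      acceptedSecrets.exists { secret =>
        // Constant-time comparison of the expected and presented signatures.
        MessageDigest.isEqual(hmacHex(secret, value).getBytes(UTF_8), signature.getBytes(UTF_8))
      }
    case _ => false
  }
}

object CsrfMismatchDemo {
  val oldSecret = "secret-before-rotation"
  val newSecret = "secret-after-rotation"

  val serverA = new Server(oldSecret, Seq(oldSecret)) // started before the rotation
  val serverB = new Server(newSecret, Seq(newSecret)) // started after the rotation
  val rotationAware = new Server(newSecret, Seq(newSecret, oldSecret))

  def main(args: Array[String]): Unit = {
    val token = serverA.issueToken("nonce123") // form rendered by server A
    println(serverA.accepts(token))       // true: same secret
    println(serverB.accepts(token))       // false: POST landed on a server with another secret
    println(rotationAware.accepts(token)) // true: previous secret is still accepted
  }
}
```

Whether a given POST succeeds in this model depends only on which server the load balancer picks, which matches the intermittent failures described above.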