spring-projects / spring-authorization-server

Spring Authorization Server
https://spring.io/projects/spring-authorization-server
Apache License 2.0
4.78k stars 1.25k forks source link

Consider accounting for clock skew for nbf claim #1631

Open arpadf78 opened 1 month ago

arpadf78 commented 1 month ago

In an authorization_code flow, sometimes the access_token we receive from the /token endpoint on node1 is not accepted by the /userinfo endpoint on node2.

We are using a distributed cache based OAuth2AuthorizationService and while the machines clocks are synchronized, sometimes a few milliseconds difference can create validation problems.

The code fails in line 58 of OidcUserInfoAuthenticationProvider - isActive()

public Authentication authenticate(Authentication authentication) throws AuthenticationException {
    ...
    if (!authorizedAccessToken.isActive()) {
        throw new OAuth2AuthenticationException("invalid_token");
    }
    ....
}

And the reason it fails is because the isBeforeUse() returns true, i.e the time on the machines is before the "nbf".

Solution: A precise NBF should never be compared directly with current time in distributed systems. There should be a MAX_TIME_DIFF to be taken into account, either at validation or (preferrably) at JWT creation. Is better to extract a MAX_TIME_DIFF~=2s from the actual time at token creation, because at JWT serialization we lose the accuracy of Instant anyways and only seconds are kept.

In the java-jwt library there is an acceptLeeway() method which "Defines the default window in seconds in which the Not Before, Issued At and Expires At Claims will still be valid." - this can be one strategy to validate such claims, but as mentioned a simpler, more visible, and easier for clients is to simply extract the TIME_DIFF from the notBefore claim. Both have the same goal to somehow take into account inherent clock skews between hosts in distributed solutions

loren-coding commented 1 month ago

I also encountered the same problem. #1346
Currently I can only adjust the microsecond difference in OAuth2TokenCustomizer

arpadf78 commented 1 month ago

@loren-coding Yep, same problem. I missed that report. But this is a bug, you cannot simply compare NBF in a distributed system. And server synchronization is only part of the solution. Virtual machines are known to have bigger/faster time drift.

jgrandja commented 1 month ago

@arpadf78 This issue has been reported only 2 times now and OAuth2Authorization.isActive() has been around for quite some time. So I'm not convinced this is a bug or it would have been reported much earlier. I still feel this is an environment related issue and the environment should be configured correctly.

Either way, I'm open to hearing a solution that you can propose. Do you have a specific enhancement in mind? If so, please provide details.

As a workaround for your envirinment, you can configure a custom OAuth2TokenCustomizer to change the nbf claim. The nbf claim defaults to iat, so you can change the nbf claim to be iat minus 1 second. That should solve the nansecond issue as long as the server times are truly in sync.

arpadf78 commented 1 month ago

"As a workaround for your envirinment, you can configure a custom OAuth2TokenCustomizer to change the nbf claim. The nbf claim defaults to iat, so you can change the nbf claim to be iat minus 1 second. That should solve the nansecond issue as long as the server times are truly in sync."

That is exactly what I did, I extracted a max-allowed-time-drift from the nbf claim by using the OAuth2TokenCustomizer.

Suggestion: Perform exactly the same operation when the NBF claim is created, by extracting a TokenSettings property from the IAT claim. Define a maxAllowedTimeDrift (Duration) in the TokenSettings with a default of 1 seconds. So this issue can be solved very easily with a simple configuration, by creating a TokenSettings bean (which probably everybody does anyway if they want to customize the accessToken, refreshToken time to live)

Regarding the severity of this issue. If your OS time drifts so much that the NBF validation fails, no user will we able to login until your servers nodes get in sync again. This could mean an outage ranging from few seconds to minutes.

jgrandja commented 1 month ago

If your OS time drifts so much that the NBF validation fails, no user will we able to login until your servers nodes get in sync again. This could mean an outage ranging from few seconds to minutes.

As mentioned, this is only the 2nd time this issue has been logged. I'm wondering what is different in your environment compared to others? I don't think you are the only one running replicas of Spring Authorization Server but yet no one else other than gh-1346 has reported this scenario.

I'm reluctant on adding tokenSettings.maxAllowedTimeDrift if it's only going to be used by a very small number of users.

That is exactly what I did, I extracted a max-allowed-time-drift from the nbf claim by using the OAuth2TokenCustomizer.

I'm assuming that customization worked for you?

arpadf78 commented 4 weeks ago

Yes, my problem is solved using the OAuth2TokenCustomizer, but I would argue - as already mentioned in the original post - simply comparing the NBF without considering for time drift is not safe. No matter what server cluster you have.

    @Bean
    public OAuth2TokenCustomizer<JwtEncodingContext> tokenCustomizer()
    {
        return (context) -> {
                // Adjust NBF to account for time drift between server nodes
                Instant iat = context.getClaims().build().getIssuedAt();
                Instant nbf = iat.minus(appProperties.getMaxAllowedTimeDrift());
                context.getClaims().notBefore(nbf);

These days everybody is using virtualization which can drift more easily. Even if they have stringent time synchronization set up, they risk not being able to login on the first network glitch they have.

Without such a compensation is just a matter of time... I for one, would not be comfortable knowing my solution depends on few milliseconds timing between independently running server nodes.

jgrandja commented 3 weeks ago

@arpadf78 I changed the title to reflect the enhancement request. Let's see how many other users need this before we consider adding it.