lcobucci / jwt

A simple library to work with JSON Web Token and JSON Web Signature
https://lcobucci-jwt.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Misuse resistant API #663

Closed · Slamdunk closed this issue 2 years ago

Slamdunk commented 3 years ago

Hi, one thing I like very much about PASETO is that it's impossible to use it in insecure ways. I'm not talking about JWT RFC flaws, but rather about the API we can and should implement to force safe usage.

Here are two examples of lcobucci/jwt:v4 misuse:

/**
 * 1) Validation missing
 */
$jwtToken = (new JwtParser(new JoseEncoder()))->parse($_POST['jwt']);
loginUser($jwtToken->claims()->get(RegisteredClaims::AUDIENCE));

/**
 * 2) Validation typo
 */
$jwtToken = (new JwtParser(new JoseEncoder()))->parse($_POST['jwt']);
// Bug: validate() returns bool and the result is ignored; it should be assert()
(new Validator())->validate(
    $jwtToken,
    new SignedWith(new Eddsa(), InMemory::base64Encoded('foo'))
);
loginUser($jwtToken->claims()->get(RegisteredClaims::AUDIENCE));
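
For contrast, a corrected version of example 2 might look like this (a sketch against the v4 API; the EdDSA key is a placeholder): assert() throws RequiredConstraintsViolated on failure, so the login line is unreachable for a forged token.

use Lcobucci\JWT\Encoding\JoseEncoder;
use Lcobucci\JWT\Signer\Eddsa;
use Lcobucci\JWT\Signer\Key\InMemory;
use Lcobucci\JWT\Token\Parser as JwtParser;
use Lcobucci\JWT\Token\RegisteredClaims;
use Lcobucci\JWT\Validation\Constraint\SignedWith;
use Lcobucci\JWT\Validation\Validator;

$jwtToken = (new JwtParser(new JoseEncoder()))->parse($_POST['jwt']);

// assert() throws instead of returning a bool that can be silently ignored
(new Validator())->assert(
    $jwtToken,
    new SignedWith(new Eddsa(), InMemory::base64Encoded('foo'))
);
loginUser($jwtToken->claims()->get(RegisteredClaims::AUDIENCE));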

For v5 I propose these main BC-breaks (a sketch of the resulting API follows the list):

  1. API MUST require and consume a SignedWith validator in order to parse a JWT Token
  2. API MUST NOT allow claims part to be read nor consumed before the parse call
  3. API MUST allow headers part to be read and consumed before the parse call
  4. API SHOULD accept additional validators to be consumed on parse call
  5. (bonus) API MUST require and consume either a LooseValidAt or StrictValidAt validator in order to parse a JWT Token
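
A hypothetical sketch of the resulting parser (all names are illustrative, not an actual API):

// Hypothetical v5 sketch: constraints become parse-time requirements,
// so an unvalidated token can never escape the parser
final class Parser
{
    /** @throws RequiredConstraintsViolated */
    public function parse(
        string $jwt,
        SignedWith $signedWith,          // point 1: mandatory
        ValidAt $validAt,                // point 5: LooseValidAt or StrictValidAt
        Constraint ...$extraConstraints  // point 4: optional
    ): Token {
        // decode, expose headers (point 3), then assert every constraint
        // before the claims become readable (point 2)
    }
}
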
lcobucci commented 3 years ago

I see where you are coming from, but I really don't want to couple parsing to validation in the low-level API.

We can create a high-level API that handles both processes (there's already an issue about that), which won't be a BC-break.

What do you think?

Slamdunk commented 3 years ago

I'd like to hear the experiences and reasoning that push you to think like that.

I think the exact opposite.

Let's have a few examples on other topics:

  1. Imagine a VPN that first lets you connect to the bank intranet, and only then validates your certificate
  2. Imagine an ATM that first lets you withdraw your money, and only then verifies your PIN
  3. Imagine a police officer who first brings your kids back to "you", and only then checks your ID

Under no circumstances should parsing and validation be decoupled, in my humble opinion, neither in the high-level API nor in the low-level one.

lcobucci commented 3 years ago

My opinion is that JWTs don't always have to be validated on the application level.

I've implemented some solutions where the validation was handled in a different layer (the load balancer) exactly to offload costly tasks (like signature verification) - so, yes, the validation was always performed before the token was used.

I want to provide flexibility for users to decide in which level they want to apply constraints depending on their needs.

Does that make sense to you?

lcobucci commented 3 years ago

I do understand that people may misuse the tool because of all the knobs and levers. However, that's a trade-off I'm willing to accept and try to mitigate by putting effort into providing information instead of making all the decisions for people.

SvenRtbg commented 3 years ago

I'm wondering how your point 3 is supposed to work:

API MUST allow headers part to be read and consumed before the parse call

I'd say without parsing you only get base64 gibberish.

Also, the problem with validation is that a JWT might expire, but you may still want access to the claims that were once valid, and probably still are. Allowing an arbitrary timestamp to be passed into the validation would be the workaround.
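
For what it's worth, the v4 time constraints already allow that, since they receive a clock instead of reading the current time themselves. A sketch using FrozenClock from lcobucci/clock:

use DateTimeImmutable;
use Lcobucci\Clock\FrozenClock;
use Lcobucci\JWT\Validation\Constraint\LooseValidAt;
use Lcobucci\JWT\Validation\Validator;

// Validate the time claims against an arbitrary instant instead of "now",
// e.g. the moment the token was originally received
$receivedAt = new FrozenClock(new DateTimeImmutable('2021-04-01 12:00:00'));
(new Validator())->assert($token, new LooseValidAt($receivedAt));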

So there is a use case for accessing the claims without validating every claim, especially the time-related ones, every time. I would allow the counter-argument that in such a case, retrieving the claims when they are valid and storing them elsewhere is probably the better choice. :) And I would support the notion that PASETO is probably the better alternative to JWT - but it has no final RFC, and it's not used by the interfaces I have to deal with.

When you think of OpenID Connect tokens: it's a mess. It requires validation on top of the basic JWT validation that this library does, with ambiguous edge cases and complex validation rules.

Slamdunk commented 3 years ago

To both: nothing in this proposal is an obstacle to the real-world examples you provided:

// No-op constraints: the code explicitly declares that verification is offloaded
$signedWithValidator = new class implements SignedWith {
    public function assert(Token $token): void { }
};
$validAtValidator = new class implements ValidAt {
    public function assert(Token $token): void { }
};
$offloadedValidJwt = (new Parser())->parse(
    $_POST['jwt'],
    $signedWithValidator,
    $validAtValidator
);

End users will always have enough flexibility to shoot themselves in the foot, but the difference is substantial to me: here the code makes it crystal clear that the developer knows they're doing something nasty.

I don't consider myself a newbie in the coding field, and still I've made both of the validation mistakes I showed above, as well as https://github.com/lcobucci/jwt/pull/351

lcobucci commented 3 years ago

makes it crystal clear that the developer knows they're doing something nasty.

That's exactly where the assumption goes wrong, though. There's nothing nasty (IMHO) in getting an instance of a token without validating it.

You're assuming that the ONLY use case of a JWT is to authorise something, but a JWT is just a token. It can have any purpose, according to the wishes of the users. E.g.: you may pass a JWT to a background worker to identify (not authorise) a resource owner, and given the async nature of that, you can't always validate the date claims.

Forcing people to follow a specific flow (or making them do extra "nasty" work all the time) hinders the flexibility of using tokens and, to me, creates more problems than solutions.

I'm totally in favour of offering a general case parsing solution without changing the building blocks of the API, though.

Slamdunk commented 3 years ago

You're assuming that the ONLY use case of a JWT is to authorise something, but a JWT is just a token.

Here I'm not considering the purpose of the token. The crucial point is that you should not consume claims that you haven't validated against the corresponding signature.

I think there are only 3 legit use cases:

  1. A service carries the token between two services: it doesn't need to parse it
  2. A service consumes the token: it MUST validate it in order to consume the claims, as RFC7519#section-7.2 states:

    If any of the listed steps fail, then the JWT MUST be rejected -- that is, treated by the application as an invalid input.

  3. A service routes the token to another service, based on the properties of the token: routing rules should be applied based only on the headers, see RFC7519#section-5.3

There is no circumstance, as far as I know, where you have no choice but to consume invalid claims.

It can have any purpose, according to the wishes of the users. E.g.: you may pass a JWT to a background worker to identify (not authorise) a resource owner, and given the async nature of that, you can't always validate the date claims.

I'm sorry I don't understand: how do you identify the resource owner?

Forcing people to follow a specific flow (or making them do extra "nasty" work all the time) hinders the flexibility of using tokens and, to me, creates more problems than solutions.

As somewhat emerged in the previous paragraphs, my question is: what exact legitimate flexibility might users need that requires consuming invalid claims?

Post scriptum: the brief code I wrote above is in fact bad, and I regret it already. To me, nasty operations should not be allowed either: if I wanted to offload token verification, the load balancer (in this case) should issue a new (identical) token with the None signer (alongside the original one, e.g. for logging purposes). Explicitness is crucial for secure operations: if some day the offloaded token verification fails, in your scenario the final app still logs the user in! Remember that the token offloading may fail because you removed the load balancer and forgot to update the app code to validate the token again.

lcobucci commented 3 years ago

I think there are only 3 legit use cases:

  1. A service carries the token between two services: it doesn't need to parse it
  2. A service consumes the token: it MUST validate it in order to consume the claims, as RFC7519#section-7.2 states:

    If any of the listed steps fail, then the JWT MUST be rejected -- that is, treated by the application as an invalid input.

  3. A service routes the token to another service, based on the properties of the token: routing rules should be applied based only on the headers, see RFC7519#section-5.3

I think you might be adding meaning to the RFC that isn't stated there.

Let's analyse the full 7.2 section:

When validating a JWT, the following steps are performed. The order of the steps is not significant in cases where there are no dependencies between the inputs and outputs of the steps. If any of the listed steps fail, then the JWT MUST be rejected -- that is, treated by the application as an invalid input.

  1. Verify that the JWT contains at least one period ('.') character.
  2. Let the Encoded JOSE Header be the portion of the JWT before the first period ('.') character.
  3. Base64url decode the Encoded JOSE Header following the restriction that no line breaks, whitespace, or other additional characters have been used.
  4. Verify that the resulting octet sequence is a UTF-8-encoded representation of a completely valid JSON object conforming to RFC 7159 [RFC7159]; let the JOSE Header be this JSON object.
  5. Verify that the resulting JOSE Header includes only parameters and values whose syntax and semantics are both understood and supported or that are specified as being ignored when not understood.
  6. Determine whether the JWT is a JWS or a JWE using any of the methods described in Section 9 of [JWE].
  7. Depending upon whether the JWT is a JWS or JWE, there are two cases:
    • If the JWT is a JWS, follow the steps specified in [JWS] for validating a JWS. Let the Message be the result of base64url decoding the JWS Payload.
    • Else, if the JWT is a JWE, follow the steps specified in [JWE] for validating a JWE. Let the Message be the resulting plaintext.
  8. If the JOSE Header contains a "cty" (content type) value of "JWT", then the Message is a JWT that was the subject of nested signing or encryption operations. In this case, return to Step 1, using the Message as the JWT.
  9. Otherwise, base64url decode the Message following the restriction that no line breaks, whitespace, or other additional characters have been used.
  10. Verify that the resulting octet sequence is a UTF-8-encoded representation of a completely valid JSON object conforming to RFC 7159 [RFC7159]; let the JWT Claims Set be this JSON object.

    Finally, note that it is an application decision which algorithms may be used in a given context. Even if a JWT can be successfully validated, unless the algorithms used in the JWT are acceptable to the application, it SHOULD reject the JWT.

The validation that the RFC covers is about structure, not content (apart from the JWS/JWE/nested tokens distinction). There's nothing about claims validation because (IMHO) that's up to the application.

The 5.3 section is about header replication, using encrypted tokens as a use case, and doesn't say anything about claim validation either.

To me, nasty operations should not be allowed either: if I wanted to offload token verification, the load balancer (in this case) should issue a new (identical) token with the None signer (alongside the original one, e.g. for logging purposes). Explicitness is crucial for secure operations: if some day the offloaded token verification fails, in your scenario the final app still logs the user in!

I completely agree on the explicitness point, but I don't believe the design of the solution should be driven by the choices around token formats (and the libraries that parse them).

Remember that the token offloading may fail because you removed the load balancer and forgot to update the app code to validate the token again.

Yes, and if you do remove the load balancer, your application would then accept any token using None as algorithm - but all other properly signed tokens would fail.


What I'm trying to explain is that, depending on the context, different constraints are applicable for validation, and there's no need to enforce constraints which are not required. So, it's not really black and white.

Preventing people from shooting themselves in the foot is definitely important, but we can never underestimate people's creativity in overcoming even the most misuse-resistant APIs.

I've never used PASETO, but it does seem that, apart from signature validation, the validation constraints are pretty optional.

Slamdunk commented 3 years ago

I'm sorry if I haven't made myself clear enough. It's easy to mix concepts up when they overlap, like "validity".

As written in the topic, here I propose to make only these two mandatory:

  1. Signature validity, to ensure integrity, authenticity and, for asymmetric keys, non-repudiation
  2. Time validity, to avoid replay attacks (I know time claims are not mandatory per the RFC, hence our LooseValidAt must be accepted as a valid constraint; better than nothing)

All other constraints remain optional and can be consumed after parsing.
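
In terms of the existing v4 building blocks, that mandatory baseline would be (a sketch; the signer and key are placeholders):

use Lcobucci\Clock\SystemClock;
use Lcobucci\JWT\Signer\Eddsa;
use Lcobucci\JWT\Signer\Key\InMemory;
use Lcobucci\JWT\Validation\Constraint\LooseValidAt;
use Lcobucci\JWT\Validation\Constraint\SignedWith;
use Lcobucci\JWT\Validation\Validator;

// 1. signature validity and 2. time validity: the two mandatory checks;
// everything else (aud, iss, jti, ...) stays optional and can run later
(new Validator())->assert(
    $token,
    new SignedWith(new Eddsa(), InMemory::base64Encoded('foo')),
    new LooseValidAt(SystemClock::fromUTC())
);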

These two constraints are the foundations of every secure protocol and are enforced everywhere, even beyond infosec: your ID card has a government signature and an expiry date too.

I hope that this point is now clear, and I also hope we agree that, for this very specific point, we should strive to be black or white.

The 5.3 section is about header replication, using encrypted tokens as use case, and doesn't state anything about claim validation as well.

Section 5.3 is an example of how conceptually different the headers and claims parts are: headers are OK to use before signature and time validation, claims are not.

Yes, and if you do remove the load balancer, your application would then accept any token using None as algorithm - but all other properly signed tokens would fail.

Of course using the None alg is a bad security practice; my example aimed only at avoiding misuse in the real case you provided, not at endorsing insecure practices. Let me explain better:

In both cases, once the LB breaks, the app is completely vulnerable, but in the second one you are more likely to notice the failure.

Would I ever use the None alg for this reason? Hell no: I'd prefer fast symmetric re-signing for internal consumption over a missing signature. But we can agree that the second example, where a signature is mandatory, even a null one, is more resistant to misuse than the first one.

Ocramius commented 3 years ago

May I suggest (future iteration):

/** @psalm-immutable */
interface RawToken { /* ... */ }

/** @psalm-immutable */
interface ParsedToken { /* ... */ }

/** @psalm-immutable */
interface ValidatedToken extends ParsedToken { /* ... */ }

/** @psalm-immutable */
interface Parser { 
    /** @throws FailedToParseToken */
    function __invoke(RawToken $t) : ParsedToken;
}

interface Validator { 
    /** @throws FailedToValidateToken */
    function __invoke(ParsedToken $t) : ValidatedToken;
}

ValidatedToken needs to be past tense, because validation happened at a certain point in time, and the token may not be valid after that. Perhaps it could expose the time of validation. Also note that Validator is not pure, since it uses contextual data.

Anyway, just my 2 cents about possible future iterations: having an API consume a ValidatedToken requires going through both Validator and Parser by design. While it is still possible to interact with a ParsedToken, that is fundamentally just a parent type of ValidatedToken.
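
To make that concrete, consuming code would demand the narrowest type it needs, turning skipped validation into a type error instead of a runtime surprise (a sketch based on the interfaces above; loginUser is hypothetical):

/** Accepts only tokens that went through both Parser and Validator */
function loginUser(ValidatedToken $token): void
{
    // safe by construction: no ValidatedToken exists without validation
}

// a plain ParsedToken would be rejected by the type checker here
$validated = $validator($parser($rawToken));
loginUser($validated);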

lcobucci commented 3 years ago

Section 5.3 is an example of how conceptually different the headers and claims parts are: headers are OK to use before signature and time validation, claims are not.

My interpretation here is completely different. We replicate claims on headers simply because when handling nested/encrypted tokens we have access to the headers but not to the claims. It's a faster way to reject something without having to decrypt the payload. But I might be wrong, who knows? :smile:

Shall we leave those examples and RFC references aside? I think they're adding more noise than anything HAHAH


As written in the topic, here I propose to make only these two mandatory:

  • Signature validity, to ensure integrity, authenticity and, for asymmetric keys, non-repudiation
  • Time validity, to avoid replay attacks (I know time claims are not mandatory per the RFC, hence our LooseValidAt must be accepted as a valid constraint; better than nothing)

All other constraints remain optional and can be consumed after parsing.

These two constraints are the foundations of every secure protocol and are enforced everywhere, even beyond infosec: your ID card has a government signature and an expiry date too.

I hope that this point is now clear, and I also hope we agree that, for this very specific point, we should strive to be black or white.

By making it impossible to parse a token without having the signature verified and the time claims validated, we pull the security responsibility from the users to the library. Using types to make things more explicit and allow people to define expectations is quite clever and neat, btw.

However, that doesn't really solve anything, since the token headers and claims are simply JSON + base64url encoded. Regardless of what we do, we're simply powerless over userland code. If folks really want to, they will simply get the decoder (the best-case scenario, btw), invoke functions themselves on each part of the JWT, and do whatever they want with the decoded information.

It may also lead to people re-issuing tokens to strip the time claims (re-signing them, obviously) without really understanding why.

The message I'm trying to convey is that I don't think we want to claim the responsibility of making other people's code secure by default. My initial approach with the design of the lib was to say: here are the building blocks, now it's up to you to design and secure your protocol.

I'm definitely open to discussing improvements, but what I think we'll end up with in this approach is security by obfuscation - which isn't secure at all.

Slamdunk commented 3 years ago

I think you are merging two very distinct things: insecurity by purpose and insecurity by accident.

Of course we will never be able to stop users from being insecure on purpose, and we shouldn't care about that at all. The sole target here is to avoid insecurity by accident, which has nothing to do with security by obfuscation.

Imagine creating a new tool, where the team brainstorming goes: "Hey guys, it turns out we can't make this 100% safe. But we can choose to make it at least safe from human error." And another one: "Naaah, blowing everything up by accident? That sounds like a cool feature, let's allow it!"

The message I'm trying to convey is that I don't think we want to claim the responsibility of making other people's code secure by default.

Here I disagree: that's exactly what I want, and what I expect from everything I trust and use but don't build myself.

My initial approach with the design of the lib was to say: here are the building blocks, now it's up to you to design and secure your protocol.

Your path resembles many others. Think about HTTPS: it went from non-existent to opt-in to opt-out, and many tools are moving to require it by default, like HTTPS-Only Mode in Firefox. Why? Because it's everyone's duty to guarantee a secure-by-default design; if you want insecurity, you have to choose it explicitly.

what I think we'll end up with in this approach is security by obfuscation - which isn't secure at all.

There is no hidden information here; everyone knows JWTs are encoded JSON, easily readable with trivial tools like https://jwt.io/. As you mentioned, this proposal aims only at security by default.

To put this into perspective, I've taken the most starred JWT libraries from https://jwt.io, and none of them lets you decode, hence consume, a JWT without verifying the signature at the same time.

They all just apply the RFC: an invalid signature means an invalid token, just like invalid base64 means an invalid token. This is the only relevant library that allows parsing and using an unverified JWT without any warning or explicit consent.

using types to make things more explicit and allow people to define expectations is quite clever and neat, btw.

I have to say I don't see the point in having a parsed but unverified token. The RFC clearly states that signature verification is part of the parsing process. By this logic, we should have:

interface RawToken {}
interface ThreeDotValidatedToken {}
interface Base64ValidatedToken extends ThreeDotValidatedToken {}
interface Utf8EncodeValidatedToken extends Base64ValidatedToken {}
interface JsonFormatValidatedToken extends Utf8EncodeValidatedToken {}
interface SignatureValidatedToken extends JsonFormatValidatedToken {}

What's the purpose of this? RawToken and SignatureValidatedToken are the only ones required by the RFC. The solution brought by @Ocramius seems to me a technical way to have our points of view agree and unite; I'd say it's better than now, but still flawed as it's not secure by default: users should not interact with signature-invalid tokens.

I'm OK with giving up mandatory timestamp validation if you prefer, even though it would be sad, since we can both have it and adhere to the RFC with our LooseValidAt.

Where headers need to be consumed before validation, the tool to read them should be separate from the main flow, to keep the concerns apart: something like Parser::readUnverifiedHeaders(string $jwt): DataSet.
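
A hypothetical sketch of that helper (no error handling), built on the existing JoseEncoder and DataSet:

use Lcobucci\JWT\Encoding\JoseEncoder;
use Lcobucci\JWT\Token\DataSet;

final class Parser
{
    // Reads only the headers (e.g. "kid" or "alg" for routing) while keeping
    // the claims of the unverified token out of reach
    public function readUnverifiedHeaders(string $jwt): DataSet
    {
        [$encodedHeaders] = explode('.', $jwt, 2);
        $decoder = new JoseEncoder();

        return new DataSet(
            $decoder->jsonDecode($decoder->base64UrlDecode($encodedHeaders)),
            $encodedHeaders
        );
    }
}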

lcobucci commented 3 years ago

Think about HTTPS: it went from non-existent to opt-in to opt-out, and many tools are moving to require it by default, like HTTPS-Only Mode in Firefox. Why? Because it's everyone's duty to guarantee a secure-by-default design; if you want insecurity, you have to choose it explicitly.

So let's follow that path, then?

We don't need to break BC for that, though. By providing another parser implementation that receives this information in the constructor, we can solve it and allow users to opt in during v4, making it the default in v5 if our users agree:

final class SecureParser implements Parser
{
    public function __construct(
        Parser $originalParser,
        Validator $validator,
        SignedWith $signatureVerification,
        ValidAt $timeVerification,
        Constraint ...$extraConstraints
    ) {
        // ...
    }
}
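
The parse() implementation of that wrapper could then be a simple decorate-then-assert (a sketch, assuming the constructor stores its arguments in properties of the same names):

public function parse(string $jwt): Token
{
    $token = $this->originalParser->parse($jwt);

    // assert() throws, so an invalid token never leaves the parser
    $this->validator->assert(
        $token,
        $this->signatureVerification,
        $this->timeVerification,
        ...$this->extraConstraints
    );

    return $token;
}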

Is this a reasonable way to move forward?

Slamdunk commented 3 years ago

Sure: in the last 24 hours I've only worked out the theoretical validity of my proposal; we can indeed implement everything in v4, see if it's good, then deprecate the old API in v4 and drop it in v5 without rush.

For the exact details, I need more time to craft them in my mind as I'm busy right now.

lcobucci commented 3 years ago

in the last 24 hours I've only worked out the theoretical validity of my proposal

Your arguments are good, but I'm not convinced that a library should address something that I believe to be the responsibility of its users. However, I'm more than happy to be proven wrong with data and adoption instead of extending the argumentation. You have a solid hypothesis, let's confirm it!

For the exact details, I need more time to craft them in my mind as I'm busy right now.

It's alright! There's no rush 👍

Patrick-Remy commented 3 years ago

As a user who just migrated to this new API, I can confirm that I asked myself whether I needed to add the SignedWith constraint, as it's already 'known' from instantiating the config. I also miss a full example in the docs that adds the typical constraints.
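
For reference, a fuller v4 flow with the typical constraints might look like this (a sketch; the key, issuer, and claim values are placeholders):

use Lcobucci\Clock\SystemClock;
use Lcobucci\JWT\Configuration;
use Lcobucci\JWT\Signer\Hmac\Sha256;
use Lcobucci\JWT\Signer\Key\InMemory;
use Lcobucci\JWT\Validation\Constraint\IssuedBy;
use Lcobucci\JWT\Validation\Constraint\LooseValidAt;
use Lcobucci\JWT\Validation\Constraint\SignedWith;

$config = Configuration::forSymmetricSigner(
    new Sha256(),
    InMemory::plainText('a-very-long-secret-stored-outside-the-code')
);

// register the typical constraints once...
$config->setValidationConstraints(
    new SignedWith($config->signer(), $config->verificationKey()),
    new LooseValidAt(SystemClock::fromUTC()),
    new IssuedBy('https://issuer.example.com')
);

// ...and apply all of them right after parsing
$token = $config->parser()->parse($_POST['jwt']);
$config->validator()->assert($token, ...$config->validationConstraints());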