w3c-fedid / FedCM

A privacy preserving identity exchange Web API
https://w3c-fedid.github.io/FedCM/

FedCM privacy concerns #595

Closed anderspitman closed 5 months ago

anderspitman commented 6 months ago

A stated primary goal of FedCM is to preserve privacy. However, as currently designed it exposes to the IdP a list of every RP the user logs in to. This information is primarily leaked by sending the client_id parameter, which is expected to be associated with an RP, to the IdP.

This information shouldn't be necessary for an identity service. A notable example is Mozilla Persona, which allowed users to have their IdP attest their identity (in that case defined as control over an email address) to any app without the IdP knowing what apps were using that information.

It would be possible for a FedCM IdP to accomplish something similar with the currently defined endpoints, but it would require violating the spec. It could look something like this:

  1. In the initial navigator.credentials.get call, the RP uses a random value for the client_id instead of a value that can be associated with the RP.
  2. On the client metadata endpoint, the IdP ignores the client_id and returns fake URLs for privacy policy and TOS. Note that there is a discussion about changing the way these endpoints are provided anyway. See https://github.com/fedidcg/FedCM/issues/581#issuecomment-2123068427
  3. On the ID assertion endpoint, the IdP may use the random client_id in its assertion process. For example, it might place the client_id in the aud property of an OIDC ID token. This allows the RP to verify that the token is intended for it.
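
A minimal sketch of step 3, under stated assumptions: the helper names (`new_random_client_id`, `mint_claims`, `audience_matches`) are hypothetical, the claims are a plain dict rather than a signed OIDC ID token, and signature verification is left out entirely. It only illustrates how an opaque client_id copied into `aud` lets the RP recognise an assertion as its own; it makes no claim about the security of the overall scheme.

```python
import secrets

def new_random_client_id() -> str:
    """RP side: a fresh, unguessable client_id that does not name the RP."""
    return secrets.token_urlsafe(16)

def mint_claims(client_id: str, email: str) -> dict:
    """IdP side (hypothetical helper): copy the opaque client_id into `aud`.
    A real ID token would also be signed by the IdP; that is omitted here."""
    return {"sub": email, "aud": client_id}

def audience_matches(claims: dict, expected_client_id: str) -> bool:
    """RP side: accept the assertion only if `aud` is the value this RP generated."""
    return secrets.compare_digest(str(claims.get("aud", "")), expected_client_id)

client_id = new_random_client_id()
claims = mint_claims(client_id, "user@example.com")
print(audience_matches(claims, client_id))         # True: aud is the RP's own value
print(audience_matches(claims, "someone-elses"))   # False: aud belongs to another login
```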

Is there any chance of closing this hole, either by changing the wording of the spec to remove the requirement for the IdP to be able to identify the RP, or (even better) by changing the endpoints to not even use client_id?

obfuscoder commented 6 months ago

I'm not so sure about overreaching.

As far as I know, at least European companies are legally obliged (EU GDPR) to record with which 3rd parties personal data of their end-users have been shared. Before sharing the data, it must be ensured that the end-user gave consent. This is what the FedCM popup dialog is doing, but as the dialog is a browser feature, it needs to inform the IdP about the client the data is sent to (and whether the dialog was actually shown) so that the IdP can release the personal data to the 3rd party including writing a record of that transaction.

anderspitman commented 6 months ago

I'm not so sure about overreaching.

As far as I know, at least European companies are legally obliged (EU GDPR) to record with which 3rd parties personal data of their end-users have been shared. Before sharing the data, it must be ensured that the end-user gave consent. This is what the FedCM popup dialog is doing, but as the dialog is a browser feature, it needs to inform the IdP about the client the data is sent to (and whether the dialog was actually shown) so that the IdP can release the personal data to the 3rd party including writing a record of that transaction.

Ha. I never realized until this moment how similar "overarching" and "overreaching" were. I changed the title to be clearer. Sorry for the confusion!

anderspitman commented 6 months ago

You bring up some good points though. In FedCM, technically it's the browser and not the IdP providing the information to the RP right? The IdP returns a token to the browser and the browser sends it to the RP, based on the user's consent. Assuming you have a token format that could be parsed by the browser, it could verify that only information that the user consented to is provided. Obviously this would require changes to FedCM, including:

  1. Specifying a token format
  2. Implementing parsing of said format in order to perform checks.
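
As a rough illustration of what such a check could look like, here is a sketch that assumes the token format is a JWT (FedCM does not specify one) and that the browser knows the set of claim names the user approved. The names `unconsented_claims`, `encode_segment`, and `PROTOCOL_CLAIMS` are hypothetical, and the signature is deliberately not verified because only the claim names are inspected.

```python
import base64
import json

# Claims assumed here to be protocol plumbing rather than consented user data.
PROTOCOL_CLAIMS = {"iss", "aud", "sub", "iat", "exp", "nonce"}

def encode_segment(obj: dict) -> str:
    """Base64url-encode a JSON object without padding, as JWT segments are."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying the signature."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

def unconsented_claims(token: str, consented: set) -> set:
    """Return claim names present in the token that the user did not approve."""
    return set(jwt_payload(token)) - set(consented) - PROTOCOL_CLAIMS

# Unsigned demo token: header, payload, empty signature segment.
demo_token = ".".join([
    encode_segment({"alg": "none"}),
    encode_segment({"sub": "u1", "email": "a@example.com", "phone_number": "555-0100"}),
    "",
])
print(unconsented_claims(demo_token, {"email"}))  # {'phone_number'}
```

In this sketch the browser would refuse to hand the token to the RP whenever the returned set is non-empty.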

samuelgoto commented 6 months ago

It would be possible for a FedCM IdP to accomplish something similar with the currently defined endpoints, but it would require violating the spec.

This is such an interesting idea.

It could look something like this:

There are a couple of things that come to mind:

UPDATE: I think you are right, it is not implementable today: the browser shares the Origin in the IdP assertion endpoint, which wouldn't allow you to omit the clientId parameter.

Is there any chance of closing this hole, either by changing the wording of the spec to remove the requirement for the IdP to be able to identify the RP, or (even better) by changing the endpoints to not even use client_id?

Maybe we could make client_id optional, and skip the client_metadata_endpoint request.

anderspitman commented 6 months ago

I agree the violation is minimal on the technical side. On the policy side I could see there being issues such as those @obfuscoder raised.

@aaronpk is definitely more qualified than me to anticipate security flaws with the idea. For replay attacks, what specific scenario are you thinking of?

One vulnerability this might open up is a DDoS vector. If client IDs are random, you could hit the ID assertion endpoint over and over forcing the IdP to mint a lot of tokens. You can rate limit by IP (and I suspect this would be sufficient), but won't be able to rate limit by client ID anymore.
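
A sketch of the per-IP fallback, assuming a simple sliding-window limiter in front of the ID assertion endpoint. The class name `PerIpRateLimiter` and the limits are hypothetical; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import defaultdict, deque

class PerIpRateLimiter:
    """Allow at most `max_requests` per IP within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # the IdP would skip minting a token for this request
        hits.append(now)
        return True

limiter = PerIpRateLimiter(max_requests=2, window_seconds=60.0)
print(limiter.allow("203.0.113.7", now=0.0))   # True
print(limiter.allow("203.0.113.7", now=1.0))   # True
print(limiter.allow("203.0.113.7", now=2.0))   # False: third request inside the window
print(limiter.allow("198.51.100.9", now=2.0))  # True: a different IP has its own budget
```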

aaronpk commented 6 months ago

Without getting into the technical bits of the proposal, this is starting to sound a lot like the discussions in the "wallet" space which provide this same kind of privacy property.

In the wallet world, an "issuer" issues credentials that are "held" by the wallet, and then "presented" to an RP. So by definition, in this model, the IdP never knows which RPs the credentials are presented to.

The reason I bring this up is that I think any discussion of this privacy property should happen in conjunction with the wallet discussions, rather than trying to be shoehorned on top of OAuth/OpenID Connect.

samuelgoto commented 6 months ago

Without getting into the technical bits of the proposal, this is starting to sound a lot like the discussions in the "wallet" space which provide this same kind of privacy property.

That occurred to me too. @anderspitman , FWIW, there is a browser API for that too:

https://wicg.github.io/digital-credentials/

We don't know yet how the DC API relates to the FedCM API, but as we move along, we are interested in figuring that out!

npm1 commented 6 months ago

I guess there are two parts to the suggestion here:

anderspitman commented 6 months ago

One thing I had missed is that the PP/TOS links are optional (as is the entire client metadata endpoint). Just leaving it out should solve that part.

As for the other, why is it necessary to enforce CORS on the assertion endpoint? Why does the IdP care who it's asserting to? This seems like a violation of least privilege.

I appreciate that the digital credentials spec may be more in line with this type of functionality, but the reality is that FedCM is the protocol that may actually get widespread adoption in the authentication space. For example, are Google and other social login providers likely to implement authentication support on top of digital credentials?

One of the core reasons LastLogin exists is to create a privacy barrier between upstream IdPs and RPs, so that IdPs only know that the user is logging in to LastLogin, and not what RPs they're using. But this requires users to trust LastLogin not to abuse that information, instead of trusting IdPs. I'd prefer if they didn't have to trust LastLogin at all. But if the browser is going to be sending the Origin header whether I want it or not then they have to trust me.

As a potential implementer and supporter of FedCM, this is a serious drawback for me.

npm1 commented 6 months ago

As for the other, why is it necessary to enforce CORS on the assertion endpoint? Why does the IdP care who it's asserting to? This seems like a violation of least privilege.

We're sharing the contents of a cross-origin credentialed fetch with the RP, so the IdP must explicitly agree with sharing that information. This is how the web works. Also, the IdP is sharing user information with the RP, so my intuition is that most IdPs do care who they share it to. I'm curious though, how does LastLogin force itself to be blind to the RP requesting the credential nowadays?
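
To make the "IdP must explicitly agree" part concrete, here is a sketch of the decision an IdP's ID assertion endpoint might make. It assumes a hypothetical registry mapping each client_id to one origin and a hypothetical helper `assertion_cors_headers`; the two response headers are the standard CORS headers for a credentialed cross-origin response.

```python
from typing import Dict, Optional

# Hypothetical client registry: client_id -> the single origin registered for it.
REGISTERED_ORIGINS = {"client-123": "https://rp.example"}

def assertion_cors_headers(origin_header: str, client_id: str) -> Optional[Dict[str, str]]:
    """Return the CORS headers that opt in to sharing the response with this
    origin, or None if the IdP does not agree to share with it."""
    expected = REGISTERED_ORIGINS.get(client_id)
    if expected is None or origin_header != expected:
        return None  # unknown client, or the request comes from another origin
    return {
        # Credentialed responses need an explicit origin, not "*".
        "Access-Control-Allow-Origin": origin_header,
        "Access-Control-Allow-Credentials": "true",
    }

print(assertion_cors_headers("https://rp.example", "client-123"))
print(assertion_cors_headers("https://evil.example", "client-123"))  # None
```

Without these headers the browser does not expose the response to the requesting origin, which is the opt-in being described.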

There was previously some discussion about a cached FedCM version. Perhaps the IdP could 'store' some ID assertions, and when FedCM is invoked the RP could receive the stored value. If there is no credentialed fetch at that time then we can avoid the IDP knowing who the RP is. Would something like that work?

aaronpk commented 6 months ago

While I'm sympathetic to the privacy concerns, I really think the digital credentials API is the better place for that kind of thing.

In the consumer world of "sign in with google/facebook/etc", the IdPs absolutely want to know where the user is signing in, and they don't even support the concept of an unregistered client using the API.

In the enterprise world, the enterprise IdPs also absolutely want to know where the user is signing in, and they also limit the apps with which a user can use their enterprise identity.

A similar concern exists in open banking and in research and education.

And here's the really funny part, if I'm bringing my own IDP as a user (see FedCM for IndieAuth), I as a user also want to know which RPs I've used my identity at.

anderspitman commented 6 months ago

@npm1:

We're sharing the contents of a cross-origin credentialed fetch with the RP, so the IdP must explicitly agree with sharing that information

IMO with FedCM it's actually a subtly different situation. See my comment above. The IdP isn't sharing information directly with the RP. It's sharing information with the browser, which has full control over what to give to the RP, and knowledge of what has been consented to be shared. So the IdP only has to trust the user agent. But that's necessary anyway because the browser could lie about the RP in the first place.

How does LastLogin force itself to be blind to the RP requesting the credential nowadays

It can't. All I can do is not store that information (all login data is stored client-side in JWTs), point to the code, and hope people trust me when I say that's what I'm running. That's one reason why I'm pushing for more privacy-oriented protocols.

There was previously some discussion about a cached FedCM version. Perhaps the IdP could 'store' some ID assertions, and when FedCM is invoked the RP could receive the stored value. If there is no credentialed fetch at that time then we can avoid the IDP knowing who the RP is. Would something like that work?

This would make it so the IdP doesn't always know when I'm using an RP, but they would still have a complete list of the RPs I use, right?

@aaronpk:

IdPs absolutely want to know where the user is signing in, and they don't even support the concept of an unregistered client using the API

If we can't allow users to opt out of this sort of tracking, can we at least make it possible for IdPs to opt out? As it currently stands, I think I just need a way to tell the browser not to send me the Origin header, and maybe a way for RPs to specifically request privacy-focused IdPs (or at least indicate that an IdP is incapable of tracking the login).

sebadob commented 6 months ago

On the ID assertion endpoint, the IdP may use the random client_id in its assertion process. For example, it might place the client_id in the aud property of an OIDC ID token. This allows the RP to verify that the token is intended for it.

And what would prevent evil.example.com from requesting a token for good.example.com when the IdP can't verify the client_id / aud against an origin? You are even increasing the attack surface with this. You need to somehow validate that your token is actually being delivered to the correct client. This would only work if the user did not have any control over the client_id and FedCM set it automatically. But we already have this mechanism with the Origin header.

Edit:

And when you choose a random client_id each time, it means you would need to keep a state for each single login request for each user and you can't verify any token like you would do it now. This means with each API request, you need to look up that random state / aud in some DB and make sure, that it actually belongs to you.
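
The bookkeeping described here could look roughly like the following on the RP side. This is a sketch only: `PendingLogins` is a hypothetical in-memory store standing in for "some DB", and the time-to-live is an arbitrary assumption.

```python
import secrets

class PendingLogins:
    """RP-side store of random client_ids issued for logins not yet completed."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._issued_at = {}  # client_id -> time it was handed to the frontend

    def issue(self, now: float) -> str:
        client_id = secrets.token_urlsafe(16)
        self._issued_at[client_id] = now
        return client_id

    def redeem(self, aud: str, now: float) -> bool:
        """Check a token's `aud` against the store; each value is single use."""
        issued = self._issued_at.pop(aud, None)
        return issued is not None and now - issued <= self.ttl_seconds

store = PendingLogins(ttl_seconds=300.0)
cid = store.issue(now=0.0)
print(store.redeem(cid, now=10.0))         # True: known and still fresh
print(store.redeem(cid, now=11.0))         # False: already used once
print(store.redeem("not-ours", now=11.0))  # False: never issued by this RP
```

Because every lookup consumes the entry, the same token cannot be checked again on a later request, which is the limitation being pointed out.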

anderspitman commented 6 months ago

And what would prevent evil.example.com from requesting a token for good.example.com when the IdP can't verify the client_id / aud against an origin? You are even increasing the attack surface with this. You need to somehow validate that your token is actually being delivered to the correct client. This would only work if the user did not have any control over the client_id and FedCM set it automatically. But we already have this mechanism with the Origin header.

This is a valid point. You could use the client_id like a nonce (or use the built-in FedCM nonce), but you're still vulnerable to evil.example.com requesting a client_id from good.example.com and playing middle man. The best solution I have so far would be to use PKCE and do the full authorization code flow, so evil.example.com never has a chance to see the PKCE code verifier.

The problem is that now the IdP is receiving requests directly from RPs. This is still a big improvement, but an IdP could do IP correlation and likely determine the IPs of at least some RPs. So RPs would have to use VPNs/Tor/etc for those requests to preserve user privacy, which is definitely not ideal. Going to need to give this some more thought.

And when you choose a random client_id each time, it means you would need to keep a state for each single login request for each user and you can't verify any token like you would do it now. This means with each API request, you need to look up that random state / aud in some DB and make sure, that it actually belongs to you.

Not sure I understand what you mean here. Are you talking about the IdP side or the RP side? For the IdP side, I don't need access tokens or additional APIs beyond the ID token. LastLogin (and likely other privacy-focused IdPs) is essentially there to vouch that X user controlled Y ID (typically an email address) at time Z. Narrow scope increases security and reduces the need for trust. You can think of what I'm aiming for as conceptually very similar to Mozilla Persona.

If you're talking about the RP side, once the IdP has asserted identity, you're going to create your own session anyway based off that assertion.

Or am I misunderstanding you?

sebadob commented 6 months ago

The best solution I have so far would be to use PKCE and do the full authorization code flow, so evil.example.com never has a chance to see the PKCE code verifier.

PKCE is useless, when you are not validating the origin or redirect uri and it would not even be a MITM, because evil.example.com could do the whole flow from start to finish and the IdP would not even notice, because the client is not confidential and therefore can't validate a client_secret. When evil then finished the whole flow and received a token that is valid for your client only, it can potentially use it to do API requests and even your API would not notice it, because the validation would be ok.

Not sure I understand what you mean here. Are you talking about the IdP side or the RP side? For the IdP side, I don't need access tokens or additional APIs beyond the ID token.

I mean the RP. It might be the case that you, in your case, only care about the id_token, but what about all the other cases? Additionally, in that case you could only use the id_token once, directly when you received it, and you would need to implement another mechanism to verify the validity of the token / session, because you would not be able to verify the token with subsequent requests.

If you're talking about the RP side, once the IdP has asserted identity, you're going to create your own session anyway based off that assertion.

That might be the case for you, but what if the client simply wants to use an access_token? And then, why even bother creating a JWT in the first place, when you are not validating it. Then a simple JSON response without any signature would do the trick as well and be a lot faster and more efficient at the same time.
I think FedCM should be defined in a way that it can be used in a lot of scenarios, not just the most basic ones.

anderspitman commented 6 months ago

PKCE is useless, when you are not validating the origin or redirect uri and it would not even be a MITM, because evil.example.com could do the whole flow from start to finish and the IdP would not even notice, because the client is not confidential and therefore can't validate a client_secret. When evil then finished the whole flow and received a token that is valid for your client only, it can potentially use it to do API requests and even your API would not notice it, because the validation would be ok.

This is how PKCE helps:

  1. RP initiates FedCM flow, providing random client ID and PKCE code challenge
  2. IdP returns authorization code in JavaScript, which it passes to the backend
  3. RP uses code to retrieve ID token

evil.example.com never knows the PKCE verifier, so it can't use the code to retrieve the token. Only the RP backend can do that. It's true that evil.example.com could create a token for the same client ID, but what would it do with it? The RP isn't going to trust tokens coming from the frontend, only authorization codes. And if evil.example.com gives it an evil code, the PKCE verifier isn't going to match.
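
For reference, the verifier/challenge relationship relied on here can be sketched as follows, assuming the S256 method from RFC 7636. The function names are hypothetical; the RP backend would keep the verifier and send only the challenge through the frontend.

```python
import base64
import hashlib
import secrets

def new_code_verifier() -> str:
    """RP backend: random verifier that never leaves the backend."""
    return secrets.token_urlsafe(32)  # 43 URL-safe characters

def s256_challenge(verifier: str) -> str:
    """Challenge = base64url(SHA-256(verifier)) with padding stripped."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

def verifier_matches(verifier: str, challenge: str) -> bool:
    """IdP side: check the verifier presented with the code against the
    challenge recorded when the flow started."""
    return secrets.compare_digest(s256_challenge(verifier), challenge)

verifier = new_code_verifier()
challenge = s256_challenge(verifier)
print(verifier_matches(verifier, challenge))             # True: the RP backend's own verifier
print(verifier_matches(new_code_verifier(), challenge))  # False: a guessed verifier
```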

It might be the case, that you in your case only care about the id_token, but what about all the other cases?

That might be the case for you, but what if the client simply wants to use an access_token?

I think FedCM should be defined in a way that it can be used in a lot of scenarios, not just the most basic ones.

Other cases are already covered by the current default FedCM design, which sends the Origin header. I'm not asking to remove this functionality, simply for a way to opt out of this for privacy-focused IdPs that simply want to provide identity, and not a bunch of other functionality.

And then, why even bother creating a JWT in the first place, when you are not validating it. Then a simple JSON response without any signature would do the trick as well

You are correct that once you switch to the 3 legged authorization code flow and if you don't need to use the tokens more than to assert identity in that moment, there's not much point in using JWTs.

sebadob commented 5 months ago

evil.example.com never knows the PKCE verifier, so it can't use the code to retrieve the token. Only the RP backend can do that. It's true that evil.example.com could create a token for the same client ID, but what would it do with it? The RP isn't going to trust tokens coming from the frontend, only authorization codes. And if evil.example.com gives it an evil code, the PKCE verifier isn't going to match.

I know what PKCE does and how it works, that was not the point. I was saying it is useless, when you accept requests from any origin or if you simply don't care where they are coming from.

I get it that you in your case only use the id_token once directly after you received it from the IdP directly and then throw the token away. In that case it's fine. But in all others it's not, because usually you request a token which you then will further use for protected API endpoints. And in these cases, it would be the worst when any site could request a token for your API. PKCE doesn't help at all in that scenario.

When you end up on https://gooogle.com which you reached via a link from your email and log in, it could request a token for https://google.com and if you are not careful, you are screwed. If the IdP doesn't validate the origin, it would send an https://google.com token to https://gooogle.com and every part of the chain would be "happy" about it. The attacker probably the most, because he owns your account from that point on.

I think adding something like this is dangerous if used incorrectly and it would be very easy to screw that up. JWTs, for instance, have design flaws which actually make them kind of insecure by design. You can mess up the validation very easily if you don't know what you are doing and I think such issues should be avoided as much as possible with new designs.

anderspitman commented 5 months ago

I think I might have given the impression that I'm confident I'm right about this. Definitely not the case! But I'm really hopeful that what I'm trying to do could be possible, with a reasonable set of tradeoffs. I've already learned a lot from our back and forth, so thank you for that, and for your patience.

I don't quite understand your google attack example above. Could you run through an example interaction for the id_token-only case, that shows at what point the attacker creates a session on the target app, and how they were able to get the app to trust the ID token?

Note that for anonymous login itself I don't consider leaking the ID token itself a security problem. These tokens would carry minimal data (most likely only an ID, such as an email address or better yet a public WebID). Users who care enough about privacy to use such a system would have to understand that any app can request a login from them at any time. Most importantly, the IdP UI would say something along the lines of "An anonymous app wants to log you in. This could expose your email address to an untrusted app. Only use a public email address for this type of login."

I get that there are big tradeoffs here, enough so that I'm not certain it would find much use in practice. But my instincts tell me that the value of privacy is high enough that some would be willing to pay it.

I think adding something like this is dangerous if used incorrectly and it would be very easy to screw that up

I agree you have to be careful what you add to systems, because everything will be used incorrectly at some point. But I think the risk in this case is relatively small. First, there's not much incentive for most IdPs to care about this. Generally implementers look to take the easiest path. So you could make it extra work to implement this, which should guard against accidental usage. Even having a hard-coded list of IdPs in browsers would be better than nothing.

sebadob commented 5 months ago

I don't quite understand your google attack example above. Could you run through an example interaction for the id_token-only case, that shows at what point the attacker creates a session on the target app, and how they were able to get the app to trust the ID token?

When you are doing it exactly like in your case, where you fetch the id_token directly (important) from the IdP without any party in between, you then use it once and throw it away, you're fine with not being able to validate the token. TLS guarantees that you can get the token only from the IdP itself and that no one has been tampering with it.

The above scenario will become a problem as soon as you are doing anything else, like for instance fetch the token from the UI first and then forward it to your backend, or if you have endpoints that actually use a given token, no matter what type.

Note that for anonymous login itself I don't consider leaking the ID token itself a security problem.

It may not be a security problem, but I guess it's a much bigger privacy issue than having your IdP know the origin where you logged in? id_tokens by design carry personal user data, and most often it's not just minimal data. They may carry your full name, address, phone number, and so on.

Generally implementers look to take the easiest path.

That's exactly the problem, because the easiest path is usually not very secure. In OIDC for instance the easiest would be to use the implicit flow, which you really should never do these days.

Even having a hard-coded list of IdPs in browsers would be better than nothing.

I think this would make things a lot worse tbh, but this is another topic.

anderspitman commented 5 months ago

@sebadob I've thought about this some more and I think you're actually right that this is trivial to MITM:

  1. User tricked into visiting evil.com
  2. evil.com backend pretends to be app.com frontend (fake Origin header etc) and initiates a FedCM flow with app.com. This will cause app.com to generate a random client_id and PKCE challenge and return them to the frontend.
  3. evil.com uses the client_id to continue the FedCM flow, passing PKCE challenge as nonce
  4. Browser asks user if they want to use the IdP to log in to an Anonymous App (or just shows them the random ID) - aside: this is terrible UX.
  5. User confirms and FedCM returns an authorization code.
  6. evil.com, still pretending to be the legit app.com frontend, gives app.com backend the code
  7. app.com backend uses the code and PKCE code verifier to retrieve an ID token.
  8. app.com creates a valid session and returns the session cookie to evil.com

And that's game. I think I also understand what you meant saying the PKCE code is useless. The problem isn't making sure the app.com backend can trust where it got the ID token. The problem is that it can't trust where it got the authorization code because the app.com frontend is easy to spoof. The only solution I know so far is (drumroll...) for the IdP to verify that the client it presents to the user matches the Origin header of the app that calls the ID assertion endpoint.
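
A toy model of the relay above, to show where the Origin check bites. Everything here is a simplifying assumption: the `Idp` class, the `relay_attack` helper, and the single registered origin are hypothetical, and PKCE is left out because in this scenario app.com generated the challenge itself and holds the verifier, so that check would pass regardless.

```python
import secrets

class Idp:
    """Toy IdP that issues authorization codes and later redeems them."""

    def __init__(self, check_origin: bool, registered_origin: str) -> None:
        self.check_origin = check_origin
        self.registered_origin = registered_origin
        self._codes = {}  # code -> user it was issued for

    def assert_identity(self, browser_origin: str, user: str):
        # The browser, not the page, sets the Origin header, so evil.com
        # cannot forge it on requests made from the victim's browser.
        if self.check_origin and browser_origin != self.registered_origin:
            return None
        code = secrets.token_urlsafe(8)
        self._codes[code] = user
        return code

    def redeem(self, code: str):
        return self._codes.pop(code, None)

def relay_attack(idp: Idp) -> bool:
    """Return True if evil.com ends up with app.com logging in the victim."""
    # Steps 3-5: the victim's browser runs FedCM on evil.com and approves.
    code = idp.assert_identity("https://evil.com", "victim@example.com")
    if code is None:
        return False
    # Steps 6-8: evil.com relays the code to app.com, which redeems it.
    return idp.redeem(code) == "victim@example.com"

print(relay_attack(Idp(check_origin=False, registered_origin="https://app.com")))  # True
print(relay_attack(Idp(check_origin=True, registered_origin="https://app.com")))   # False
```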

So up to this point I'd say I'm convinced this is a bad idea. Maybe there is some way to make this secure, but this discussion also raised some real UX issues. Asking users to log in to anonymous apps seems gross. There are also a lot of useful features you'd be throwing away the ability to use.

At the end of the day, I think @sebadob has the correct answer: choose an IdP you trust. FedCM is on track to enable users to have whatever IdP they want, so this is possible. Taken to its logical conclusion, users can self-host their own IdP, which should provide everything I need without sacrificing features.

Thanks everyone for your time on this and especially @sebadob.