Closed aeneasr closed 1 year ago
What is the rationale behind this ?
It’s according to spec
Is there some context for understanding this? A link to what is changing? A link to the spec? Who is the user? I don't allow my users to set the client id, but it's important that I be able to set it. I can't imagine an OAuth spec would rule that out.
As far as I can see, the oAuth2.0 standard doesn't make any statement about whether the client id can be manually defined or not.
https://www.rfc-editor.org/rfc/rfc6749#section-2.2
The authorization server issues the registered client a client identifier -- a unique string representing the registration information provided by the client. The client identifier is not a secret; it is exposed to the resource owner and MUST NOT be used alone for client authentication. The client identifier is unique to the authorization server.
The client identifier string size is left undefined by this specification. The client should avoid making assumptions about the identifier size. The authorization server SHOULD document the size of any identifier it issues.
On the other hand, OAuth.com's document https://www.oauth.com/oauth2-servers/client-registration/client-id-secret/ says
The client_id is a public identifier for apps. Even though it’s public, it’s best that it isn’t guessable by third parties, so many implementations use something like a 32-character hex string. If the client ID is guessable, it makes it slightly easier to craft phishing attacks against arbitrary applications. It must also be unique across all clients that the authorization server handles.
At first I was dismayed about this idea of forcing uncustomizable client ids in Hydra, but the above paragraph has kind of turned me around. From a security and human error point of view, a random client id is stronger.
And it's true that other oAuth OPs like Github use random strings in this way.
I kind of agree with both sides. Perhaps, @aeneasr , rather than force this change, you could make it optional, even if you default to this new behavior. Allowing people to set some sort of CLIENT_ID_IS_CUSTOMIZABLE boolean config flag/env var? Or simply accept the --client-id
flag/field when creating it, but generating a random one if not provided.
What is this "phishing attack" that can be carried out "against arbitrary applications"???
Phishing is a social engineering attack. It needs to be mounted against people. You can't phish an application.
Why should client_id ever be something that a human looks at and makes decisions upon so much that it can form the basis for a phishing attack?
Moreover, even if a human did make a decision based on it, how would that result in the attacker getting more information?
I was just quoting an article. I didn't write it. I agree that's not phishing, the point they are making is that both client id and secret are needed to authenticate - if both are totally random instead of one potentially more-easily-guessable, that's technically more secure. (against a brute force attack - not as a phishing attack)
Anyway, not here to get emotional or argue (it's not my words anyway :) ) turning off notifications now, but interested to hear what happens in Hydra 2.0
Appreciate your context, it helps a lot. Wasn't trying to argue with you.
I just want to make sure the questions get put on the record so we can think critically about the decisions getting made.
It's quite possible there's an attack I'm missing, and, if so, I'd love to learn about it, primarily to make sure there's nothing else going wrong with how I'm going about things than just the client_id.
I very much agree with the OAuth spec section you quoted. I think it is important that the authorization server issues the client an id, that it not be secret, and that it not be used alone for identification. I also agree with your point that the spec leaves open how the authorization server assigns that ID.
I just came across this page: https://www.ory.sh/hydra/docs/next/guides/openid-connect-dynamic-client-registration/
I want to say that I support that for dynamic client registration that it should not be possible for the developer to specify their client id during registration.
My prior comments were under the assumption that this issue was about the REST API ( https://www.ory.sh/hydra/docs/reference/api/#operation/createOAuth2Client ) for creating clients on the admin (non-public) interface.
The REST API admin interface is used by our security administrators, who need to be able to define all the details of a client.
We will deprecate setting custom client IDs on the Admin API as well. It is already disabled on OpenID Connect Dynamic Client Registration.
The problem with letting people choose IDs is that it can lead to serious scalability issues if the ID generation mechanism is not cryptographically secure ("true" random) but e.g. sequential (1,2,3,4).
The Client ID is used everywhere and is always in need to be looked up. Depending on the data set, this can have very serious performance issues. See also: https://segment.com/blog/the-million-dollar-eng-problem/
Thus, we'll just disallow this and set the client ID to a UUID. Legacy clients will still be able to use the legacy ID but new clients won't.
And sorry for being so short on my last reply, I was in a hurry. Thank you for expanding on the context. So tl;dr two reasons:
Okay, thanks for the context, and good to know legacy clients won't be affected. I will figure out a way to change our assumptions for new clients, so that they can be assigned random UUIDs.
As for the performance concerns, I definitely see the DynamoDB hot shard problem. Wrangled with that one myself quite a few times.
Just by way of warning: I think that simply disallowing client ID choice will not be enough to solve the DynamoDB hot shard problem.
Of particular note, in the article, Segment didn't solve their DynamoDB hot shard problem by disallowing users from picking their own user id. They solved their hot shard problem by (a) blocking "a daily automated test against our production API that resulted in a burst of hundreds of thousands of events attached to a single userID" and (b) "blocking any new discovered badly behaved keys."
To explain this point further, consider the actual usage pattern among our oauth clients:
It doesn't matter whether Clients 1 and 2 have an administrator-set id or a random UUID, whatever shards Clients 1 and 2 end up on will be hot shards.
In fact, it might be better if the administrator can set their ids. At least that way the administrator could ensure that Client 1 and 2 end up on different shards, instead of just hoping that they do.
The actual solution, though would probably be one of the following:
(1) If most access to client data is reads, then reads of client data could be passed through a Least-Recently-Used read cache. The top three clients would virtually always be cache hits, with the remaining clients competing for the other cache slots and accounting for nearly all of the cache misses, which would be okay, because those are spread across a large number of client ids, which would be spread out over the shards, so the hot shard issue would be avoided.
(2) If there are frequent client-specific writes (I'm not sure if hydra has that problem, depending on how tokens are stored?), then there probably needs to be an administrator tuning setting where specific heavy-use client-specific data (tokens?) can be split off into their own designated shards, with the ability to assign several shards to a single such client.
Thank you for putting this in context! It’s unlikely that external parties will be able to properly choose shard IDs in my experience. We do however plan to have shard IDs in the DB to allow choosing hot spots for admins. The primary client IDs will still not be able to be chosen by users as most clients are not sophisticated enough to understand the inner workings of the underlying database store. This feature is primarily to help us integrate Ory Hydra into Ory Cloud and run it at scale for millions pf consumers :)
Would you be able to support setting a prefix for the client ID to still allow the randomness as you desire but not being able to control the entire value, but have an ability to detect the client ID in code?
In our platform, we synchronize oauth2 clients from a config file with a script using Hydra admin API. It allows us to automate setup new dev/qlf environment quickly. We need to set a specific client_id and client_secret.
Moreover, we take decision in our code based on client_id.
Can't we have a flag to enable/disable this breaking change ? Or maybe you could use a consistent hash to derivate you technical id from client_id (something like mongodb hashed sharding strategy) ?
I understand your use case - having predictable IDs is important. We might add the option to specify the ID (only UUID v4 will be allowed) on client creation if there are no collisions. However, having arbitrary text as client IDs will no longer be possible
Ok, allow to specify the client id with an UUID format will answer to our needs to create new env.
However, to keep things simple, when we have to make decision in our code, i still think it would be better to have an human readable ID, or fields metadata on which we can based our decision.
Maybe, a great way to address this second usecase, would be to give the possibility to expose the metadata client object inside the introspect response and include it also in jwt. This will give flexibility.
client secret is the only real "secret". As long as client IDs are unique is there an issue with whether it is a UUID v4 or something else. We have the goal to be able to detect OIDC client ID and secret in source code which makes having a prefix of known pattern valuable.
client secret is the only real "secret". As long as client IDs are unique is there an issue with whether it is a UUID v4 or something else. We have the goal to be able to detect OIDC client ID and secret in source code which makes having a prefix of known pattern valuable.
I think prefixing is a good idea. In that case, it would apply to the secret though. A client ID should be considered public knowledge and there is no point in scanning for it!
You will be able to set client secrets in the future, so prefixing this is something that can be solved by you (for example by making sure that all calls to the /clients
endpoint and enforcing a prefixed secret).
Not yet mergable, reopening
done in v2
I understand your use case - having predictable IDs is important. We might add the option to specify the ID (only UUID v4 will be allowed) on client creation if there are no collisions. However, having arbitrary text as client IDs will no longer be possible
I don't understand why you ended up going for system-chosen Oauth2 Client IDs, instead of making it possible to specify clientId on client creation, but validating that it is a UUID v4. Can you expand on why Hydra v2 ended up with system-chosen Client IDs? @aeneasr
Hey, it’s ok if you don’t understand ;)
Here’s the rationale:
Client IDs will in the future be UUIDs pre-generated by Ory Hydra. This is an important change to significantly improve performance at scale and support distributed database systems. It’s a cornerstone change to
You can set custom metadata for the client in the client’s metadata field if that helps you identify where the client belongs.
Here are some resources that helped us on this journey:
Allowing Ory to choose the UUID also helps us to improve the performance of the primary key in the future. Ory Keto for example has already switched to UUID V5 to improve paginated queries as well as sorting. We might not even keep using UUID at some point if there is a better, faster, more unique, and more scalable system available.
I know that not being able to choose the primary key format can sometimes be annoying. We do have the best of intentions here and are convinced that Ory’s users are better of if we apply the best practices the industry has found to build large-scale systems. I hope this helps :)
Thank you for the detailed answer :) I certainly understand the thought process behind the change, but it is still a huge change when moving from Hydra V1 to V2, when we have been able to specify the client ID earlier.
I get the argument for supporting UUID in distributed database systems, but this might not improve performance in PostgreSQL for example: https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/.
Are you still open for adding an option to specify the client ID (only UUID v4) on client creation if there are no collisions? I think it would make the upgrade to Hydra V2 a lot easier for several people :)
I created a PR with a suggestion for a configuration flag that allows us to specify the client ID: https://github.com/ory/hydra/pull/3354.
I was wondering why you chose to use a UUIDv4, instead of a name based UUIDV5 in order to avoid this breaking change. With a UUIDv5 you could have let the user choose their clientid and compute the primary key from that clientid (PK = UUIDV5.from(hydra namespace, clientid)).
From the point of view of the hydra deployment existing for ever, this perf enhancement is logical and probably needed. But for the developer doing repeated deployments of hydra as part of a larger solution, it is a big problem.
We use ephemeral deployments for dev and pre-prod, re-deploying the entire k8s including hydra and the pgsql behind it. This means the oauth client creation is done repeatedly, at CD time, dozens of times a day if needed. We then have clients (native, web, etc...) that connect to these ephemeral deployments by selecting a suitable issuer (i.e. engineering cluster, mktg , prod , etc..) and log on using a well-known oauth client ID.
Having ephemeral client IDs throws a monkey wrench in this kind of flow. Web clients (SWA, angular, react, ...) and native clients are pre-compiled and had client ID baked-in at compile time. Rebuilding different clients for each re-deployment of say, a dev cluster is completely bonkers. Only option would now be to somehow (?) fetch the client ID out of band, requiring some previously not needed endpoint.
Either way for now I need to version lock to < 2 because it's an impacting change well beyond hydra itself.
We have previously used the client id with the policy agent to retrieve and enforce the client's permissions. With this change we would have to place our internal id's inside the client metadata, but the token introspection endpoint does not return this information, neither can ory oathkeeper's token introspection authenticator reference anything but the oath client id (no "subject from" option as for example in the cookie authenticator).
For us this would mean to inverse the information flow entirely, we cannot create any objects or data points inside our software until ory hydra has decided on what the id of the client will be, because it wants to have ultimate control over the ids. The alternative would be to resolve the client id by querying the client's metadata, effectively adding one more API request and currently increasing latency by about 30% for most requests.
I do not think that putting such strict rules on deep into hydra's code due to a problem that large scale operators have is a bad idea. Large scale operators will have similar key distribution issues in all their applications and databases and should be able to consider the performance impact of letting users choose the client id or must block it if they see performance issues.
Does Oauth2Client CRD not yet support this change? https://github.com/ory/hydra-maester/blob/master/controllers/oauth2client_controller.go#L394
I'm failed to create OAuth2Client via maester for both of (1) client id missing is not allowed from the above code, (2) hydra does not accept client_id specified from this change.
I wonder what's current status.
(Edit) I found it myself: https://github.com/ory/hydra-maester/issues/117
... But for the developer doing repeated deployments of hydra as part of a larger solution, it is a big problem.
As has been mentioned, being able to set the client id is crucial for testing. The flow for us is the same as ericb-summit mentioned. Artifacts are built and then tested.
Not knowing the client-id at compile time means we would have to have a pre configured / running instance of hydra. Having a single instance means that developers are scared to make changes, as they have no way to test them ahead of time (obviously it's possible, but too much effort for most people to get involved) and their change will directly impact all testing. So a single test instance won't work for us.
We've tried to do the same (shared testing instance) with hosted providers such as (auth0 and okta). For the reasons mentioned above, we are looking to move away from them and to ory hydry.
The best solution I can think of, is to inject the sql into the hydra DB with known client ids. Do you foresee any pitfalls in this?
I ended with the following:
I really understand hydra not allowing to set internal id's, but maybe separating the external client_id from the internal id in the storage layer would be a way which allows hydra enterprise scale and letting us choose the client id? Maybe for v3?
We are evaluating the Ory stack and this is a major problem. As others have mentioned not being able to specify the client id (or secret for that matter) at deploy time is a major hurdle for testing and ephemeral environments. It should also be possible to store client information in a secrets store (e.g. Hashicorp Vault) so that this information can be securely distributed to internal clients. If I understand this change correctly, we cannot do either without calling the hydra api to first generate the credentials.
I also disagree with the spec here that human-readable client identifiers are a security risk. Creating a client-id that is "more unguessable" is just security by obscurity. The entropy to prevent guessing should all be in the secret.
Could the performance problem not have been solved with surrogate identifiers? (e.g. uuid column, name column). I suppose that is what the metadata fields are for, but it is much more difficult to provision a new client repeatably this way.
Could the performance problem not have been solved with surrogate identifiers? (e.g. uuid column, name column). I suppose that is what the metadata fields are for, but it is much more difficult to provision a new client repeatably this way.
No, because you still need to look up the Client ID on every request. There are also other solutions to this. For example, you can add predicatbility to your tests by loading a database backup with an existing client which then has an immutable ID as it already exists!
Adding myself here... We absolutely need the ability to specify client id so that we can migrate out clients from our legacy system to Hydra without regenerating client id and secret. Is my solution really to have to downgrade to an old version of Hydra to accomplish the data migration, and then upgrade after? Why would you remove the ability to specify the client id? It's very important for data migration tasks.
How can I integrate Ory Hydra if the one can`t generating automatically user_ID? I will nod using this ID in cookie - I will use it in my client back-end side to access to the database.
If no client_id then it won't work with apache superset. Any other application I can use for superset apart from the keycloak?
oryd/hydra-maester:v0.0.28
still requires client_id
, which, when specified, fails due to the It is no longer possible to set an OAuth2 Client ID as a user. The system will generate a unique ID for you.
, thrown by oryd/hydra:v2.1.1
.
Will try to create the client via command line but that won't fly longterm.
We are walking back on this decision, reopening :)
Is there any available options now (e.g. incoming PR)? We're downgrading hydra when client_id is required, and it's absolutely painful :innocent:
Which part is being walked back? Will we be able to set any client id or is it still going to have to be UUID based?
Is there any available options now (e.g. incoming PR)? We're downgrading hydra when client_id is required, and it's absolutely painful 😇
You can manually set the legacy ID in the database if you want.
Which part is being walked back? Will we be able to set any client id or is it still going to have to be UUID based?
Most likely it will be a varchar type. We'll see how we can deal with hotspots when the issue arises in our system
Thank you so much for listening to the community - it is a really big deal. I am looking forward to finally upgrading to Hydra 2 with this change.
The 2.2.0 release notes say that this ability to set the Client ID when creating a client, has been reinstated.
Yet running create client
via the hydra v2.2.0 Docker image does not allow the --id
option and it's not listed in the available options.
How do we set the client?
Looks to me like you didn't add an --id
option back in to the CLI at https://github.com/ory/hydra/blob/v2.2.0/cmd/cmd_create_client.go#L19-L47 ? :/
@aeneasr just pinging you on the above two comments ^ is my understanding right? Will you make a release that reintroduces the ability to set --id
from the CLI tool, or is it intended that it can only be done via the API (which I haven't tried yet, but assume it's possible that way)
The 2.2.0 release notes say that this ability to set the Client ID when creating a client, has been reinstated.
Yet running
create client
via the hydra v2.2.0 Docker image does not allow the--id
option and it's not listed in the available options.How do we set the client?
I am getting the same problem.
@mig5 @guptaaashutosh It looks like it's been merged, but not released yet.
For what it's worth -- and I'm only using this in a test environment where I need well-known client IDs -- it looks like I can just update the id
field in the hydra_client
table after creating with CLI (my backend is MySQL).
Well... I should qualify that to say I tried getting a token one time and it worked...
Clearly it's an ugly workaround, but until we get proper CLI support back, it will have to do.
The 2.2.0 release notes say that this ability to set the Client ID when creating a client, has been reinstated. Yet running
create client
via the hydra v2.2.0 Docker image does not allow the--id
option and it's not listed in the available options. How do we set the client?I am getting the same problem.
@aeneasr
For what it's worth, I can confirm that it's possible to set the ID when creating the client when doing so via a script that interacts with the Admin API (e.g not via the CLI tool).
If anyone is using PHP and needs help, I can share an example.
Preflight checklist
Describe your problem
It should no longer be possible to set the Client ID.
Describe your ideal solution
The Client ID should be generated by the server.
Workarounds or alternatives
None
Version
master
Additional Context
No response