No longer allow users to set the client ID

aeneasr commented 2 years ago

Preflight checklist

[X] I could not find a solution in the existing issues, docs, nor discussions.
[X] I agree to follow this project's Code of Conduct.
[X] I have read and am following this repository's Contribution Guidelines.
[ ] This issue affects my Ory Cloud project.
[ ] I have joined the Ory Community Slack.
[ ] I am signed up to the Ory Security Patch Newsletter.

Describe your problem

It should no longer be possible to set the Client ID.

Describe your ideal solution

The Client ID should be generated by the server.

Workarounds or alternatives

None

Version

master

Additional Context

No response

eric-poitras commented 2 years ago

What is the rationale behind this ?

aeneasr commented 2 years ago

It’s according to spec

tsoestini commented 2 years ago

Is there some context for understanding this? A link to what is changing? A link to the spec? Who is the user? I don't allow my users to set the client id, but it's important that I be able to set it. I can't imagine an OAuth spec would rule that out.

mig5 commented 2 years ago

As far as I can see, the oAuth2.0 standard doesn't make any statement about whether the client id can be manually defined or not.

https://www.rfc-editor.org/rfc/rfc6749#section-2.2

The authorization server issues the registered client a client identifier -- a unique string representing the registration information provided by the client. The client identifier is not a secret; it is exposed to the resource owner and MUST NOT be used alone for client authentication. The client identifier is unique to the authorization server.

The client identifier string size is left undefined by this specification. The client should avoid making assumptions about the identifier size. The authorization server SHOULD document the size of any identifier it issues.

On the other hand, OAuth.com's document https://www.oauth.com/oauth2-servers/client-registration/client-id-secret/ says

The client_id is a public identifier for apps. Even though it’s public, it’s best that it isn’t guessable by third parties, so many implementations use something like a 32-character hex string. If the client ID is guessable, it makes it slightly easier to craft phishing attacks against arbitrary applications. It must also be unique across all clients that the authorization server handles.

At first I was dismayed about this idea of forcing uncustomizable client ids in Hydra, but the above paragraph has kind of turned me around. From a security and human error point of view, a random client id is stronger.

And it's true that other oAuth OPs like Github use random strings in this way.

I kind of agree with both sides. Perhaps, @aeneasr , rather than force this change, you could make it optional, even if you default to this new behavior. Allowing people to set some sort of CLIENT_ID_IS_CUSTOMIZABLE boolean config flag/env var? Or simply accept the --client-id flag/field when creating it, but generating a random one if not provided.

tsoestini commented 2 years ago

What is this "phishing attack" that can be carried out "against arbitrary applications"???

Phishing is a social engineering attack. It needs to be mounted against people. You can't phish an application.

Why should client_id ever be something that a human looks at and makes decisions upon so much that it can form the basis for a phishing attack?

Moreover, even if a human did make a decision based on it, how would that result in the attacker getting more information?

mig5 commented 2 years ago

I was just quoting an article. I didn't write it. I agree that's not phishing, the point they are making is that both client id and secret are needed to authenticate - if both are totally random instead of one potentially more-easily-guessable, that's technically more secure. (against a brute force attack - not as a phishing attack)

Anyway, not here to get emotional or argue (it's not my words anyway :) ) turning off notifications now, but interested to hear what happens in Hydra 2.0

tsoestini commented 2 years ago

Appreciate your context, it helps a lot. Wasn't trying to argue with you.

I just want to make sure the questions get put on the record so we can think critically about the decisions getting made.

It's quite possible there's an attack I'm missing, and, if so, I'd love to learn about it, primarily to make sure there's nothing else going wrong with how I'm going about things than just the client_id.

I very much agree with the OAuth spec section you quoted. I think it is important that the authorization server issues the client an id, that it not be secret, and that it not be used alone for identification. I also agree with your point that the spec leaves open how the authorization server assigns that ID.

tsoestini commented 2 years ago

I just came across this page: https://www.ory.sh/hydra/docs/next/guides/openid-connect-dynamic-client-registration/

I want to say that I support that for dynamic client registration that it should not be possible for the developer to specify their client id during registration.

My prior comments were under the assumption that this issue was about the REST API ( https://www.ory.sh/hydra/docs/reference/api/#operation/createOAuth2Client ) for creating clients on the admin (non-public) interface.

The REST API admin interface is used by our security administrators, who need to be able to define all the details of a client.

aeneasr commented 2 years ago

We will deprecate setting custom client IDs on the Admin API as well. It is already disabled on OpenID Connect Dynamic Client Registration.

The problem with letting people choose IDs is that it can lead to serious scalability issues if the ID generation mechanism is not cryptographically secure ("true" random) but e.g. sequential (1,2,3,4).

The Client ID is used everywhere and is always in need to be looked up. Depending on the data set, this can have very serious performance issues. See also: https://segment.com/blog/the-million-dollar-eng-problem/

Thus, we'll just disallow this and set the client ID to a UUID. Legacy clients will still be able to use the legacy ID but new clients won't.

aeneasr commented 2 years ago

And sorry for being so short on my last reply, I was in a hurry. Thank you for expanding on the context. So tl;dr two reasons:

Performance with large number of clients
The specification disallows this (for good reasons) on public registration

tsoestini commented 2 years ago

Okay, thanks for the context, and good to know legacy clients won't be affected. I will figure out a way to change our assumptions for new clients, so that they can be assigned random UUIDs.

As for the performance concerns, I definitely see the DynamoDB hot shard problem. Wrangled with that one myself quite a few times.

Just by way of warning: I think that simply disallowing client ID choice will not be enough to solve the DynamoDB hot shard problem.

Of particular note, in the article, Segment didn't solve their DynamoDB hot shard problem by disallowing users from picking their own user id. They solved their hot shard problem by (a) blocking "a daily automated test against our production API that resulted in a burst of hundreds of thousands of events attached to a single userID" and (b) "blocking any new discovered badly behaved keys."

To explain this point further, consider the actual usage pattern among our oauth clients:

Client 1: 55% of all requests
Client 2: 43% of all requests
Client 3: 1% of all requests
All other clients combined: the remaining 1% of all requests

It doesn't matter whether Clients 1 and 2 have an administrator-set id or a random UUID, whatever shards Clients 1 and 2 end up on will be hot shards.

In fact, it might be better if the administrator can set their ids. At least that way the administrator could ensure that Client 1 and 2 end up on different shards, instead of just hoping that they do.

The actual solution, though would probably be one of the following:

(1) If most access to client data is reads, then reads of client data could be passed through a Least-Recently-Used read cache. The top three clients would virtually always be cache hits, with the remaining clients competing for the other cache slots and accounting for nearly all of the cache misses, which would be okay, because those are spread across a large number of client ids, which would be spread out over the shards, so the hot shard issue would be avoided.

(2) If there are frequent client-specific writes (I'm not sure if hydra has that problem, depending on how tokens are stored?), then there probably needs to be an administrator tuning setting where specific heavy-use client-specific data (tokens?) can be split off into their own designated shards, with the ability to assign several shards to a single such client.

aeneasr commented 2 years ago

Thank you for putting this in context! It’s unlikely that external parties will be able to properly choose shard IDs in my experience. We do however plan to have shard IDs in the DB to allow choosing hot spots for admins. The primary client IDs will still not be able to be chosen by users as most clients are not sophisticated enough to understand the inner workings of the underlying database store. This feature is primarily to help us integrate Ory Hydra into Ory Cloud and run it at scale for millions pf consumers :)

willbengtson commented 2 years ago

Would you be able to support setting a prefix for the client ID to still allow the randomness as you desire but not being able to control the entire value, but have an ability to detect the client ID in code?

arnolf commented 2 years ago

In our platform, we synchronize oauth2 clients from a config file with a script using Hydra admin API. It allows us to automate setup new dev/qlf environment quickly. We need to set a specific client_id and client_secret.

Moreover, we take decision in our code based on client_id.

Can't we have a flag to enable/disable this breaking change ? Or maybe you could use a consistent hash to derivate you technical id from client_id (something like mongodb hashed sharding strategy) ?

aeneasr commented 2 years ago

I understand your use case - having predictable IDs is important. We might add the option to specify the ID (only UUID v4 will be allowed) on client creation if there are no collisions. However, having arbitrary text as client IDs will no longer be possible

arnolf commented 2 years ago

Ok, allow to specify the client id with an UUID format will answer to our needs to create new env.

However, to keep things simple, when we have to make decision in our code, i still think it would be better to have an human readable ID, or fields metadata on which we can based our decision.

Maybe, a great way to address this second usecase, would be to give the possibility to expose the metadata client object inside the introspect response and include it also in jwt. This will give flexibility.

willbengtson commented 2 years ago

client secret is the only real "secret". As long as client IDs are unique is there an issue with whether it is a UUID v4 or something else. We have the goal to be able to detect OIDC client ID and secret in source code which makes having a prefix of known pattern valuable.

aeneasr commented 2 years ago

client secret is the only real "secret". As long as client IDs are unique is there an issue with whether it is a UUID v4 or something else. We have the goal to be able to detect OIDC client ID and secret in source code which makes having a prefix of known pattern valuable.

I think prefixing is a good idea. In that case, it would apply to the secret though. A client ID should be considered public knowledge and there is no point in scanning for it!

You will be able to set client secrets in the future, so prefixing this is something that can be solved by you (for example by making sure that all calls to the /clients endpoint and enforcing a prefixed secret).

aeneasr commented 2 years ago

https://github.com/ory/hydra/commit/a3a9e641fbd9aad5bb87841471d364f08d62f9ae

aeneasr commented 2 years ago

Not yet mergable, reopening

aeneasr commented 2 years ago

done in v2

tinolazreg commented 1 year ago

I understand your use case - having predictable IDs is important. We might add the option to specify the ID (only UUID v4 will be allowed) on client creation if there are no collisions. However, having arbitrary text as client IDs will no longer be possible

I don't understand why you ended up going for system-chosen Oauth2 Client IDs, instead of making it possible to specify clientId on client creation, but validating that it is a UUID v4. Can you expand on why Hydra v2 ended up with system-chosen Client IDs? @aeneasr

aeneasr commented 1 year ago

Hey, it’s ok if you don’t understand ;)

Here’s the rationale:

Client IDs will in the future be UUIDs pre-generated by Ory Hydra. This is an important change to significantly improve performance at scale and support distributed database systems. It’s a cornerstone change to

prevent hotspot issues (-> easier sharding and load distribution);
mitigate excessive index size (-> less RAM and CPU);

You can set custom metadata for the client in the client’s metadata field if that helps you identify where the client belongs.

Here are some resources that helped us on this journey:

Allowing Ory to choose the UUID also helps us to improve the performance of the primary key in the future. Ory Keto for example has already switched to UUID V5 to improve paginated queries as well as sorting. We might not even keep using UUID at some point if there is a better, faster, more unique, and more scalable system available.

I know that not being able to choose the primary key format can sometimes be annoying. We do have the best of intentions here and are convinced that Ory’s users are better of if we apply the best practices the industry has found to build large-scale systems. I hope this helps :)

tinolazreg commented 1 year ago

Thank you for the detailed answer :) I certainly understand the thought process behind the change, but it is still a huge change when moving from Hydra V1 to V2, when we have been able to specify the client ID earlier.

I get the argument for supporting UUID in distributed database systems, but this might not improve performance in PostgreSQL for example: https://www.cybertec-postgresql.com/en/uuid-serial-or-identity-columns-for-postgresql-auto-generated-primary-keys/.

Are you still open for adding an option to specify the client ID (only UUID v4) on client creation if there are no collisions? I think it would make the upgrade to Hydra V2 a lot easier for several people :)

tinolazreg commented 1 year ago

I created a PR with a suggestion for a configuration flag that allows us to specify the client ID: https://github.com/ory/hydra/pull/3354.

arnolf commented 1 year ago

I was wondering why you chose to use a UUIDv4, instead of a name based UUIDV5 in order to avoid this breaking change. With a UUIDv5 you could have let the user choose their clientid and compute the primary key from that clientid (PK = UUIDV5.from(hydra namespace, clientid)).

ericb-summit commented 1 year ago

From the point of view of the hydra deployment existing for ever, this perf enhancement is logical and probably needed. But for the developer doing repeated deployments of hydra as part of a larger solution, it is a big problem.

We use ephemeral deployments for dev and pre-prod, re-deploying the entire k8s including hydra and the pgsql behind it. This means the oauth client creation is done repeatedly, at CD time, dozens of times a day if needed. We then have clients (native, web, etc...) that connect to these ephemeral deployments by selecting a suitable issuer (i.e. engineering cluster, mktg , prod , etc..) and log on using a well-known oauth client ID.

Having ephemeral client IDs throws a monkey wrench in this kind of flow. Web clients (SWA, angular, react, ...) and native clients are pre-compiled and had client ID baked-in at compile time. Rebuilding different clients for each re-deployment of say, a dev cluster is completely bonkers. Only option would now be to somehow (?) fetch the client ID out of band, requiring some previously not needed endpoint.

Either way for now I need to version lock to < 2 because it's an impacting change well beyond hydra itself.

rettier commented 1 year ago

We have previously used the client id with the policy agent to retrieve and enforce the client's permissions. With this change we would have to place our internal id's inside the client metadata, but the token introspection endpoint does not return this information, neither can ory oathkeeper's token introspection authenticator reference anything but the oath client id (no "subject from" option as for example in the cookie authenticator).

For us this would mean to inverse the information flow entirely, we cannot create any objects or data points inside our software until ory hydra has decided on what the id of the client will be, because it wants to have ultimate control over the ids. The alternative would be to resolve the client id by querying the client's metadata, effectively adding one more API request and currently increasing latency by about 30% for most requests.

I do not think that putting such strict rules on deep into hydra's code due to a problem that large scale operators have is a bad idea. Large scale operators will have similar key distribution issues in all their applications and databases and should be able to consider the performance impact of letting users choose the client id or must block it if they see performance issues.

ileixe commented 1 year ago

Does Oauth2Client CRD not yet support this change? https://github.com/ory/hydra-maester/blob/master/controllers/oauth2client_controller.go#L394

I'm failed to create OAuth2Client via maester for both of (1) client id missing is not allowed from the above code, (2) hydra does not accept client_id specified from this change.

I wonder what's current status.

(Edit) I found it myself: https://github.com/ory/hydra-maester/issues/117

A-Helberg commented 1 year ago

... But for the developer doing repeated deployments of hydra as part of a larger solution, it is a big problem.

As has been mentioned, being able to set the client id is crucial for testing. The flow for us is the same as ericb-summit mentioned. Artifacts are built and then tested.

Not knowing the client-id at compile time means we would have to have a pre configured / running instance of hydra. Having a single instance means that developers are scared to make changes, as they have no way to test them ahead of time (obviously it's possible, but too much effort for most people to get involved) and their change will directly impact all testing. So a single test instance won't work for us.

We've tried to do the same (shared testing instance) with hosted providers such as (auth0 and okta). For the reasons mentioned above, we are looking to move away from them and to ory hydry.

The best solution I can think of, is to inject the sql into the hydra DB with known client ids. Do you foresee any pitfalls in this?

pflipp commented 1 year ago

I ended with the following:

In our login/consent provider i create a new client in the before script of each test which interacts with hydra.
In client applications i use oidc providers which are designed for testing (it also saves me bootstraping the login provider for testing).
For test environments (monkey / QA ..) i use the injection of the sql (create the client with the api as in v1 and exec an update to switch the returned client id to the wellknown id). I really wouln't recomment this, but as this part of the infrastructure isn't critical it will not be a big thing if it breaks some day.

I really understand hydra not allowing to set internal id's, but maybe separating the external client_id from the internal id in the storage layer would be a way which allows hydra enterprise scale and letting us choose the client id? Maybe for v3?

springroll12 commented 1 year ago

We are evaluating the Ory stack and this is a major problem. As others have mentioned not being able to specify the client id (or secret for that matter) at deploy time is a major hurdle for testing and ephemeral environments. It should also be possible to store client information in a secrets store (e.g. Hashicorp Vault) so that this information can be securely distributed to internal clients. If I understand this change correctly, we cannot do either without calling the hydra api to first generate the credentials.

I also disagree with the spec here that human-readable client identifiers are a security risk. Creating a client-id that is "more unguessable" is just security by obscurity. The entropy to prevent guessing should all be in the secret.

Could the performance problem not have been solved with surrogate identifiers? (e.g. uuid column, name column). I suppose that is what the metadata fields are for, but it is much more difficult to provision a new client repeatably this way.

aeneasr commented 1 year ago

Could the performance problem not have been solved with surrogate identifiers? (e.g. uuid column, name column). I suppose that is what the metadata fields are for, but it is much more difficult to provision a new client repeatably this way.

No, because you still need to look up the Client ID on every request. There are also other solutions to this. For example, you can add predicatbility to your tests by loading a database backup with an existing client which then has an immutable ID as it already exists!

WilliamsDL commented 1 year ago

Adding myself here... We absolutely need the ability to specify client id so that we can migrate out clients from our legacy system to Hydra without regenerating client id and secret. Is my solution really to have to downgrade to an old version of Hydra to accomplish the data migration, and then upgrade after? Why would you remove the ability to specify the client id? It's very important for data migration tasks.

aaw20904 commented 1 year ago

How can I integrate Ory Hydra if the one can`t generating automatically user_ID? I will nod using this ID in cookie - I will use it in my client back-end side to access to the database.

dexter4life commented 1 year ago

If no client_id then it won't work with apache superset. Any other application I can use for superset apart from the keycloak?

miran248 commented 1 year ago

oryd/hydra-maester:v0.0.28 still requires client_id, which, when specified, fails due to the It is no longer possible to set an OAuth2 Client ID as a user. The system will generate a unique ID for you., thrown by oryd/hydra:v2.1.1. Will try to create the client via command line but that won't fly longterm.

aeneasr commented 1 year ago

We are walking back on this decision, reopening :)

ileixe commented 1 year ago

Is there any available options now (e.g. incoming PR)? We're downgrading hydra when client_id is required, and it's absolutely painful :innocent:

gdsmith commented 1 year ago

Which part is being walked back? Will we be able to set any client id or is it still going to have to be UUID based?

aeneasr commented 1 year ago

Is there any available options now (e.g. incoming PR)? We're downgrading hydra when client_id is required, and it's absolutely painful 😇

You can manually set the legacy ID in the database if you want.

Which part is being walked back? Will we be able to set any client id or is it still going to have to be UUID based?

Most likely it will be a varchar type. We'll see how we can deal with hotspots when the issue arises in our system

mig5 commented 1 year ago

Thank you so much for listening to the community - it is a really big deal. I am looking forward to finally upgrading to Hydra 2 with this change.

mig5 commented 8 months ago

The 2.2.0 release notes say that this ability to set the Client ID when creating a client, has been reinstated.

Yet running create client via the hydra v2.2.0 Docker image does not allow the --id option and it's not listed in the available options.

How do we set the client?

mig5 commented 8 months ago

Looks to me like you didn't add an --id option back in to the CLI at https://github.com/ory/hydra/blob/v2.2.0/cmd/cmd_create_client.go#L19-L47 ? :/

mig5 commented 7 months ago

@aeneasr just pinging you on the above two comments ^ is my understanding right? Will you make a release that reintroduces the ability to set --id from the CLI tool, or is it intended that it can only be done via the API (which I haven't tried yet, but assume it's possible that way)

guptaaashutosh commented 4 months ago

The 2.2.0 release notes say that this ability to set the Client ID when creating a client, has been reinstated.

Yet running create client via the hydra v2.2.0 Docker image does not allow the --id option and it's not listed in the available options.

How do we set the client?

I am getting the same problem.

hughpv commented 4 months ago

@mig5 @guptaaashutosh It looks like it's been merged, but not released yet.

https://github.com/ory/hydra/pull/3725

hughpv commented 4 months ago

For what it's worth -- and I'm only using this in a test environment where I need well-known client IDs -- it looks like I can just update the id field in the hydra_client table after creating with CLI (my backend is MySQL).

Well... I should qualify that to say I tried getting a token one time and it worked...

Clearly it's an ugly workaround, but until we get proper CLI support back, it will have to do.

guptaaashutosh commented 4 months ago

The 2.2.0 release notes say that this ability to set the Client ID when creating a client, has been reinstated. Yet running create client via the hydra v2.2.0 Docker image does not allow the --id option and it's not listed in the available options. How do we set the client?

I am getting the same problem.

@aeneasr

mig5 commented 4 months ago

For what it's worth, I can confirm that it's possible to set the ID when creating the client when doing so via a script that interacts with the Admin API (e.g not via the CLI tool).

If anyone is using PHP and needs help, I can share an example.

ory / hydra