puppetlabs / vault-plugin-secrets-oauthapp

OAuth 2.0 secrets plugin for HashiCorp Vault supporting a variety of grant types
Apache License 2.0

Big problem with new multi-provider interface #60

Closed: DrDaveD closed this issue 3 years ago

DrDaveD commented 3 years ago

I just noticed that the new support for multiple providers merged in #58 has a big problem for the way I use the plugin. I use a different Vault authorization policy for each provider, and policies can only be scoped by path; they cannot distinguish between providers based on a request parameter. If the paths were redone to include the provider name, that would fix the problem. For example, instead of

vault write oauth2/creds/my-user-auth server=github-puppetlabs

it would have to be something like

vault write oauth2/creds/github-puppetlabs/my-user-auth

Currently I can apply a per-provider policy because each provider has its own top-level mount name, for example oauth2-github-puppetlabs.
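The per-mount arrangement above can be sketched as follows (the mount name, policy name, and plugin name here are illustrative, not taken from any particular deployment):

```shell
# One mount per provider, so each provider gets its own path prefix
vault secrets enable -path=oauth2-github-puppetlabs vault-plugin-secrets-oauthapp

# A policy can then be scoped to that provider's mount alone
vault policy write github-puppetlabs-users - <<'EOF'
path "oauth2-github-puppetlabs/creds/*" {
  capabilities = ["create", "read", "update"]
}
EOF
```

With one mount per provider, the path prefix alone is enough for Vault's path-based policies to separate the providers.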

impl commented 3 years ago

Hmm, unfortunately I think this breaks the common use case for us: we have an ID we need to look up but we don't necessarily know which provider it's associated with in advance. However, you have a couple of options here, I think:

  1. Perhaps the easiest one is just to keep using the plugin the way you are now? I.e., even though it supports multiple servers per engine, you don't have to use it that way. You could just create a single server with a well-known name per engine. The upgrade path uses the name legacy.
  2. Vault's policies actually allow you to specify on a per-policy basis what parameters are allowed to be set to. See the section of the docs on parameter constraints. This won't help if you need to determine the server name dynamically, but if you have different policies that get applied depending on which auth method is used, it could work. For example:

    path "oauth2/creds/my-user-auth" {
      capabilities = ["create", "update"]
      allowed_parameters = {
        "server" = ["github-puppetlabs"]
        "*"      = []
      }
    }

Do either of those options help you at all?

DrDaveD commented 3 years ago

Yes, I think both of them could work; the second is preferable because it reduces the number of processes needed. I won't have a chance to try it out immediately, but hopefully I will within a week or two.

DrDaveD commented 3 years ago

Ok Noah, I'm starting to work on this and running into something I haven't been able to figure out. Am I correct in understanding that the creds read and both the config and creds delete operations do not require specifying a server? I don't see one either in the README or in the code. If that's correct, how does it know which server to use? I have code that cleans up Vault configuration when a server is removed; currently that's done by vault secrets disable of the related mount, but I don't know how to do the corresponding thing with this new code for deleting a no-longer-used server.

DrDaveD commented 3 years ago

Oh, and now I realize another thing that seems to block me from using the new feature: there is still only one issuer URL per plugin instance. I thought I would be able to specify a different issuer URL for each "server".

If I go back to your first option of just using the plugin the way I did, I don't understand why you refer to the upgrade path using "legacy". Would I have to do something with that? I am running the new version now, after having made corresponding changes in my PR #41, and it seems to be working the same way as it used to without my having to change anything.

impl commented 3 years ago

Am I correct in understanding that the creds read and both the config and creds delete do not require specifying a server? I don't see that in the README nor in the code. If that's correct, then how does it know which server to use? I have code that cleans up vault configuration when a server is removed, and currently that's done by vault secrets disable of the related module, but I don't know how to do the corresponding thing with this new code for deleting a no-longer used server.

That's correct -- the server is stored internally with the rest of the authorization data when the credential is created.

Vault's logical backends actually don't support recursive deletion, so the unmounting-to-delete-everything behavior is kind of a special case that's harder to support than it looks. In fact, I think right now if you delete a server that's backing a bunch of credentials, those credentials will continue to exist and never be reaped (because they expect their server to be put back into place eventually). Are you churning through backends a lot? It's not really a use case I've put much thought into -- most of ours (thus far) are fairly constant.

At a minimum, I'll add a test case for this, but I can also add some reaper options to help out further (e.g., auto-delete any creds that correspond to a server that no longer exists after X seconds have elapsed).

Oh and now I realize another thing that seems to block me from using the new feature, in that there is still only one issuer url per plugin instance. I thought I would be able to specify a different issuer URL for each "server".

You should be able to do this. E.g., vault write oauth2/servers/foo [...] provider_options=issuer_url=https://foo. Is that not working for you? This might just be a docs issue... where does this appear to be the case?
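For example, two servers with different issuers might be configured like this (a sketch; the server names, client credentials, and issuer URLs are illustrative, and parameter names other than provider_options should be checked against the README):

```shell
# Each server entry carries its own provider options, including the issuer URL
vault write oauth2/servers/server-a \
    provider=oidc client_id=client-a client_secret=secret-a \
    provider_options=issuer_url=https://issuer-a.example.com

vault write oauth2/servers/server-b \
    provider=oidc client_id=client-b client_secret=secret-b \
    provider_options=issuer_url=https://issuer-b.example.com
```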

If I go back to your first option of just using the plugin the way I did, I don't understand why you refer to the upgrade path using "legacy". Would I have to do something with that? I am running the new version now after having made corresponding changes in my pr #41, and it seems to be working the same way as it used to without having to change anything.

There's a migration script in the plugin that will take old plugins (v2.x) and automatically upgrade them to use the new server mechanism in v3. When it takes the plugin-level configuration and moves it to a server, it calls it "legacy". You don't have to use that name if you're not migrating.

DrDaveD commented 3 years ago

Am I correct in understanding that the creds read and both the config and creds delete do not require specifying a server? I don't see that in the README nor in the code. If that's correct, then how does it know which server to use? I have code that cleans up vault configuration when a server is removed, and currently that's done by vault secrets disable of the related module, but I don't know how to do the corresponding thing with this new code for deleting a no-longer used server.

That's correct -- the server is stored internally with the rest of the authorization data when the credential is created.

Ok, but I thought there could be multiple credentials stored at the same path for different servers. Is that not the case? Each server's credentials have to be stored in unique paths?

Vault's logical backends actually don't support recursive deletion, so the unmounting-to-delete-everything is kind of a special case that's harder to support than it looks on the tin. In fact, I think right now if you delete a server that's backing a bunch of credentials, those credentials will continue to exist and never be reaped (because they expect their server to be put back into place eventually). Are you churning through backends a lot? It's not really a use case I'd put much thought into -- most of ours (thus far) are fairly constant.

No, it's a rare circumstance, but it could happen sometimes and I think it should be supported.

At a minimum, I'll add a test case for this, but I can also add maybe some reaper options to help out further (e.g., auto-delete any creds that correspond to a server that no longer exists after X seconds have elapsed).

That sounds like a good approach. Delete the server configuration, and at some point its corresponding credentials will also go away. Meanwhile if somebody tries to use the credentials, presumably it will fail?

Oh and now I realize another thing that seems to block me from using the new feature, in that there is still only one issuer url per plugin instance. I thought I would be able to specify a different issuer URL for each "server".

You should be able to do this. E.g., vault write oauth2/servers/foo [...] provider_options=issuer_url=https://foo. Is that not working for you? This might just be a docs issue... where does this appear to be the case?

Ah, now I get it. I think I was just looking for a "server" parameter, but you're doing the equivalent by putting servers in the config path and making the server name the next path component. Ok, looks good, I'll try it.

If I go back to your first option of just using the plugin the way I did, I don't understand why you refer to the upgrade path using "legacy". Would I have to do something with that? I am running the new version now after having made corresponding changes in my pr #41, and it seems to be working the same way as it used to without having to change anything.

There's a migration script in the plugin that will take old plugins (v2.x) and automatically upgrade them to use the new server mechanism in v3. When it takes the plugin-level configuration and moves it to a server, it calls it "legacy". You don't have to use that name if you're not migrating.

Oh, a server named legacy, not a path. That makes sense. But then I don't understand why it is still working for me. I had a pre-existing configuration, I upgraded to v3.0.0-beta.3, and I didn't have to change any configuration or client code. If I understand correctly, I ought to at least have to change my client to pass in server=legacy when I create new credentials. Oh, there may have already been credentials and I was just overwriting them... are you allowed to write a new refresh_token to creds without server=legacy if there was one already there? Yes, I found one I hadn't written before and got the error "Bad Request: missing server".

Ouch, so this brings up a migration problem. It seems to me that this requires the client and server to be migrated at the same time, in lock-step. I was hoping to be able to upgrade the server first, then allow some period of time before requiring all clients to update. What do you think about having a server-side option where you can specify a default server? For old credential paths I would then specify a default of server=legacy, and only set up new credential paths to require a server name. That way old clients could continue to work for a period of time, with a server that supports both styles.

DrDaveD commented 3 years ago

Vault's logical backends actually don't support recursive deletion, so the unmounting-to-delete-everything is kind of a special case that's harder to support than it looks on the tin. In fact, I think right now if you delete a server that's backing a bunch of credentials, those credentials will continue to exist and never be reaped (because they expect their server to be put back into place eventually). Are you churning through backends a lot? It's not really a use case I'd put much thought into -- most of ours (thus far) are fairly constant.

No, it's a rare circumstance, but it could happen sometimes and I think it should be supported.

At a minimum, I'll add a test case for this, but I can also add maybe some reaper options to help out further (e.g., auto-delete any creds that correspond to a server that no longer exists after X seconds have elapsed).

That sounds like a good approach. Delete the server configuration, and at some point its corresponding credentials will also go away. Meanwhile if somebody tries to use the credentials, presumably it will fail?

Oh I just found I had another use case for this where delayed cleanup isn't sufficient. My configuration code also disabled the whole module if the provider url or client id changes. That's important because all the refresh tokens that are stored will not be valid with a different url or client id. Again, this is something that doesn't happen often, but it can happen.

impl commented 3 years ago

Ok, but I thought there could be multiple credentials stored at the same path for different servers. Is that not the case? Each server's credentials have to be stored in unique paths?

I'm not sure I understand this question. Can you clarify maybe with a code/CLI example of what you're expecting here? Servers have a one-to-many (not a many-to-many) relationship to credentials, if that's what you're asking. I.e., each server configuration (under /servers) has many credentials that "belong" to it, but each credential (under creds/ or self/) only has one server, and that server is determined when you write the credential to storage.
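In CLI terms, that one-to-many binding might look like this sketch (path and token values are illustrative): the server is fixed when the credential is written, so a read needs no server parameter:

```shell
# The credential becomes bound to server-a at write time...
vault write oauth2/creds/my-user-auth server=server-a refresh_token=abc123

# ...so a later read resolves the server from the stored authorization data
vault read oauth2/creds/my-user-auth
```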

Vault's logical backends actually don't support recursive deletion, so the unmounting-to-delete-everything is kind of a special case that's harder to support than it looks on the tin. In fact, I think right now if you delete a server that's backing a bunch of credentials, those credentials will continue to exist and never be reaped (because they expect their server to be put back into place eventually). Are you churning through backends a lot? It's not really a use case I'd put much thought into -- most of ours (thus far) are fairly constant.

No, it's a rare circumstance, but it could happen sometimes and I think it should be supported.

At a minimum, I'll add a test case for this, but I can also add maybe some reaper options to help out further (e.g., auto-delete any creds that correspond to a server that no longer exists after X seconds have elapsed).

That sounds like a good approach. Delete the server configuration, and at some point its corresponding credentials will also go away. Meanwhile if somebody tries to use the credentials, presumably it will fail?

If the authorization code hasn't expired, the credential would continue to be available for read operations until the next time it needs to be refreshed. Then at expiry + some delta, it would be automatically deleted. (That's how the reaper works for all cases right now.)

Oh, a server named legacy, not a path. That makes sense. But then I don't understand why it is still working for me. I had a pre-existing configuration, I upgraded to v3.0.0-beta.3, and I didn't have to change any configuration or client code. If I understand correctly, I ought to have to at least change my client to pass in server=legacy when I create new credentials. Oh, there may have already been credentials and I was just overwriting them .. are you allowed to write a new refresh_token to creds without server=legacy if there was one already there? Yes I found one I hadn't written before and got the error "Bad Request: missing server".

I don't think it should inherit the previous configuration when you rewrite it (you should always have to specify server), but if you're seeing otherwise, it could be a bug. I'll do a bit of poking around to see if I can find any edge cases. If you can provide a snippet of code/CLI commands to reproduce, that would be helpful.

Ouch, so this brings up a migration problem. It seems to me that this requires the client and server to have to be migrated at the same time, in lock-step. I was hoping to be able to upgrade the server first, then allow some period of time before requiring all clients to have to update. What do you think about having a server-side option where you can specify a default server? Using old credential paths then I would specify a default server=legacy, and only set up new credential paths to require a server name. That way old clients could continue to work for a period of time, with a server that supports both styles.

You can actually upgrade the clients first because options that aren't recognized are silently ignored. So if your clients send server=legacy when the plugin doesn't support that option, it won't cause a problem. Then when you upgrade it will work seamlessly. (We did this exact migration path for our API/tooling.)
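A sketch of that ordering (illustrative path and token values): clients can start sending server=legacy while the mount still runs v2, because the unrecognized option is silently ignored there:

```shell
# Step 1: clients add the option; harmless under v2, required under v3
vault write oauth2/creds/my-user-auth server=legacy refresh_token=abc123

# Step 2: upgrade the plugin to v3; the migration names the old plugin-level
# configuration "legacy", so the same client command keeps working unchanged
```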

Oh I just found I had another use case for this where delayed cleanup isn't sufficient. My configuration code also disabled the whole module if the provider url or client id changes. That's important because all the refresh tokens that are stored will not be valid with a different url or client id. Again, this is something that doesn't happen often, but it can happen.

Interesting. My use case for keeping it would be to rotate client secrets, in which case rewriting the server configuration should have no impact on existing credentials.

The invalidation problem actually goes beyond this, too, and I'm afraid it might be out of the general scope of this plugin... for example an authorization code can seemingly be valid from the plugin's perspective but revoked at the provider (by disconnecting the integration in their website or whatever). It would then be up to the client to understand this and try to either request a new auth code or stop using that credential altogether.

Would it be possible for you to scope your credentials by some well-known prefix (e.g. hash of provider URL/client ID) in the plugin so you could have your clients "know" when they need to rotate/write new credentials? I'm afraid there's no atomic way to do this cleanup so you'd always at least have a few instants where stale credentials were still valid if client ID/provider URL changes.

DrDaveD commented 3 years ago

Ok, but I thought there could be multiple credentials stored at the same path for different servers. Is that not the case? Each server's credentials have to be stored in unique paths?

I'm not sure I understand this question. Can you clarify maybe with a code/CLI example of what you're expecting here? Servers have a one-to-many (not a many-to-many) relationship to credentials, if that's what you're asking. I.e., each server configuration (under /servers) has many credentials that "belong" to it, but each credential (under creds/ or self/) only has one server, and that server is determined when you write the credential to storage.

I thought that I would be able to have, for example, a path of creds/drdaved that would return different values for server-a and server-b. I thought that you would be able to read & write the same path with different server= parameters and deal with different refresh tokens. If not, I will plan to include the server name in the path, so I'd have paths creds/server-a/drdaved and creds/server-b/drdaved, and additionally pass server=server-a or server=server-b when writing. I wouldn't need to pass them when reading. That's all cool with me; it's just not how I thought it was supposed to work when I was first reading the documentation.
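Under that plan, the writes and reads would look roughly like this (a sketch; token values are illustrative):

```shell
# Server name encoded in the path, and also passed at write time
vault write oauth2/creds/server-a/drdaved server=server-a refresh_token=token-a
vault write oauth2/creds/server-b/drdaved server=server-b refresh_token=token-b

# Reads need no server parameter; the path alone identifies the credential
vault read oauth2/creds/server-a/drdaved
```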

Oh, a server named legacy, not a path. That makes sense. But then I don't understand why it is still working for me. I had a pre-existing configuration, I upgraded to v3.0.0-beta.3, and I didn't have to change any configuration or client code. If I understand correctly, I ought to have to at least change my client to pass in server=legacy when I create new credentials. Oh, there may have already been credentials and I was just overwriting them .. are you allowed to write a new refresh_token to creds without server=legacy if there was one already there? Yes I found one I hadn't written before and got the error "Bad Request: missing server".

I don't think it should inherit the previous configuration when you rewrite it (you should always have to specify server), but if you're seeing otherwise, it could be a bug. I'll do a bit of poking around to see if I can find any edge cases. If you can provide a snippet of code/CLI commands to reproduce, that would be helpful.

Later I tried to reproduce this and couldn't anymore. So now I don't know why I didn't notice any failures at first.

Ouch, so this brings up a migration problem. It seems to me that this requires the client and server to have to be migrated at the same time, in lock-step. I was hoping to be able to upgrade the server first, then allow some period of time before requiring all clients to have to update. What do you think about having a server-side option where you can specify a default server? Using old credential paths then I would specify a default server=legacy, and only set up new credential paths to require a server name. That way old clients could continue to work for a period of time, with a server that supports both styles.

You can actually upgrade the clients first because options that aren't recognized are silently ignored. So if your clients send server=legacy when the plugin doesn't support that option, it won't cause a problem. Then when you upgrade it will work seamlessly. (We did this exact migration path for our API/tooling.)

I see. That's a lot harder migration path for me though, because I have more control over the servers and less control over the clients. There are also a lot fewer servers. I provide the client software, but it's installed by multiple other people. It might take a long time for all the clients to upgrade. It's not nearly as bad now in the current phase of development as it will be in the future since it's not in production use yet, but even now it will be quite a hassle to get all the testers to upgrade their client software before I can upgrade the servers.

Could the server parameter be optional if there's only one server defined and it is called legacy?

Oh I just found I had another use case for this where delayed cleanup isn't sufficient. My configuration code also disabled the whole module if the provider url or client id changes. That's important because all the refresh tokens that are stored will not be valid with a different url or client id. Again, this is something that doesn't happen often, but it can happen.

Interesting. My use case for keeping it would be to rotate client secrets, in which case rewriting the server configuration should have no impact on existing credentials.

That's right, if only the secret changes I don't invalidate all the tokens.

The invalidation problem actually goes beyond this, too, and I'm afraid it might be out of the general scope of this plugin... for example an authorization code can seemingly be valid from the plugin's perspective but revoked at the provider (by disconnecting the integration in their website or whatever). It would then be up to the client to understand this and try to either request a new auth code or stop using that credential altogether.

Sure. If my client doesn't succeed at reading a credential it already falls back to re-doing the authorization flow, so that's OK. Originally I put this cleanup code in because the token issuer operator complained that my vault was filling up the logs with failed attempts to frequently refresh an invalid refresh token when I redirected the config to a different server. Now however since I always set tune_refresh_check_interval_seconds=0 it should only fail once the next time it is used, and then get overridden. So maybe it's fine.

Would it be possible for you to scope your credentials by some well-known prefix (e.g. hash of provider URL/client ID) in the plugin so you could have your clients "know" when they need to rotate/write new credentials? I'm afraid there's no atomic way to do this cleanup so you'd always at least have a few instants where stale credentials were still valid if client ID/provider URL changes.

Possibly, but only if that hash were provided by the plugin to the client as part of the api since the client does not know those values. The client would then need to store that hash as well for the next invocation, it would be quite a hassle.

impl commented 3 years ago

Ok, but I thought there could be multiple credentials stored at the same path for different servers. Is that not the case? Each server's credentials have to be stored in unique paths?

I'm not sure I understand this question. Can you clarify maybe with a code/CLI example of what you're expecting here?

I thought that I would be able to have, for example, a path of creds/drdaved that would return different values for server-a and server-b. I thought that you would be able to read & write the same path with different server= parameters and deal with different refresh tokens. If not, I will plan to include the server name in the path, so I'd have paths creds/server-a/drdaved and creds/server-b/drdaved, and additionally pass server=server-a or server=server-b when writing. I wouldn't need to pass them when reading. That's all cool with me; it's just not how I thought it was supposed to work when I was first reading the documentation.

Understood. I'll see if I can clarify that in the docs.

You can actually upgrade the clients first because options that aren't recognized are silently ignored. So if your clients send server=legacy when the plugin doesn't support that option, it won't cause a problem. Then when you upgrade it will work seamlessly. (We did this exact migration path for our API/tooling.)

I see. That's a lot harder migration path for me though, because I have more control over the servers and less control over the clients. There are also a lot fewer servers. I provide the client software, but it's installed by multiple other people. It might take a long time for all the clients to upgrade. It's not nearly as bad now in the current phase of development as it will be in the future since it's not in production use yet, but even now it will be quite a hassle to get all the testers to upgrade their client software before I can upgrade the servers.

Could the server parameter be optional if there's only one server defined and it is called legacy?

What if we added a default_server to the config?

Oh I just found I had another use case for this where delayed cleanup isn't sufficient. My configuration code also disabled the whole module if the provider url or client id changes. That's important because all the refresh tokens that are stored will not be valid with a different url or client id. Again, this is something that doesn't happen often, but it can happen.

Interesting. My use case for keeping it would be to rotate client secrets, in which case rewriting the server configuration should have no impact on existing credentials.

That's right, if only the secret changes I don't invalidate all the tokens

The invalidation problem actually goes beyond this, too, and I'm afraid it might be out of the general scope of this plugin... for example an authorization code can seemingly be valid from the plugin's perspective but revoked at the provider (by disconnecting the integration in their website or whatever). It would then be up to the client to understand this and try to either request a new auth code or stop using that credential altogether.

Sure. If my client doesn't succeed at reading a credential it already falls back to re-doing the authorization flow, so that's OK. Originally I put this cleanup code in because the token issuer operator complained that my vault was filling up the logs with failed attempts to frequently refresh an invalid refresh token when I redirected the config to a different server. Now however since I always set tune_refresh_check_interval_seconds=0 it should only fail once the next time it is used, and then get overridden. So maybe it's fine.

Would it be possible for you to scope your credentials by some well-known prefix (e.g. hash of provider URL/client ID) in the plugin so you could have your clients "know" when they need to rotate/write new credentials? I'm afraid there's no atomic way to do this cleanup so you'd always at least have a few instants where stale credentials were still valid if client ID/provider URL changes.

Possibly, but only if that hash were provided by the plugin to the client as part of the api since the client does not know those values. The client would then need to store that hash as well for the next invocation, it would be quite a hassle.

I'll try to think if there's an elegant solution to this, but no promises. :-) I see why it's a pain point though.

For now, I've opened these issues to address what we've discussed so far:

* [Ensure reaper can delete credentials that no longer have a backing server #61](https://github.com/puppetlabs/vault-plugin-secrets-oauthapp/issues/61)

* [Set a default server to make v2 to v3 upgrades easier #62](https://github.com/puppetlabs/vault-plugin-secrets-oauthapp/issues/62)

DrDaveD commented 3 years ago

You can actually upgrade the clients first because options that aren't recognized are silently ignored. So if your clients send server=legacy when the plugin doesn't support that option, it won't cause a problem. Then when you upgrade it will work seamlessly. (We did this exact migration path for our API/tooling.)

I see. That's a lot harder migration path for me though, because I have more control over the servers and less control over the clients. There are also a lot fewer servers. I provide the client software, but it's installed by multiple other people. It might take a long time for all the clients to upgrade. It's not nearly as bad now in the current phase of development as it will be in the future since it's not in production use yet, but even now it will be quite a hassle to get all the testers to upgrade their client software before I can upgrade the servers. Could the server parameter be optional if there's only one server defined and it is called legacy?

What if we added a default_server to the config?

Yes, that would be great. Perhaps it is easiest to add it under the "servers" config, where one of them can be indicated as the default, something like is_default=true.

If you gave me a hint on how to implement it, I could do that, test it, and submit it as a pull request.
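As a sketch of the proposed interface (is_default is hypothetical and does not exist in the plugin; client credentials and token values are illustrative):

```shell
# Hypothetical: mark one server as the fallback for requests with no server=
vault write oauth2/servers/legacy \
    provider=oidc client_id=my-client client_secret=my-secret \
    is_default=true

# An old client's write, with no server parameter, would then resolve
# to the "legacy" server instead of failing with "missing server"
vault write oauth2/creds/my-user-auth refresh_token=abc123
```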

...

I'll try to think if there's an elegant solution to this, but no promises. :-) I see why it's a pain point though.

For now, I've opened these issues to address what we've discussed so far:

* [Ensure reaper can delete credentials that no longer have a backing server #61](https://github.com/puppetlabs/vault-plugin-secrets-oauthapp/issues/61)

* [Set a default server to make v2 to v3 upgrades easier #62](https://github.com/puppetlabs/vault-plugin-secrets-oauthapp/issues/62)

Thanks. I will follow up there.