Closed by spjmurray 5 months ago
Of note, for once CAPO just worked; I'm guessing it must read the Secret on each reconcile.
Note, the PoC code above is a little naive in that it creates a new client for each request. We could be cleverer by doing a reflect.DeepEqual on the config (which, as we all know, is flaky as hell given Go's nil vs empty semantics), or we could marshal both and do a string compare or something. I really don't see much of a performance benefit from this over-engineering though 🤷🏻 Prove me wrong!
change a password in OpenStack, any associated application credentials stop working
Are you sure about this? Can you find any evidence that OpenStack invalidates application credentials when the owner's password changes?
No I'm not. After testing, it seems my boss lied to/misled me 😞 Both CSI and CAPO autoscaling appear to work.
That said, we have had a number of OpenStack upgrades over the past few months, perhaps it was a bug that got fixed between Xena and Zed 🤷🏻
Irrespective, it would still be nice to have short-lived application credentials so that any exposure of the secret is time-limited. Obviously the situation is far more serious if you are using a username/password, but that's your own fault!
Or perhaps, with hindsight, he was using username/password at the time...
Incidentally, CAPO is different in this regard: we can't cache these credentials as they can be different for each cluster. CPO is only running against a single OpenStack, so it's a reasonable and sensible thing to do here. If it weren't too complicated we would probably cache them in CAPO too, tbh.
So one prime example I'm thinking of for why we shouldn't cache, or should at least poll for changes, is:
It's not unreasonable to have those changes picked up, rather than me -- as a service provider -- having to restart a bunch of services manually or with a boatload of custom logic, right?
Anecdotally @mdbooth, caching would allow you to delete things in any order 😸
Actually, now I've thought about it, CAPO will use caching, in the sense that that's how controller-runtime works: it just keeps things up to date with a watch under the covers.
I did consider using a similar approach here, where you reinitialize the client with an informer on secret change. However... I see that -- undoubtedly for security purposes -- the secret can also come from a hostPath mount, which pretty much kills off that idea.
But not all is lost; as I said, the PoC code is very naive. Perhaps a goroutine to poll the files on disk: when it sees a change it can atomically update the client. Best of both worlds?
How do you deploy CPO? IMO using a secret hash to trigger a new helm deployment should be enough. If you're using hostPath, where do the secrets in the hostPath come from?
I see where you are going with this, and it's not that easy...
So we're creating the cloud config with https://github.com/kubernetes/cloud-provider-openstack/blob/master/charts/openstack-cloud-controller-manager/templates/_helpers.tpl#L51, then we reuse the secret defined there with the CSI plugin, so we never actually have a full cloud config available within the CSI module to hash.
I guess, if push comes to shove, I could recreate this generation logic in our provisioner and just pass the raw data to both the OCM and OCP; that would make it available where it's actually needed.
I do worry though, philosophically, whether using the annotation hack in 1000s of different projects around the world, when you could just poll for changes and act accordingly, is the right thing to do. Opinionated, but 🤷🏻
I don't think that detecting changes in a file mounted in a pod is reliable, in general. The documentation says it will eventually be updated, but we'd probably have to poll it. I don't think we can rely on something like fsnotify because the mechanism used by the CRI to expose it to the pod is an opaque implementation detail, and may work, work differently, or not work at all depending on unknown factors.
If we were going to do this in the library, I believe a correct implementation could be:
We should still recommend using orchestration to restart the controller, but this could be a backup.
The current gophercloud reauth implementation tries to reauth indefinitely. If we implement a provider.RetryFunc and catch 401s, we can perform an action, e.g.:
- interrupt the process: k8s will restart the pod with new credentials
- dynamically re-read the credentials and reauth the client
The problem with this is the 'user surprise' I mentioned in the PR. As a user, if I rotate credentials I expect to see any problem with them immediately. It would be surprising, and make rollback harder, if the user didn't discover that the new credentials were invalid until the old credentials were removed.
The problem with this is the 'user surprise' I mentioned in the PR.
In order to eliminate the "surprise" factor, the deployment must be updated with a new credentials hash.
Create a goroutine to poll the file periodically and store a hash
It makes more sense to create a k8s client and watch for secret updates.
That's also what we do in OpenShift, and I think we'd be unlikely to use this feature if it merged. However, I also appreciate that just disabling the cache is simpler and may suit some use cases better.
I've updated https://github.com/kubernetes/cloud-provider-openstack/pull/2285 with another stab at PoC code. See what you think, state your mind, I know you will!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Is this a BUG REPORT or FEATURE REQUEST?:
What happened:
When you change a password in OpenStack, any associated application credentials stop working. So sometimes it's necessary (and, to be fair, good practice) to rotate the cloud config. Sadly the CSI plugin (and, I'll hazard a guess, the rest of them) caches the old value rather than going to the filesystem each time to check for an update, so it stops working.
What you expected to happen:
Everything should expect the cloud config to change, pick up new versions and keep working.
How to reproduce it:
Replace the cloud config with a new one, deleting the old application credential. Provision a PVC and watch it complain that the application credential is gone. Kick the pod and it will suddenly start working again.
Anything else we need to know?:
I guess we could trick things into a reload by modifying the daemonset in some benign way, but it's sub-optimal and requires every client to remember to do this.
Also I'm up for doing the work, if the discussion goes well; it seems easy enough on the surface. Famous last words!
Environment: