hey @hwo-wd - I don't need the secret itself, but can you provide a YAML representation (names and relevant metadata) of the situation you are encountering here? A bit more context would help get us up to speed without having to replicate it.
Also, if you are a paying Rancher subscriber, please open a support case and give your Support Engineer a reference to this issue. They can create an internal issue to mirror it, which can help expedite the investigation and resolution of issues like this, and they can securely collect more specific details and pass them along to us.
Thanks for coming back to me, Dan.
Sorry for being unclear; let me elaborate on my situation a bit more:
I have a setup using Flux CD to create a new cluster from scratch: basically I'm creating two custom resources, (1) provisioning.cattle.io/Cluster and (2) rke-machine-config.cattle.io/VmwarevsphereConfig, which makes Rancher kick in and provision the cluster. The thing is, in order to support multiple clusters, I'm provisioning each cluster in its own namespace, allowing for easy separation without having to deal with naming conflicts etc. (since ServiceAccounts etc. come into play, too).
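To make this more concrete, here is a trimmed-down sketch of the kind of manifests Flux applies for me; the names (demo-cluster, spongebob), the version, and most spec fields are illustrative placeholders rather than my real values:

```yaml
# Illustrative only - names, namespace and spec values are placeholders.
apiVersion: rke-machine-config.cattle.io/v1
kind: VmwarevsphereConfig
metadata:
  name: demo-cluster-control-plane   # hypothetical name
  namespace: spongebob               # one namespace per cluster, not fleet-default
---
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: demo-cluster                 # hypothetical name
  namespace: spongebob               # same per-cluster namespace
spec:
  kubernetesVersion: v1.28.9+rke2r1  # placeholder version
  rkeConfig:
    machinePools:
      - name: control-plane
        controlPlaneRole: true
        etcdRole: true
        workerRole: true
        quantity: 1
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: demo-cluster-control-plane
```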
From what I've seen it's not possible to have a separate namespace per cluster via the UI: there is no namespace selector and each cluster gets implicitly provisioned in fleet-default.
Anyway, everything works nicely until it comes to the backup scenario: the essential secrets shown in the following screenshots are NOT backed up by the default ResourceSet, since they don't reside in fleet-default but, e.g., in the namespace spongebob instead; the fix is easy enough (#575) and makes the restore procedure work like a charm.
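For reference, selectors in a resources.cattle.io/v1 ResourceSet look roughly like the snippet below; this is only meant to illustrate the kind of selector that limits secret collection to fleet-default, not a verbatim copy of the shipped rancher-resource-set, whose exact entries vary between chart versions:

```yaml
# Illustration of the selector shape only - not the literal contents
# of the shipped Rancher Backups chart, which differ between versions.
apiVersion: resources.cattle.io/v1
kind: ResourceSet
metadata:
  name: rancher-resource-set   # default name on a Rancher Backups install
resourceSelectors:
  - apiVersion: v1
    kindsRegexp: "^secrets$"
    namespaces:
      - fleet-default          # secrets in e.g. spongebob are not matched
```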
@hwo-wd - I think that for the time being you should feel comfortable making modifications to your Rancher Backups ResourceSet to work around this difficulty. Editing it is acceptable both for workarounds of this nature and when users want to back up resources not created directly by Rancher, and given your use case with flux this falls a little bit under both.
Our team will triage the issue further and will likely investigate it as part of our ongoing effort to audit and improve fleet-related integrations. While the fix you propose seems easy enough, for other users it could have unintended effects.
This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions.
Rancher Server Setup
104.0.1+up5.0.1
Describe the bug
Creating a provisioning.v2 cluster (e.g., via GitOps) in a namespace different than fleet-default, creating a backup, pruning any Rancher resources, and then restoring leads to said cluster being in an irrecoverable (?) state.

To Reproduce
Steps to reproduce the behavior:
1. Create a provisioning.v2 cluster in a namespace other than the fleet-default namespace and let it be provisioned using CAPI
2. Create a backup (a minimal example is sketched after this list)
3. Run kubectl apply -f https://raw.githubusercontent.com/rancher/rancher-cleanup/main/deploy/rancher-cleanup.yaml; note that this will delete the machine-plan secret, even though it resides in a non-Rancher-default namespace
4. Restore the backup from 2. above
5. The cluster ends up in the irrecoverable state described above, due to the machine-plan secret not residing in the fleet-default namespace.
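For step 2, a minimal Backup custom resource pointing at the default ResourceSet is enough; this is just a sketch, the name is a placeholder and storage settings are omitted:

```yaml
# Minimal sketch of a Backup CR for step 2 - the metadata name is hypothetical,
# resourceSetName is the chart default; adjust both to your installation.
apiVersion: resources.cattle.io/v1
kind: Backup
metadata:
  name: pre-cleanup-backup
spec:
  resourceSetName: rancher-resource-set
```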
A possible fix, which is tough to maintain over time until #487 becomes a thing, is to broaden the backup of the machine-plan secrets by creating a new ResourceSet and enhancing the existing one with the following:
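Roughly, the extra selector I have in mind looks like this (a sketch only; the exact regex values, and whether it is appended to the existing ResourceSet or kept in a dedicated one, are up for discussion):

```yaml
# Sketch of the additional selector - regex values are illustrative,
# not a verified patch against the shipped rancher-resource-set.
resourceSelectors:
  - apiVersion: v1
    kindsRegexp: "^secrets$"
    resourceNameRegexp: "machine-plan$"   # only the machine-plan secrets...
    namespaceRegexp: "^.*"                # ...but in every namespace, not just fleet-default
```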
This way, the important machine-plan secret is part of the backup and gets restored, and the downstream cluster's system agent can connect just fine.

Expected behavior
machine-plan secrets are essential and should be backed up independently of the namespace they reside in.

Note: I'd be happy to contribute a PR, I just don't know whether the namespaceRegexp: "^.*" might be too generic for your taste, albeit the resource selectors are still quite specific.