Closed: ruanxin closed this issue 3 months ago.
Some feedback:

- We need some identifier of SKR. Is it the `operator.kyma-project.io/kyma-name` label?
- What is `spec.channel` for? That resource should be independent of the channel.
> We need some identifier of SKR. Is it the `operator.kyma-project.io/kyma-name` label?
It depends on the purpose of the identifier; if you want to indicate something like the SKR shoot name, then it's not. This label is the Kyma CR name, which is mapped one-to-one with the SKR cluster; it's also used by LM to identify the SKR cluster kubeconfig secret.
In the current Kyma CR, we have the following labels to indicate the SKR cluster, but they are not created or used by LM. If you need similar labels for other services, we can also add them:
```yaml
labels:
  kyma-project.io/runtime-id: 6ca8a6ea-37b3-47f4-a2c8-fe1b6013db23
  kyma-project.io/shoot-name: f1b6307
  kyma-project.io/subaccount-id: 19ea2000-865d-42fd-8d6f-3e3e2dcce3c8
```
> What is `spec.channel` for? That resource should be independent of the channel.
The purpose of this channel is just to stay consistent with the ModuleTemplate. For instance, do we have a use case where some module introduces this kind of resource in a new version, and this version is first released in the fast channel, then released to the regular channel? In that case, should this resource have an identifier for the channel, or is it completely independent?
@ruanxin How would the submission process of manifest content look like in the case of mandatory modules that need to be synced across all remote runtimes?
@ruanxin If the resource manifests are removed from the content of Sync Object, would lifecycle manager remove the resources on the remote clusters?
@janmedrek @ruanxin Let's clarify whether the mandatory module case would fit into the implementation of the KCP Sync Object.
> @ruanxin How would the submission process of manifest content look like in the case of mandatory modules that need to be synced across all remote runtimes?
Currently, in my mind, this idea is more suitable for serving dynamic content based on each individual SKR cluster, for example, resources containing SKR-specific data. So it requires a component in KCP which can generate this resource on the fly; LM just takes responsibility for syncing it. But in your case, if all resources are identical, it's more relevant to put them in the ModuleTemplate. If you just submit it once, there is no such component to recreate it when the module is disabled and enabled again.
> @ruanxin If the resource manifests are removed from the content of the Sync Object, would lifecycle manager remove the resources on the remote clusters?
Yes, LM takes care of the synchronization of this object.
Details about: "KEB installs secret for BTP service operator":

1. KEB receives credentials from ERS during Kyma provisioning or update.
2. KEB applies a secret with credentials from ERS + a clusterID generated by KEB. Example
3. Some KEB k8s job manages that secret: once a day it checks if all SKRs have the secret, and if not, applies it again.
Hi @PK85,
> KEB applies a secret with credentials from ERS + a clusterID generated by KEB. Example
What was the contract data from KEB to ERS to fetch this credential? Is it something you can get from the current Kyma CR? We have some labels defined in the Kyma CR containing some data, e.g.:
```yaml
labels:
  kyma-project.io/broker-plan-id: xxx
  kyma-project.io/broker-plan-name: azure
  kyma-project.io/global-account-id: xxx
  kyma-project.io/instance-id: xxx
  kyma-project.io/platform-region: cf-eu20
  kyma-project.io/region: northeurope
  kyma-project.io/runtime-id: xxx
  kyma-project.io/shoot-name: xxx
  kyma-project.io/subaccount-id: ""
```
Do they contain the required data?
KEB is not fetching credentials; KEB receives credentials when ERS calls KEB to spin up a Kyma cluster. Nope, those labels are not related to those credentials.
Updated additional comment: KEB has those credentials in the DB. With your syncer, I understand it will require creating them as secrets in KCP itself, which is somewhat not needed from the KEB point of view.
@ruanxin, Based on our understanding, the Kyma CR implementation will also be enhanced to include the contents of the Module CR spec in the status. We had the following questions related to the `SyncResource` CR and Kyma CR changes:

> How will LM learn which Module CR(s) are to be monitored and included in Kyma CR.status?
This is based on the Manifest CR, which is parsed from the ModuleTemplate; each module provides the Module CR in the `spec.data` field. You can check keda-manager's ModuleTemplate as an example.
> Will the Module CR spec in the KymaCR.status be a single object containing the last modified CR or a list containing all the changes?
The current LM only supports deploying a single Module CR for each module, and we don't plan to introduce multiple Module CRs unless there is a strong reason for it.

> 2a. If it is a single object, is there a possibility of losing concurrent updates to multiple watched CRs in SKR?
> 2b. If it is a list, how and when will the objects get removed from the list?
> 2c. If they don't get removed from the list, should the KCP modules maintain some kind of state to track what has changed from the previous reconcile?
> Can the SyncResource be used to update the contents of the Module CR in SKR?
It depends on your use case; LM only takes care of deploying the resources contained in the SyncResource.
> Assuming the SyncResource example above is already deployed, are the changes to `resource.spec` tracked only by the changes to the keys (resource file name) or by the changes to the contents as well? Answers to the below questions may help:
> 4a. If the KCP Module Manager updates the SyncResource CR and removes the `secret.yaml` from it, will the secret `some-credential` be deleted from SKR?
> 4b. If the KCP Module Manager updates the SyncResource CR and changes the secret name to `some-credential-1`, will the secret `some-credential` be deleted and `some-credential-1` be created?
Good question. The change should be tracked in the SKR, and the answer to both questions is yes, but it means we need to introduce some fields in SyncResource to track old resources. We actually have a solution in the Manifest CR to track synced resources and will introduce this concept to SyncResource as well; I extended the `status.synced` concept in the description.
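The orphan detection this implies can be sketched as follows. This is a minimal, hypothetical Python example (the real Lifecycle Manager is written in Go, and the dictionary shape of `status.synced` entries here is an assumption): compare the previously synced resource keys against the current spec items to find what must be cleaned up.

```python
def find_orphans(status_synced, spec_items):
    """Return entries recorded in status.synced that are no longer
    present in spec, i.e. candidates for deletion in the SKR."""
    def key(res):
        # Identify a resource by API version, kind, namespace and name.
        return (res.get("apiVersion"), res.get("kind"),
                res.get("namespace", ""), res.get("name"))

    desired = {key(item) for item in spec_items}
    return [res for res in status_synced if key(res) not in desired]
```

Anything returned by `find_orphans` would then be deleted from the SKR before `status.synced` is rewritten to match the current spec.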
In your example you outline that transporting a secret is a common use case. The kind Secret is special in Kubernetes with regard to how the data gets stored in the API server. Embedding a secret inside a CR will not fulfill the requirements that come with secrets, as the API server will no longer treat the data specially. So it must be checked whether the solution is acceptable for secret data at all, or what kind of limitations that will imply. Maybe the secret data can be stored as a real secret in KCP and just referenced by the SyncResource? Is it really needed to embed the data?
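One hedged way such a reference could look, purely as an illustration; neither `secretRef` nor its fields are part of the proposed API:

```yaml
# Hypothetical sketch: reference a Secret stored in KCP instead of
# embedding its data in the SyncResource spec as plain text.
resources:
  - strategy: CreateAndSync
    name: btp-credentials
    secretRef:                      # assumption, not in the current proposal
      namespace: kcp-system
      name: btp-operator-credentials
```

The Lifecycle Manager would then read the referenced Secret at sync time, so the sensitive payload stays in an object the API server already treats as secret data.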
As discussed today the sample scenario can look like this.
```yaml
apiVersion: operator.kyma-project.io/v1alpha1
kind: Infrastructure
metadata:
  name: default
  namespace: kyma-system
spec:
  vpcPeerings:
    - name: peering-1
      description: peering-1
      remoteVpcId: vpc-1
      remoteRegion: eu-central-1
      remoteCidrRange:
    - name: peering-2
      description: peering-2
      remoteVpcId: vpc-2
      remoteRegion: eu-central-1
      remoteCidrRange:
  nfsVolumes:
    - name: nfs-1
      description: nfs-1
      size: 100Gi
    - name: nfs-2
      description: nfs-2
      size: 100Gi
```
The Kyma Lifecycle Manager will include the Infrastructure CR in the Kyma CR status in the Kyma Control Plane.
Infrastructure Manager controllers deployed in the KCP are watching for the Kyma CR status changes and create the VPC peering and NFS volumes custom resources in the Kyma Control Plane. These resources will be reconciled and will create the corresponding resources in the cloud provider account. As a result, SyncResource CRs will be created to create the corresponding resources in the SKR cluster (storage class, persistent volume etc).
```yaml
apiVersion: operator.kyma-project.io/v1alpha1
kind: SyncResource
metadata:
  namespace: nfs-manager
  name: kyma-1233445345345
spec:
  kyma: kyma-1233445345345
  resource:
    - strategy: CreateAndSync
      name: nfs-1-pv
      resource: |
        apiVersion: v1
        kind: PersistentVolume
        metadata:
          annotations:
            pv.kubernetes.io/provisioned-by: nfs.csi.k8s.io
          name: pv-nfs
        spec:
          capacity:
            storage: 10Gi
          accessModes:
            - ReadWriteMany
          persistentVolumeReclaimPolicy: Retain
          storageClassName: nfs-csi
          mountOptions:
            - nfsvers=3
          csi:
            driver: nfs.csi.k8s.io
            readOnly: false
            # volumeHandle format: {nfs-server-address}#{sub-dir-name}#{share-name}
            # make sure this value is unique for every share in the cluster
            volumeHandle: 10.17.65.170#ssd#pvc1##
            volumeAttributes:
              server: 10.17.65.170
              share: /ssd
    - strategy: CreateAndSync
      name: nfs-1-pvc
      resource: |
        kind: PersistentVolumeClaim
        apiVersion: v1
        metadata:
          name: pvc-nfs-static
        spec:
          accessModes:
            - ReadWriteMany
          resources:
            requests:
              storage: 10Gi
          volumeName: pv-nfs
          storageClassName: nfs-csi
    - strategy: CreateAndSync
      name: nfs-1-sc
      resource: |
        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: nfs-csi
        provisioner: nfs.csi.k8s.io
        parameters:
          server: 10.17.65.170
          share: /ssd
        reclaimPolicy: Delete
        volumeBindingMode: Immediate
        mountOptions:
          - nfsvers=3
```
When the Infrastructure CR is deleted, the corresponding VPC peering and NFS volume custom resources will be deleted. The SyncResource CRs will be updated, and the corresponding resources in the SKR cluster will be deleted:
```yaml
apiVersion: operator.kyma-project.io/v1alpha1
kind: SyncResource
metadata:
  namespace: nfs-manager
  name: kyma-1233445345345
spec:
  kyma: kyma-1233445345345
  resource:
    - strategy: Delete
      name: nfs-1-pv
      resource: |
        apiVersion: v1
        kind: PersistentVolume
        metadata:
          name: pv-nfs
    - strategy: Delete
      name: nfs-1-pvc
      resource: |
        kind: PersistentVolumeClaim
        apiVersion: v1
        metadata:
          name: pvc-nfs-static
    - strategy: Delete
      name: nfs-1-sc
      resource: |
        apiVersion: storage.k8s.io/v1
        kind: StorageClass
        metadata:
          name: nfs-csi
```
@ruanxin Some additional aspects that came up in discussions with the interested teams:

There is a case where a module does not have an active component in the SKR. The manager is running in KCP and wants to set the status of the module CR. The status is a subresource in Kubernetes, so it can be updated independently from the spec; I think it would be nice to provide a syntax to update the status subresource.
Hi @pbochynski ,
> There is a case where a module does not have an active component in the SKR. The manager is running in KCP and wants to set the status of the module CR. The status is a subresource in Kubernetes, so it can be updated independently from the spec; I think it would be nice to provide a syntax to update the status subresource.
Is the usage just for updating the Module CR status, or does it need to support subresources of all resources? The thing is, `status` is just a common name for a subresource; Kubernetes allows defining a subresource with any name.
But the SyncResource definition can be extended as follows:
```yaml
resources:
  - strategy: CreateAndSync
    name: module-cr
    resource: |
      apiVersion: operator.kyma-project.io/v1alpha1
      kind: SomeModule
      metadata:
        name: module-cr-name
        namespace: kyma-system
      spec:
        ...
    subresource: |
      apiVersion: operator.kyma-project.io/v1alpha1
      kind: SomeModule
      metadata:
        name: module-cr-name
        namespace: kyma-system
      status:
        ...
        state: Ready
```
One thing I want to address here: if we allow updating subresources, the module teams really need to take care of the frequency of updating them. They should avoid updating the SyncResource if the status has not changed; otherwise, it will significantly increase the amount of traffic for syncing module status.
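The debouncing this asks for can be sketched as follows. This is an illustrative Python stand-in (the module managers are Go controllers, and `ModuleStatusSyncer` is a hypothetical name): cache the last pushed status and skip no-op patches entirely.

```python
class ModuleStatusSyncer:
    """Push a module's status to KCP only when it actually changed,
    to avoid flooding the SyncResource with no-op updates."""

    def __init__(self):
        self._last_pushed = None

    def push(self, status, patch_fn):
        # Skip the patch entirely if nothing changed since the last push.
        if status == self._last_pushed:
            return False
        patch_fn(status)
        self._last_pushed = status
        return True
```

In a real controller the same effect is usually achieved with an update predicate that compares old and new status before enqueueing a reconcile.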
Hi @pbochynski @ruanxin, I have written an elaborate protocol doc on the Cloud Resources use case: https://github.tools.sap/I517584/kyma-docs/tree/main/concepts/cloud-resources-manager/protocol
In the summary, the KCP Sync Resources:

- Items are created and kept in sync with the `CreateAndSync` strategy.
- To remove an item, its strategy is changed from `CreateAndSync` to `Delete`.
- The `Delete` strategy should call the SKR API to delete the resource and watch until it's gone - deleted for real - and remove the orphaned/deleted SyncResource item.
- An `UpdateAndDelete` sync strategy that keeps content in sync, but once the destination resource is marked for deletion the sync is stopped, and once the destination resource is really gone and deleted, the orphaned SyncResource item is removed.

Eventually, the Sync feature seems a bit stretched to cover the Cloud Resources use cases. Considering the effort and delivery time, maybe we should relax the requirements, simplify the protocol, and search for an alternative approach to cover the Cloud Resources use cases. If it stands as a possibility, I could elaborate on it more.
Hi @tmilos77,
Thanks for the feedback. Regarding this `UpdateAndDelete` strategy, I can add it to the definition, but I need to clarify this definition a bit:

- `Create` strategies, otherwise it will bring conflicts.
- If `Update` is responsible for this strategy, does LM really need to do the `Delete`? What does the `AndDelete` part mean here?

And I also want to make it clear that, by the current design, LM treats `SyncResource.spec` as read-only; it will not update the `spec`. Only the ModuleManager takes care of removing/updating entries in it, and the indication to remove an entry (or the complete SyncResource) can be obtained from the `status` field.
Since we now have the acceptance criteria to not have sensitive data inside the `SyncResource.spec.resource` resources, I would like to bring up the idea of having a configurable list of non-allowed resources for the corresponding reconcile loop in the lifecycle-manager.

How this could look:

1. A flag such as `--restrictSync "v1/Secrets"`
2. Iterate over the `SyncResource.spec.resource` list and check the `apiVersion` + `kind`
3. If `apiVersion` + `kind` match one of the restricted GVKs, the resource does not get synced to the SKR cluster and a `Warning` is set in the `SyncResource` CR

Thus, we can actively restrict specific resources from being synced and exposing/storing sensitive data. We could also just document which resources are not allowed to be synced, but the past shows we should be more strict about such features.
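A rough sketch of the proposed check, assuming a hypothetical `--restrictSync` flag whose entries are `apiVersion/kind` strings (the flag name and matching rules are not decided; the Python is only illustrative of the logic):

```python
def parse_restrictions(entries):
    """Parse hypothetical --restrictSync values such as 'v1/Secret'
    into (apiVersion, kind) pairs."""
    restricted = set()
    for entry in entries:
        # rpartition handles grouped API versions like 'apps/v1/Deployment'.
        api_version, _, kind = entry.rpartition("/")
        restricted.add((api_version, kind))
    return restricted

def is_sync_allowed(manifest, restricted):
    """Return False when the manifest's apiVersion + kind match a restricted
    GVK; the caller would then skip the item and set a Warning on the CR."""
    return (manifest.get("apiVersion"), manifest.get("kind")) not in restricted
```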
@ruanxin I wasn't concentrated on the SyncResource as much as I was on the CloudResources flow. My point was to seek a way to simplify the flow and not have to watch yet another resource, `SyncResources`, to clean up orphaned (deleted) items. My idea was that preferably it would function like helm - removing an item would delete it from the SKR, w/out a need to first mark the item for deletion, then check if it was deleted, and eventually remove the item from the chart/SyncResource. I understand that requires an additional state that will be compared with the SyncResource's state. I have thought a bit more afterwards on how it could be achieved in the LM and tried using the helm lib. That would solve the removal, but it has other drawbacks, like updating an existing resource not created by helm. So, to keep it simple, I agree that this SyncResource feature should not automatically change the spec, treating it as read-only as you said, but interested parties will behave responsibly and watch its status and remove orphaned items on their own, hopefully w/out bugs, removing the noise on LM and improving its performance. Still, some monitoring possibility would be great to detect orphaned items. I might seem to be pressing on the orphaned items aspect, but I predict that dynamic nature in the CloudResources implementation.
On the other hand, after these additional considerations, I'm unable to understand the need to distinguish create and update operations. If I'm wrong please correct me, but wouldn't server-side apply create a new resource if it doesn't exist, and just patch the existing object? If that stands, wouldn't just a simple `Sync` strategy be sufficient?
Following the automatic removal of deleted items considerations, the `CreateAndIgnore` strategy also seems like an overspill, and its insurance/implementation can be transferred to the clients watching the SyncResource status and just removing that item once it's been created, the same way the `Delete` responsibility would be shared.
Having all that, I would say two strategies are enough: Sync and Delete.
In the baseline, just regarding the CloudResources functionality, following the mentioned option 2 with a helper `Binding` resource that will be synced, the only important thing in the SyncResource functionality is to be able to change the strategy of an item from `CreateAndSync` (or, as now proposed, just `Sync`) to `Delete`, with additional reconciliation on its status and eventual removal from the `SyncResource`.
@jeremyharisch I have no direct interest in it atm regarding the CloudResources, but since it's already included in the SyncResource proposal discussion... I missed the decision reasons and explanation on sensitive data; is there some record of it? I'm not particularly convinced that preventing certain resource kinds would solve the issue. One can always bypass such constraints and transfer the sensitive data through other kinds, i.e. instead of in a Secret, in a ConfigMap, or even as a Deployment env var.
On the other hand, I'm not sure what encryption at rest has to do with this SyncResource feature that it doesn't have with, for example, the `v1/Secret` resources we already have stored in KCP, or a DB password clients have in SKR. It's exposed as readable to k8s principals having appropriate roles, and it's up to the underlying infrastructure to ensure encryption at rest on the physical level to prevent its leakage through other channels. Isn't the sensitive data situation the same with the `SyncResource` kind in KCP as it is with `v1/Secret`? That doesn't prevent us from using Secrets.
> @ruanxin I wasn't concentrated on the SyncResource as much as I was on the CloudResources flow. My point was to seek a way to simplify the flow and not have to watch yet another resource, `SyncResources`, to clean up orphaned (deleted) items. My idea was that preferably it would function like helm - removing an item would delete it from the SKR, w/out a need to first mark the item for deletion, then check if it was deleted, and eventually remove the item from the chart/SyncResource. I understand that requires an additional state that will be compared with the SyncResource's state. I have thought a bit more afterwards on how it could be achieved in the LM and tried using the helm lib. That would solve the removal, but it has other drawbacks, like updating an existing resource not created by helm. So, to keep it simple, I agree that this SyncResource feature should not automatically change the spec, treating it as read-only as you said, but interested parties will behave responsibly and watch its status and remove orphaned items on their own, hopefully w/out bugs, removing the noise on LM and improving its performance. Still, some monitoring possibility would be great to detect orphaned items. I might seem to be pressing on the orphaned items aspect, but I predict that dynamic nature in the CloudResources implementation.
Hi @tmilos77, yes, I agree that having synced resources automatically handled by LM would be easier for the module manager. Actually, I introduced a synced list in the status field in my first draft to track the diff between resources, but this concept was challenged by @pbochynski: it introduces complexity to LM and may also bring inconsistency if the status gets messed up. That's why we brought in this `Delete` strategy. But you don't have to implement a very consistent solution; basically, the purpose of this strategy is just to let LM delete some outdated resources. LM will not report an error if a resource configured with the `Delete` strategy is not found in the cluster.
> On the other hand, after these additional considerations, I'm unable to understand the need to distinguish create and update operations. If I'm wrong please correct me, but wouldn't server-side apply create a new resource if it doesn't exist, and just patch the existing object? If that stands, wouldn't just a simple `Sync` strategy be sufficient?
That's the thing I need to verify with you based on your use case. As I said in the previous comment, do you use this strategy only to update existing resources or not? If you have such a use case, then introducing this `Update` makes sense, and we can distinguish it from `Create`. As you said, server-side apply will create a new resource if it doesn't exist, so if some resource should not be recreated after it's removed, then it makes sense to add this new `UpdateAndDelete` strategy.
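The difference between the strategies can be made concrete with a sketch. This is a toy in-memory stand-in for the SKR API server in Python (the real LM applies manifests against Kubernetes, typically via server-side apply, so this only illustrates the intended semantics, not the implementation):

```python
def apply_item(cluster, name, desired, strategy):
    """Apply one SyncResource item against a toy 'cluster'
    (a dict of name -> object).

    CreateAndSync   - create-or-patch, like server-side apply
    UpdateAndDelete - patch only while the object exists; never recreate it
    Delete          - remove if present; a missing object is not an error
    """
    if strategy == "CreateAndSync":
        cluster[name] = desired
    elif strategy == "UpdateAndDelete":
        if name in cluster:
            cluster[name] = desired
    elif strategy == "Delete":
        cluster.pop(name, None)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return cluster
```

The key behavioral split is in `UpdateAndDelete`: unlike plain server-side apply, it does not recreate an object that a user has deleted.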
> Following the automatic removal of deleted items considerations, the `CreateAndIgnore` strategy also seems like an overspill, and its insurance/implementation can be transferred to the clients watching the SyncResource status and just removing that item once it's been created, the same way the `Delete` responsibility would be shared.
`CreateAndIgnore` may not be required for your operator, but it fits other scenarios; this sync resource concept is not designed only for your use case.
Created on 2023-10-30 by Xin Ruan (@ruanxin)
Decision log
Context
Several modules need to create some resources in managed Kyma Runtimes based on control plane contexts. However, in order to avoid introducing more central components that directly connect to remote clusters, to optimize the number of connections, and to maintain the principle of least privilege, where only a few central components have admin permissions to access remote clusters, we should rely on one centralized component to push those resources to Kyma Runtimes.
Decision

- Introduce a new CR, `SyncResource`; the data-encryption-at-rest solution should be further designed as a prerequisite.

KCP SyncResource example:
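A minimal illustrative sketch, assembled from the field descriptions that follow (placeholder names and values, not an authoritative example):

```yaml
# Illustrative only: a SyncResource carrying one immutable item.
apiVersion: operator.kyma-project.io/v1alpha1
kind: SyncResource
metadata:
  namespace: kcp-system
  name: kyma-sample
  labels:
    operator.kyma-project.io/kyma-name: kyma-sample
spec:
  kyma: kyma-sample
  items:
    - strategy: CreateAndSync
      name: module-cr
      resource: |
        apiVersion: operator.kyma-project.io/v1alpha1
        kind: SomeModule
        metadata:
          name: module-cr-name
          namespace: kyma-system
      subresource: |
        apiVersion: operator.kyma-project.io/v1alpha1
        kind: SomeModule
        metadata:
          name: module-cr-name
          namespace: kyma-system
        status:
          state: Ready
status:
  state: Ready
  items:
    - name: module-cr
      state: Ready
```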
KCP SyncResource explained:

- The `operator.kyma-project.io/kyma-name` label enables the lifecycle manager to identify the target Kyma Runtime for deploying the resource.
- `spec.kyma`: mandatory, specifies the target Kyma name for deploying this resource.
- `spec.items`: mandatory, specifies a list of resources to be synced.
- `strategy`: mandatory, defines the sync strategy:
  - `CreateAndSync`: the resources are designed as immutable resources for the SKR cluster; any modification to the resources by users should be reset.
  - `Delete`: the resources were previously deployed in the SKR cluster but should be removed now; this is useful for cleaning up old resources.
  - `CreateAndIgnore`: the resources are designed as mutable resources for the SKR cluster; LM only takes care of creating the resource and recreates it if deleted by users, but any update to the deployed resources in the SKR cluster will be ignored.
- `name`: mandatory, the name of this resource, not necessarily the same as the actual resource name deployed in the SKR cluster.
- `resource`: optional, contains the resource YAML content as plain text.
- `subresource`: optional, contains the subresource YAML content as plain text.
- `status.conditions` and `status.lastOperation` indicate the synchronization status and provide information about the last operation performed by the lifecycle manager.
- `status.state` reflects the current state of the sync resource.
- `status.items` indicates the status of the current list of resources being synced.

Consequences
- A new API (`SyncResource`) for defining resources as Kubernetes manifests.

Risks:

- Incorrect handling of `SyncResource` could lead to synchronization problems or resource mismanagement.