kptdev / kpt

Automate Kubernetes Configuration Editing
https://kpt.dev
Apache License 2.0

porch: live status of deployed package #3543

Open johnbelamaric opened 2 years ago

johnbelamaric commented 2 years ago

When a package has been deployed, we need a way to reflect back to the user the status of that deployment: has the intent been realized or not? We also need this for doing any sort of roll out functionality.

In the Porch cluster, we cannot guarantee access to the API servers of the workload clusters. We must also keep in mind #3255 - easing integration between kpt live apply, Config Sync, and Porch. This suggests that we must decouple the API used to store the status from how that API is populated.

So, first, we need a way to represent the deployment status of the package. I would suggest something compatible with ResourceGroup (if not that directly). The primary concern will be how much data we can safely reflect back into the Porch cluster. We may want to include only the aggregated status; that's something we should probably model so we can discuss the scalability aspects.
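As a rough sketch of what that might look like (the type and field names here are hypothetical, not an existing Porch API), an aggregated status mirrored into the Porch cluster could carry just rolled-up counts and a single condition rather than per-resource detail:

```go
// Hypothetical sketch only: PackageDeploymentStatus is not an existing Porch
// type. The idea is to reflect back only an aggregate of the ResourceGroup
// status, keeping the data stored in the Porch cluster small and bounded.
package status

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// PackageDeploymentStatus summarizes whether the deployed package's intent
// has been realized, without copying per-resource detail into Porch.
type PackageDeploymentStatus struct {
	// ObservedGeneration ties the status to the package revision that was
	// last reconciled in the workload cluster.
	ObservedGeneration int64 `json:"observedGeneration,omitempty"`

	// ResourceCounts buckets the package's resources by kstatus-style
	// status (e.g. Current, InProgress, Failed).
	ResourceCounts map[string]int `json:"resourceCounts,omitempty"`

	// Conditions carries a single rolled-up Reconciled/Stalled style
	// condition rather than one condition per resource.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```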

Next, we need a way to populate that API. We should support multiple mechanisms here. Three possibilities come to mind immediately:

1) In some cases, we will have the ability for a controller running in the Porch cluster to reach into the workload cluster and gather that data. But we cannot guarantee that. And at high scale and frequency this can become pretty demanding, even with watch (for example, could we do 10,000 watches on ResourceGroups in other clusters?).

2) In other cases, it may be possible for workload clusters to reach into the Porch cluster but not vice versa. In this case a controller running in the workload cluster (or the RG controller itself) could publish this data back to Porch.

3) Finally, in many cases workload clusters will already have a metrics pipeline that is used to transport metrics to a central metrics server. We could expose RG status as a metric, and piggyback on that pipeline. A controller running in the Porch cluster could populate the Porch status API based on queries to the metrics service.

I like 3) for a few reasons. It leverages an existing flow rather than creating our own, and that flow is already designed for higher scale. The controller/service running in the Porch cluster need only talk to the metrics service and the Porch cluster API server, rather than to potentially thousands of separate clusters as in the other two options. Also, in the case of rollout we will need access to additional metrics beyond "intent realization" anyway.
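As a rough sketch of option 3, assuming a Prometheus-style pipeline and entirely hypothetical metric and label names, an agent in each workload cluster could export one aggregated gauge per ResourceGroup for the central controller to query:

```go
// Sketch of option 3 (hypothetical metric and label names): a workload-cluster
// agent exports one gauge per ResourceGroup indicating whether its intent has
// been realized; the controller in the Porch cluster then queries the central
// metrics service instead of reaching into each workload cluster.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var rgReconciled = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "resourcegroup_reconciled", // hypothetical metric name
		Help: "1 if all resources in the ResourceGroup are Current, else 0.",
	},
	[]string{"cluster", "namespace", "name"},
)

func main() {
	prometheus.MustRegister(rgReconciled)

	// In a real agent this would be driven by a watch on ResourceGroups in
	// the local cluster; this just shows the shape of the exported signal.
	rgReconciled.WithLabelValues("edge-cluster-1", "default", "my-package").Set(1)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```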

Related issues: #3234, #3255

mortent commented 2 years ago

I agree on the challenges around fetching the status from each cluster at scale. Using intermediate storage to decouple workload clusters pushing status from the central cluster pulling it seems like a good way to work around that.

I also think metrics are part of the solution here, since even if we end up fetching status from Config Sync (or some other git-syncer) through some other means, progressive rollouts almost certainly need evaluation of application-specific metrics to determine whether the workload/application is actually healthy.

Whether metrics are a good solution for RG status depends, I think, on what level of detail we need. Exposing the status of an RSync/RG resource sounds doable, but if we need to know the status of a specific revision of a package it gets more difficult, since including that information will probably result in labels with high cardinality. The experience from Config Sync is that users do care about the result of each individual sync operation, while metrics work best when looking at rates rather than individual measurements.
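To make the cardinality point concrete (again with hypothetical names): once the package revision becomes a label, every new revision creates a brand-new time series, so the series count grows without bound.

```go
// Illustration of the cardinality concern, not a recommendation: a per-revision
// label means each synced revision of each package creates a new time series.
package main

import "github.com/prometheus/client_golang/prometheus"

var rgSyncStatus = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "resourcegroup_sync_status", // hypothetical metric name
		Help: "Sync result for a specific package revision.",
	},
	// "revision" takes a new value on every package revision, which is the
	// kind of unbounded-cardinality label metrics backends handle poorly;
	// per-revision results fit better in an API object than in a metric.
	[]string{"cluster", "namespace", "name", "revision"},
)

func main() {
	prometheus.MustRegister(rgSyncStatus)
	// v1, v2, v3, ... of the same package each become a separate series.
	rgSyncStatus.WithLabelValues("edge-cluster-1", "default", "my-package", "v3").Set(1)
}
```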

justinsb commented 2 years ago

Great topic! I agree that a lot of these issues are still somewhat unknown. As such, I propose we continue prototyping based on the assumption of connectivity to the target cluster (and the ability to watch). Then we can more rapidly establish the schema we need and ideally map it to metrics, which I agree is a nice way to address the not-connected case.
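A minimal sketch of that prototyping path, assuming we have a kubeconfig for the target cluster and that ResourceGroups are served there as kpt.dev/v1alpha1 (the mapping into a Porch-side status API is left out):

```go
// Prototype sketch only: assumes the controller in the Porch cluster has a
// kubeconfig for the target cluster and that ResourceGroups are served there
// as kpt.dev/v1alpha1. It just watches ResourceGroups and logs events; turning
// the per-resource statuses into an aggregate for a Porch-side API comes later.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/workload-cluster.kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	rgGVR := schema.GroupVersionResource{Group: "kpt.dev", Version: "v1alpha1", Resource: "resourcegroups"}
	w, err := dyn.Resource(rgGVR).Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	defer w.Stop()

	// Each event carries a full ResourceGroup; a real controller would reduce
	// its per-resource statuses to an aggregate and write that back to the
	// corresponding package's status in the Porch cluster.
	for ev := range w.ResultChan() {
		log.Printf("ResourceGroup event: %s", ev.Type)
	}
}
```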

In terms of what information we need, my hypothesis is:

I think this is a good starting point, but it is a hypothesis. In particular for "drill down", if it's hard for users to connect to these clusters this likely won't hold. I think a lot of this depends on why we end up not watching (assuming we don't) - is it scalability of watch, is it the volume of data, is it connectivity etc.

yuwenma commented 2 years ago

@johnbelamaric Do you mind explaining what "workload cluster" refers to? I'm not very familiar with the term and it's a little challenging to follow the discussion. Does it refer to the other clusters that the Porch cluster can talk to (i.e., Porch creates a RootSyncSet in a "workload" cluster)?

johnbelamaric commented 2 years ago

Yes, exactly. The Porch system runs in a management cluster, and the workloads run in other clusters, which receive their configuration via Config Sync.

While it’s also possible to put all these things into one cluster, my primary mental model uses this structure. In the Nephio or retail context, those may also be called “edge” clusters. But “workload cluster” is more general.

hkassaei commented 2 years ago

Maybe this problem can be divided into two parts: 1) the status of the workload cluster itself, and 2) the status of the workloads running in that cluster.

For (1), can't we rely on whatever southbound provisioning API is in use (Crossplane, ACK, GCC, etc.) to populate a status field in the Cluster custom resource to reflect the status of the cluster? I think we can safely assume that the cluster provisioner has access to the cloud provider APIs that provide the status of the workload cluster (both the control plane and the worker node pools). And the cluster provisioner runs in the management cluster, so any other controller in the management cluster that needs this information can read it from the Cluster CR. For example, a controller that needs to deploy a workload can wait for the workload cluster status to be ready/running before trying to deploy the workload.
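A small sketch of that gating logic, with a deliberately hypothetical Cluster GVR and Ready condition, since the real fields depend on which provisioner (Crossplane, ACK, GCC, Cluster API, ...) populates the status:

```go
// Sketch only: the Cluster GVR (infra.example.com) and the Ready condition are
// hypothetical stand-ins for whatever the cluster provisioner actually writes.
// A controller in the management cluster reads the Cluster CR and waits for it
// to report ready before deploying workloads to it.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

var clusterGVR = schema.GroupVersionResource{
	Group: "infra.example.com", Version: "v1alpha1", Resource: "clusters", // hypothetical
}

// clusterReady reports whether the named Cluster CR carries a Ready=True condition.
func clusterReady(ctx context.Context, dyn dynamic.Interface, namespace, name string) (bool, error) {
	u, err := dyn.Resource(clusterGVR).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	conditions, _, err := unstructured.NestedSlice(u.Object, "status", "conditions")
	if err != nil {
		return false, err
	}
	for _, c := range conditions {
		if cond, ok := c.(map[string]interface{}); ok && cond["type"] == "Ready" && cond["status"] == "True" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	cfg, err := rest.InClusterConfig() // this controller runs in the management cluster
	if err != nil {
		log.Fatal(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// Poll until the workload cluster is ready, then proceed with deployment.
	for {
		ready, err := clusterReady(context.Background(), dyn, "default", "edge-cluster-1")
		if err != nil {
			log.Fatal(err)
		}
		if ready {
			break
		}
		time.Sleep(10 * time.Second)
	}
	log.Println("workload cluster is ready; safe to deploy the package")
}
```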

For (2), since the workloads in the workload cluster are actually deployed by a GitOps agent, I agree it might make sense to use indirect approaches such as relying on metrics or health indicators collected from those workloads and stored in a central metrics server.

johnbelamaric commented 2 years ago

For 1), first, let me say that in general, yes, conceptually what you are saying can work. It does require, though, a component to read the "provisioner" status and write it to the Nephio cluster resource. There are a few other subtleties too (some of which we may be able to sidestep in, say, v1.0 of Nephio):

For 2), I think "yes and no". The "no" part is that, just to repeat, from the point of view of status collection a package deployed to a cloud management cluster looks to Porch/Nephio identical to a package deployed to a workload cluster. That is, Porch at least shouldn't generally have to know what's "inside" a package. Since Nephio will inherently have the concept of a Cluster, it may make more sense to draw a distinction there.

Now, that said, we are talking about capturing dependencies (#3448). In that case, we could say that a given workload package "is hosted by" a cluster package, even though those packages live in different repositories. This is how we start to build the capability to manage sequencing.