rancher / aks-operator

Azure Kubernetes Service operator for Rancher
Apache License 2.0

[SURE-7333] Downstream AKS `Cluster Management > Cluster details` page not displaying accurate message during update #179

Open Josh-Diamond opened 1 year ago

Josh-Diamond commented 1 year ago

SURE-7333

Setup

Describe the bug

To Reproduce

  1. Fresh install of 2.7.0
  2. Provision a downstream AKS cluster
  3. Update downstream AKS cluster
  4. Navigate to Cluster Management > Cluster Details page
  5. Reproduced

Result

Expected Result

Screenshots

Screen Shot 2023-03-06 at 6 10 24 PM
kwwii commented 1 year ago

@Josh-Diamond What extra information is provided by displaying the non-formatted text suggested?

Josh-Diamond commented 1 year ago

@kwwii the non-formatted text suggestion came from the AKS Operator logs. Instead of displaying "This resource is currently in a transitioning state, but there isn't a detailed message available", my request is to replace this message with the more specific log coming from the AKS Operator, reading "Waiting for cluster [c-REDACTED] to update node pool [REDACTED]".

w.r.t. "What extra information is provided by displaying the non-formatted text suggested?"

The specific node pool being updated. If we have access to more specific details, we should pass them along to the user without them having to navigate off the Cluster Details page. From a QA perspective, this is a bug.
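
For illustration, the requested outcome would be a v1 API state block along these lines (same shape as the state objects quoted further down this thread; the "name" value here is an assumption):

"state": {
  "error": false,
  "message": "Waiting for cluster [c-REDACTED] to update node pool [REDACTED]",
  "name": "updating",
  "transitioning": true
},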

kwwii commented 1 year ago

Thanks, I agree that we should give the user as much information as possible.

gaktive commented 1 year ago

This should be transferred to the backend to provide the information via the operator.

mjura commented 1 year ago

I will look into this.

mjura commented 1 year ago

This needs to be discussed with the UI team.

kkaempf commented 1 year ago

@Josh-Diamond is there a corresponding JIRA issue?

@gaktive should I create a dashboard issue or will you?

gaktive commented 1 year ago

@kkaempf You have the instant context so I'll allow you to file :)

richard-cox commented 1 year ago

@kkaempf @gaktive I don't think this is one for the UI team (https://github.com/rancher/dashboard/issues/8877#issuecomment-1547496221). Whoever owns the provisioning.cattle.io.cluster metadata.state.message field should take this on.

salasberryfin commented 1 year ago

Hi @richard-cox. @mjura and I investigated how to implement a proper messaging strategy during update and we think using the aksclusterconfigs.aks.cattle.io object would be the most sensible solution. We are already using this resource to write error messages, and it is equivalent to what we need to implement for non-error messages. Is there a reason for using provisioning.cattle.io.cluster instead? It would add complexity to the operator messaging, would mean using different strategies for different types of messages, and would add new dependencies to the code.

Would it be possible to consider moving to aksclusterconfigs.aks.cattle.io?
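
For context, a minimal sketch of what this could look like on the operator side, assuming a hypothetical informational field on the AKSClusterConfig status next to the existing failure message (the field name, handler, and client wiring are illustrative assumptions, not the actual operator API):

// Sketch only: record a non-error, transitioning message on the
// aksclusterconfigs.aks.cattle.io status, mirroring how failure messages
// are written today. TransitioningMessage is an assumed field name and is
// not part of the real CRD.
func (h *Handler) setTransitioningMessage(config *aksv1.AKSClusterConfig, msg string) (*aksv1.AKSClusterConfig, error) {
    if config.Status.TransitioningMessage == msg {
        return config, nil // skip no-op status updates
    }
    configCopy := config.DeepCopy()
    configCopy.Status.TransitioningMessage = msg
    // h.aksCC is assumed to be the generated client for AKSClusterConfig;
    // UpdateStatus persists only the status subresource.
    return h.aksCC.UpdateStatus(configCopy)
}

The update handler would then call this with the same text it currently logs, e.g. "Waiting for cluster [c-REDACTED] to update node pool [REDACTED]".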

richard-cox commented 1 year ago

The context of the cluster is the prov cluster resource itself. So when viewing a prov cluster or, importantly, a list of prov clusters, its status comes from itself rather than from a secondary resource.

We've hit a lot of issues in the past with secondary resources, specifically how they affect scaling. They will also break the new pagination process coming in soon.

salasberryfin commented 1 year ago

Hi @richard-cox, error messages are currently being updated using this secondary resource aksclusterconfigs.aks.cattle.io (same for the other hosted providers: EKS and GKE). Using this for information messages would align with what's being used in the operator code and would make it easily extensible to other hosted providers. Are there other controllers that are using provisioning.cattle.io.cluster for updating messages?

richard-cox commented 1 year ago

I'm not sure about controllers, however all clusters that are shown in the UI look at the provisioning cluster for state and state message.

gaktive commented 1 year ago

Putting status/ui-blocked on this since this ties into https://github.com/rancher/dashboard/issues/8877 and the label is in use on rancher/rancher for tracking UI issues.

kkaempf commented 10 months ago

Despite this comment, the UI team believes that this must be fixed in aks-operator.

salasberryfin commented 10 months ago

After discussing the best way to tackle this with @mjura, we think this should wait until the new Rancher API is available. As we don't know yet when that will be, we think it is best to move this back to Q2, considering it is not a high-priority issue.

jakefhyde commented 2 months ago

When you create a hosted cluster (e.g. AKS), although the cluster.provisioning.cattle.io object does get created, the cluster.management.cattle.io object is the tertiary object, and it appears to be what the UI displays statuses from. Based on my testing, when creating one of these clusters, the Updated condition is not present on the provisioning cluster object, which leads me to believe the UI is pulling from the management cluster object; this is also reinforced by the fact that the management cluster object is the one manipulated during cluster creation/update. Looking in the API, I see the following for the management cluster object:

"state": {
  "error": false,
  "message": "",
  "name": "provisioning",
  "transitioning": true
},

and the following for the provisioning cluster object:

state": {
  "error": false,
  "message": "Waiting for API to be available",
  "name": "waiting",
  "transitioning": true
},

With the following showing on the UI: [screenshot]

I think in this case the correct path for the hosted team would be to set the corresponding condition on the management cluster (Provisioned or Updated, depending on whether the cluster has been provisioned successfully yet), since the message of that condition is ultimately what drives metadata.state.message in the v1 API. @richard-cox feel free to correct me if any of the above information is incorrect. I may be a bit fuzzy regarding how the UI pulls metadata.state.message when relationships are involved in the v1 API.
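
For what it's worth, a rough sketch of that suggestion, assuming the wrangler/norman-style condition helpers Rancher uses for management cluster conditions (the imports and persistence step are illustrative; treat the exact helper names as assumptions):

import (
    "github.com/rancher/wrangler/pkg/condition"
    v3 "github.com/rancher/rancher/pkg/apis/management.cattle.io/v3"
)

// The Updated condition's message is what ultimately surfaces as
// metadata.state.message in the v1 API, per the discussion above.
var updated = condition.Cond("Updated")

// setUpdateMessage marks the management cluster as transitioning and attaches
// the detailed operator message (e.g. which node pool is being updated).
func setUpdateMessage(cluster *v3.Cluster, msg string) {
    updated.Unknown(cluster) // Unknown is rendered as a transitioning state
    updated.Message(cluster, msg)
    // The caller still needs to persist the change via the management
    // cluster client's update call (omitted here).
}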

richard-cox commented 2 months ago

Good spot @jakefhyde. I've investigated some more; interestingly, it's different depending on the type of cluster.

For tracking

gaktive commented 2 months ago

cc @nwmac

@kkaempf since this isn't going to be in 2.9.1, should we change the milestone to the next minor version 2.10.0 and then assess? @mjura is chiming in that the CRD used to source messages needs to be changed.

kkaempf commented 2 months ago

Since @mjura is already on it, I moved it to 2.9-Next1

Michal, feel free to move to a different milestone as you see fit 😉

nwmac commented 2 months ago

@richard-cox The fact that we use the mgmt cluster is most likely historical - I thought we would prefer to move towards using the provisioning cluster where possible and ultimately remove the need for the UI to request the mgmt cluster. Would it not be better for the hosted operators to update the provisioning cluster and for the UI to be updated to look there?

mjura commented 2 months ago

> Since @mjura is already on it, I moved it to 2.9-Next1

We talked with Gary and I moved it to v2.10, because it will need UI help and this issue has low priority.

gaktive commented 1 day ago

@richard-cox @mjura & @jakefhyde can you sync on this and determine the next step to take?

@nwmac to follow up with @mattfarina on the nomenclature for what a cluster is in this context, since that would drive some of the UI here.