vmware-tanzu / community-edition

VMware Tanzu Community Edition is no longer an actively maintained project. Code is available for historical purposes only.
https://tanzucommunityedition.io/
Apache License 2.0

Improve visibility into management and workload cluster creation #2730

Closed stmcginnis closed 1 year ago

stmcginnis commented 2 years ago

Abstract

Our current bootstrapping process emits a lot of output, but it's very hard to follow what is happening. Even if you understand the overall process of deployment, parsing the output can be confusing, and it's not very clear where in the deployment process we are.

When deployment takes a while, it's hard to tell whether things are locked up or whether something is happening under the covers that we are waiting on.

When the deployment does fail, it's not always clear what caused the failure. We only give generic troubleshooting steps that may not even be relevant to the failure the user is encountering.

Current Issues:

The current cluster bootstrap process is very noisy, with a lot of output that is not meaningful to the user and is potentially confusing:

[Image: screenshot of the current bootstrap output]

There are a few problems with this output:

  1. There is no formatting (or a mix of formatting)
  2. High-level steps are hard to distinguish from the low-level steps being taken
  3. It's hard to tell what is relevant to pay attention to, versus what can be ignored
  4. Not clear what step is being done and how far along in the process it is
  5. When things go wrong, not clear what the cause of the failure is
  6. Not clear what the user needs to do to resolve the problem
  7. Not clear if there is cleanup that needs to be done before trying again

Issue Tracking

The following list tracks the issues that must be resolved to achieve this capability. Please see the next section to understand the larger proposal.

Tanzu Framework

Community Edition

Other

Proposal

stmcginnis commented 2 years ago

cc @garrying

joshrosso commented 2 years ago

Thanks for bootstrapping this @stmcginnis. Looking forward to the design doc. Here are some ideas that come to mind, worth considering:

  1. 💯 to your comment on capx-manager holding the key to failures. The logs from these would surely be too noisy to print at default verbosity. However, I think always writing those logs to a bootstrap log file, and telling the user during bootstrap that the file is available, would be extremely high value.
    Creating management cluster in ${INFRASTRUCTURE_PROVIDER}
        View bootstrap logs at: ${HOME}/.config/tanzu/tkg/bootstrap-logs/${CLUSTER_NAME}.log
    1. At a higher verbosity, we just tail those logs to stdout as well.
  2. This proposal should break down the bootstrap visibility for management and workload clusters separately. I foresee these two looking quite different. In other words, I care about different things: for my workload cluster, I want to understand which TKR was selected, the CNI, etc., not too dissimilar from our new standalone model.
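
The log-file idea in item 1 can be sketched roughly as follows. This is only an illustration in Python, not how the Tanzu CLI implements logging: the function name `setup_bootstrap_logging`, its parameters, and the placeholder log messages are all assumptions made for the example. The point is the split: every record always goes to the per-cluster log file, while the console receives the detailed lines only when verbosity is raised.

```python
import logging
import sys
from pathlib import Path


def setup_bootstrap_logging(cluster_name, log_dir, verbosity=0, stream=None):
    """Return (logger, log_path) for one bootstrap run (hypothetical helper).

    Every record is written to <log_dir>/<cluster_name>.log. The console
    shows INFO and above by default, and DEBUG too when verbosity > 0.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{cluster_name}.log"

    logger = logging.getLogger(f"bootstrap.{cluster_name}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    # Drop handlers left over from an earlier call for the same cluster.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    # The file always captures everything, including noisy provider detail.
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(file_handler)

    # The console stays quiet unless the user asks for more detail.
    console = logging.StreamHandler(stream if stream is not None else sys.stdout)
    console.setLevel(logging.DEBUG if verbosity > 0 else logging.INFO)
    console.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(console)
    return logger, log_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        log, path = setup_bootstrap_logging("demo-cluster", tmp, verbosity=0)
        log.info("Creating management cluster")
        log.info("    View bootstrap logs at: %s", path)
        # Placeholder for provider controller detail; file only at verbosity 0.
        log.debug("provider controller detail (placeholder line)")
        for h in log.handlers:
            h.close()
```

In a real run, `log_dir` would point at something like the `${HOME}/.config/tanzu/tkg/bootstrap-logs/` directory suggested above; the demo uses a temporary directory so it leaves nothing behind.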

garrying commented 2 years ago

This is great @stmcginnis! The high-level changes resonate with me. Happy to help start the design doc.

joshrosso commented 2 years ago

RFC open, initially targeting closure on 02/04/2022.

DennisFaucher commented 2 years ago

What about something similar to what Linux distros use for installation? A simple screen with one line that describes the current activity and a progress indicator for that activity. The activity can be expanded to show detail if needed. It does not need to be a GUI; it can be ascii/curses/whatever-based.

stmcginnis commented 2 years ago

Great idea @DennisFaucher. That's a slightly different approach from what is being proposed here, but I could see that as a great follow on. If we make the updates proposed in the design doc, the UI could read those and update the output like the example you show. Then there could be a "Details" expander or something that would give the full output.

What do you think @miclettej ?

joshrosso commented 2 years ago

> Great idea @DennisFaucher. That's a slightly different approach from what is being proposed here, but I could see that as a great follow on. If we make the updates proposed in the design doc, the UI could read those and update the output like the example you show. Then there could be a "Details" expander or something that would give the full output.
>
> What do you think @miclettej ?

I agree that it's a great suggestion :tada:, but it's something we should consider as part of larger UI work; let's keep this proposal scoped to giving bootstrap log visibility.

For our future consideration, how does a progress UI like this differ from the kickstart UI?

[Image: screenshot of the kickstart UI]

miclettej commented 2 years ago

To Josh's point, we have something like this in the UI but may need to adjust the granularity of steps or filtering of logs. What we know from customers is that they like to know what stage of the deployment they are on, and what is remaining to complete. The step progress on the left was added in response to customer feedback. We can adjust appearance or granularity of steps/messaging to the customer. I think it may be important to continue to show some indication of steps that are completed and not yet completed.
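
As a rough illustration of showing completed, current, and remaining steps in plain text, here is a minimal Python sketch. The `StepTracker` class and the step names in the demo are hypothetical and are not taken from the actual UI or CLI; the sketch only shows one way to render "where are we and what is left".

```python
from dataclasses import dataclass


@dataclass
class StepTracker:
    """Tracks which deployment step is in progress (hypothetical helper)."""

    steps: list
    current: int = 0      # index of the step in progress; len(steps) when done
    failed: bool = False  # True if the current step failed

    def advance(self):
        """Mark the current step complete and move to the next one."""
        if not self.failed and self.current < len(self.steps):
            self.current += 1

    def fail(self):
        """Mark the current step as failed; later steps stay pending."""
        self.failed = True

    def render(self):
        """Return one line per step: done [x], current [>], failed [!], pending [ ]."""
        total = len(self.steps)
        lines = []
        for i, name in enumerate(self.steps):
            if i < self.current:
                mark = "[x]"
            elif i == self.current:
                mark = "[!]" if self.failed else "[>]"
            else:
                mark = "[ ]"
            lines.append(f"{mark} {i + 1}/{total} {name}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Step names are made up for the demo.
    tracker = StepTracker([
        "Validate configuration",
        "Create bootstrap cluster",
        "Create management cluster",
        "Clean up bootstrap cluster",
    ])
    tracker.advance()
    print(tracker.render())
```

The granularity question raised above then becomes a question of which entries go into `steps`, independent of how the lines are drawn.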

DennisFaucher commented 1 year ago

So long team and thanks for all the fish. TCE was great.