catherineluse commented 2 years ago

I'm opening this issue to fulfill an action item from a sprint retrospective. The action item came up because I had difficulty when working on these bug fixes: https://github.com/rancher/dashboard/pull/6570

I thought refactoring the store (focusing mostly on this file https://github.com/rancher/dashboard/blob/master/shell/store/index.js) could potentially improve the maintainability of the Rancher code base, along with adding end-to-end tests for namespace filtering. The E2E tests are being tracked in this separate issue: https://github.com/rancher/dashboard/issues/6613 It also has the potential to improve performance if the loadManagement and loadCluster actions are refactored to fetch and watch fewer resources.

Background

I found the code here to be brittle because I introduced unwanted side effects when changing the behavior of what happens when users filter cluster explorer by namespace.

This is the file that I think should be refactored:

https://github.com/rancher/dashboard/blob/master/shell/store/index.js

Here is a summary of what happens in that file:

All namespaces in the cluster are loaded.
All the active namespaces are loaded. These are the namespaces that the user can select from the top nav of cluster explorer to filter their view.
When the user changes their selected namespaces in the top nav of cluster explorer, the code to update that selection and refetch resources in cluster explorer is located here.
The state in this file keeps track of whether the current cluster is in a ready state.
The state here also keeps track of the context of whether the user is in cluster explorer, Harvester or Epinio in the currentStore, currentProduct and isExplorer getters.
The loadManagement action is defined here. This is a huge action that does a large amount of setup when the user refreshes the page, including loadAllSchemas. The loading of schemas contributes to the fact that it takes a long time for Rancher pages to load initially and is the reason why I tagged this issue with area/performance.
The loadCluster action is also defined here. This is also a huge action with many lines of code which does a number of async API calls to load the cluster explorer view.

Objective

I don't have a clear picture in my head of what changes I think should happen. But here's a summary of what I was thinking:

1. Refactor `getActiveNamespaces` to separate mutation from getter

Although not technically a getter, the name of this is misleading because it calls updateActiveNamespaceCache, which mutates state. It is unexpected when a getter mutates state, and makes it easy to make changes that have unintended side effects. So we should separate the mutation of setting the active namespaces from the getter of retrieving them. I tried to make this code cleaner, and made an effort to separate the getter from the mutation. But I did not figure out how to do that without running into timing problems caused by extricating the updateNamespaces logic from loadCluster.

Here is what's in loadCluster:

commit('updateNamespaces', {
      filters: filters || [ALL_USER],
      all:     res.namespaces,
      ...getters
    });

As I remember from working on this 2 months ago, I spent a long time trying to figure out how to call that from other locations, but it seems to need to be run in a specific order compared to other getters and mutations in loadCluster. When I tried changing, I kept getting errors that said the schemas were not yet loaded.

2. Get navigation info from URL/params instead of the store

There are some items in state here that made me wonder why we don't determine them based on params. Why are isRancher, isExplorer, isSingleProduct, isVirtualCluster, clusterId, productId, currentProduct, currentStore and workspace tracked here? These seem to all track where you are in the context of navigating through the app and whether you are looking at Rancher, Epinio or Harvester.

Wouldn't it be simpler and less error-prone to just always take those values from the URL/params?

Shouldn't most of the logic for loading cluster resources be on a Vue component such as https://github.com/rancher/dashboard/blob/master/shell/layouts/default.vue so that all the params would be readily available to be used when loading the resources?

3. Take utility functions out of the store

The store should only be used for storing application state.

Why is backToRancherGlobalLink in the store? That is not a piece of state that would change during a user's session - it's just a link that stays the same while the user is logged in. Same goes for backToRancherLink, which shouldn't need to depend on state, just URL/params.
Why are things like urlFor and normalizeType in the store? Functions like these are just reusable bits of code that don't depend on application state.

4. Refactor `loadManagement`

The loadManagement action loads data needed for cluster manager.

In loadManagement we have a comment that says // Maybe not Rancher after attempting to fetch all Norman principals, in case that errors out. This is backwards. First we should check if this is Rancher, THEN attempt to fetch all Norman principals.
In the same action we also have conditional logic like if ( isRancher ) { in the loadManagement action. This is disorganized because we also load Rancher schemas, namespaces and Norman principals outside of that conditional statement. All the Rancher specific setup should go together, inside the one isRancher conditional statement. When I look through this long function, I have a hard time determining which code is actually Rancher specific, which is Harvester specific, which is Epinio specific, and what applies to everything in all cases. This should be clear. We should just have separate helper functions with names like rancherSpecificSetup or pluginSpecificSetup to make it easier to understand what logic pertains to which product. So we could have something like:

if (isRancher){
  rancherSpecificSetup()
}
if (isPlugin){ // or Harvester or Epinio
  pluginSpecificSetup()
}
// etc...

5. Changes to `loadCluster`

This action loads several types of resources, including all namespaces in the cluster, regardless of what namespace the user has selected in the top nav. This action is so resource intensive that I find myself coming back to it again and again as a place where performance optimizations could be done.

Note the repeated use of await below. As Sean has pointed out in the past, it is not ideal to use await in combination with dispatch. It would be better to make the API call without waiting for the response, and without blocking other parts of the UI while that is loading. Any part of the UI that depends on a getter in Vuex will just take the default value, then will be automatically updated when the value from the getter changes.

In this action, why do we do management/watch for projects, but then immediately do management/forgetType for projects? Why watch it if we're just going to immediately forget it?
Instead of having a comment saying // Execute Rancher cluster specific code, we should have a separate helper function that does all of the Rancher specific cluster setup, and call that. Because currently there is Rancher cluster specific code both above AND below that comment. For example, projects are Rancher cluster specific and they are loaded above that comment.
Do we really need dispatch('cluster/subscribe');? Shouldn't we avoid watching all resources in the cluster (which is especially a concern for the local Rancher server cluster, which has to bear the traffic of all Rancher users), and instead only load and watch the resources that the user has filtered to see on the screen, on a page-by-page basis? Note: This is partly covered by other performance related GitHub issues
About cluster/loadSchemas - why can't we load schemas on a component-by-component basis? Throughout the code base we check for individual schemas in getters, but I don't see why we shouldn't check for individual schemas asynchronously, to avoid having to put all of them in the store.
We should refactor the code so that we don't need this workaround:

// This is a workaround for a timing issue where the mgmt cluster schema may not be available
// Try and wait until the schema exists before proceeding
await dispatch('management/waitForSchema', { type: MANAGEMENT.CLUSTER });

Instead we should do an async call for the one specific management cluster schema, await that, then if that individual schema shows up, load the management cluster resources. It shouldn't have to wait for all schemas (many megabytes of data) to be loaded before that can execute.

This ties into my other tech debt issue about improving performance related to the side nav of cluster explorer. https://github.com/rancher/dashboard/issues/6485 My thought is that if we could load individual schemas that are needed on a component-by-component basis, the page could load faster due to not having to wait for all schemas to be loaded before doing anything. This added flexibility would also enable more features to be added to cluster explorer (the same features mentioned in the tech debt issue about refactoring type-map.js https://github.com/rancher/dashboard/issues/5607) such as the ability to allow a user-configurable side nav, or the ability to conditionally render an item in the side nav based on whether a service is available.

richard-cox commented 2 years ago

I don't think we should tackle anything major here until we have a set of e2e tests covering basic loading and navigation flow between clusters and plugins.

Some very quick, high level comments

Agree, there's a lot of improvements we can make to the namespace functionality
We could move product/cluster info out, however would need to apply it to all root layouts (we'd need to know if they move out from a product/cluster context so can't just use the default layout), however to reduce code duplicate they would need to go somewhere common... like the root store where they are now.
- It looks like backToRancherGlobalLink isn't actually used anywhere, it's referenced in the header but not called.
- urlFor and normalizeType are part of the dashboard store and are overridden by plugins when they extend it
- // Maybe not Rancher aft is very weird indeed. This might tie in to isRancher and i've always wondered what use case it covers
- There shouldn't be anything plugin (harvester, epinio, etc) specific left in loadManagement. The isVirtualCluster is legacy and almost all use cases have been removed now except for one in NamespaceFilter (which should be addressed)
- Plugins have kind of two modes - integrated (normal rancher manager ui with extra plugin product) and standalone (only the plugin ui will be shown). The distinction is now generally covered by isSingleProduct and the object it returns
- management/watch & management/forgetType. That's probably a bug, anything management specific is global. It maybe should be forgetting projects in the old cluster and watching them in the new one. Projects can be used within the rancher context and plugin context (for instance harvester)
- dispatch('cluster/subscribe') only opens the websockets. Resources are only watched on fetch
- cluster/loadSchemas defines what resources are shown in the side nav. the user can't get to the specific resource page without them first showing there first
~rehydration is only done when SSR is enabled, something that we've since disabled. The code should probably be removed~ UPDATE: Sorry, from your links it looks like the web worker pagination change will use hydrate once it crosses the worker boundary

cnotv commented 2 years ago

I think unit test should suffice for something so aimed, as we do not need the use of the server. It's actually a bad practice to test UI logic in a E2E, as it tends to involve other parties which may not be interested in our state management architecture.

If both server and client would have to put in E2E tests which are aimed to local logic, the time required would be unbearable.

We should have enough unit tests to make our application reliable beside what the server return to us.

richard-cox commented 2 years ago

Granular unit tests will not cover the load and navigation features required to ensure there are no regressions. There's a lot going on in this store with lots of moving parts. We should have e2e tests to cover

For each concept

No cluster context
cluster context
plugin no cluster context
plugin cluster context

We need to cover

Logging in and going to a page for that each context
Session verified and going to a page for each context
Navigating between pages of different contexts
Refreshing

As well as the namespace side of things (mostly listed in the other issue)

Setting the namespace filter, leaving the cluster and returning

Writing unit tests to mock the correct input, store handlers, output, etc for all of the above would be real time consuming.

There is space for testing the inner workings, but from a high level we need e2e tests.

gaktive commented 2 years ago

Point 6 appears to be addressed elsewhere already so this ticket would need some updating based on recent activity. Pushing into 2.7.x until we get more E2E in.

catherineluse commented 2 years ago

Removed point 6

nwmac commented 1 year ago

Won't fix - work folded into other issues

rancher / dashboard

Refactor root store functions #6882

Background

Objective

1. Refactor `getActiveNamespaces` to separate mutation from getter

2. Get navigation info from URL/params instead of the store

3. Take utility functions out of the store

4. Refactor `loadManagement`

5. Changes to `loadCluster`

rancher / dashboard

Refactor root store functions #6882

Background

Objective

1. Refactor getActiveNamespaces to separate mutation from getter

2. Get navigation info from URL/params instead of the store

3. Take utility functions out of the store

4. Refactor loadManagement

5. Changes to loadCluster

1. Refactor `getActiveNamespaces` to separate mutation from getter

4. Refactor `loadManagement`

5. Changes to `loadCluster`