vmware-tanzu / kubeapps

A web-based UI for deploying and managing applications in Kubernetes clusters
Other
5k stars 707 forks source link

Reduce response times when using large repos in Carvel plugin #4099

Closed antgamdia closed 2 years ago

antgamdia commented 2 years ago

The current implementation is unusable (40-50 seconds) when it comes to large repositories. Since Carvel stores a Package for each version, we have #packages * #versions different packages.

For retrieving the package details, we need to query this object, so, for each package, we need to perform this O(n*m) search. Current implementation's response times are up to 10 seconds...

An initial general improvement should be tried and, if not improving, we might have to consider adding a cache...

More info at: https://github.com/kubeapps/kubeapps/pull/3784

Increasing the QPS (eg 100), does not seem to be working (still resp. times up to: 40-50 seconds)

antgamdia commented 2 years ago

I'd bet we're gonna need a similar approach as the one implemented at https://github.com/kubeapps/kubeapps/issues/3636 :sweat_smile:

absoludity commented 2 years ago

I took a look at the issue and possible solutions to this problem today, specifically at the GetAvailablePackageSummaries call.

I added the TCE repo (12 package metadatas with 17 package versions). Opening the catalog in Kubeapps three times resulted in:

I0303 05:14:17.418209       1 server.go:59] OK 5.691394379s /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries
...
I0303 05:14:40.599514       1 server.go:59] OK 5.765965114s /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries
...
I0303 05:14:48.814321       1 server.go:59] OK 5.128993047s /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries

Similar to your thoughts above @antgamdia , it looks like the main issue is that currently we're doing a separate request for every package metadata to get the related package (version) resource. The data which is fetched with this extra request per item is simply the latest version.

Three options I can think of (ordered by worthiness :P ) are:

Option 1 - fetch all package (versions) with a single request

Using a single request for all package (version) resources and then filtering this for the data we need while iterating the results (rather than converting it to an in-memory slice) would be much more efficient than N queries in most situations.

It'd be nice if we could just select all relevant package (version) resources, but the k8s API server only allows a limited set of operators for field selectors, so it's only possible to fetch either all or for one specific metadata name (the latter being what the code does currently)

Option 2 - Don't display the version in the package summaries for carvel

I thought it wouldn't look too bad if the summaries view (for Carvel) just didn't display a version (ie. we send it back blank). This would solve the issue for the catalog view, but would look odd when mixing results from different plugins and would not help the similar situation in other carvel end-points (get installed package summaries).

Option 3 - Update the UX to get the version data separately

Plugin returns the summary data (basically, the carvel package metadata), then the dashboard requests version data for each package displayed to update dynamically.

Obviously a large change. We'd want to think through carefully whether this fit other plugins needs too (as we don't do it currently obviously, but perhaps that's because of expensive options we've chosen).

Result

I went ahead with the Option 1 since it was the most viable and most useful right now. Initial results (for the TCE repo) seem good, taking the average for over 5s for the call as above down to between 260-500ms:

I0303 05:36:29.301656       1 server.go:59] OK 280.707563ms /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries
...
I0303 05:36:52.809490       1 server.go:59] OK 263.191122ms /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries
...
I0303 05:37:27.609722       1 server.go:59] OK 490.541777ms /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries

I'll push and link a PR with that where we can discuss the details if we want to land it (and we can update the GetInstalledPackageSummaries similarly).

absoludity commented 2 years ago

Tested with both the TCE repository and @vrabbi 's bitnami repo at https://github.com/users/vrabbi/packages/container/package/bitnami-charts-repo so that I've got, in total, 112 packages on my local system:

k get packages | wc -l                                               
113

The change in the related PR brings the response time down from 40-50s to 4-5 seconds:

I0304 01:39:19.978673       1 server.go:59] OK 4.140540445s /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries
...
I0304 01:41:18.568173       1 server.go:59] OK 3.899983355s /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries
...
I0304 01:41:56.281673       1 server.go:59] OK 4.14603719s /kubeappsapis.core.packages.v1alpha1.PackagesService/GetAvailablePackageSummaries
antgamdia commented 2 years ago

Wow, awesome Michael. So relieved we've been able to reduce the response times... I mean, it was unbearable; perhaps the cache would speed it up to ms, but at least now is usable. Thanks!

vrabbi commented 2 years ago

This sounds awesome! Cant wait to try out the plugin with shorter loading times! Other then the timing i am loving the carvel support and it will be only better with this change making it a truly viable solution at scale!