Pagination issues in plugins

gfichtenholt commented 2 years ago

Capturing slack discussion in a new issue: @gfichtenholt : Guys, I think every single plugin we have might have a (theoretical) bug in the implementation of GetInstalledPackageSummaries(). 2 of the plugins (fluxv2 and kapp_controller) rely on dynamic.ResourceInterface.List(). I could be wrong but I don't see any guarantees anywhere that 2 successive invocation of .List() will return results in the same order. So we may be returning the same package multiple times with pagination enabled. helm plugin uses helm API action.NewList() func. Could be wrong, but again, same problem - don't see any guarantee that if you call this twice you won't get "item1, item2" followed by "item2, item1". helm does give you a "sorter" option when calling List(), so that might fix that issue. For the ones using dynamic.ResourceInterface looks like can be fixed using Limit/Continue fields in ListOptions(). Also I think that fluxv2 and kapp_controller have the same issue in GetAvailablePackageSummaries()

There is actually another problem also which all 3 plugins suffer from - something called "inconsistent snapshots" described here (look for the words "consistent snapshot)". To fix this properly, we need to make changes at all layers - the plug-in needs to pass some state back to UX and UX needs to pass state back to plug-in on next call. We could technically encode the state somehow in PageToken/NextPageToken since those are arbitrary strings or add more fields to our Request/Response messages. Anyway, this exercise turned out to be a bigger deal than I thought originally (I just wanted to fix a small TODO left over in my unit tests and it evolved into this), so I am putting this on a back-burner

@absoludity : Definitely worth capturing in an issue so we can prioritise it if and when it becomes a practical issue. I'm testing this with the flux plugin right now, and I see installed package summaries being paginated, in order of the package name, without issue. But from what I've understood above, if we don't encode the pagination token (from the k8s API server) and provide it on the next request (which we're currently not doing), we can potentially see the issues described (ie. if the datastore of the k8s api server changes between paginated requests). But if we do update to include that token, we should avoid seeing any inconsistencies I think (we just can't get a different sort order server-side)? Reading this thread seems to make it clear that the pagination on the api server is strictly to enable clients to get the whole set in chunks (rather than a single request), as opposed to a user scrolling and loading the next pages piecemeal. But for our practical purposes, it seems that we won't see issues currently in typical usage and can avoid those by updating to ensure we include the api servers pagination token?

absoludity commented 2 years ago

Although we would be able to send the snapshot state back and forth to avoid the inconsistent snapshots when not filtering, when filtering I don't think it is possible because, since the k8s API itself doesn't support the filtering we need, our plugins will filter the normal snapshot result from the k8s API server, so the next paginated request will not necessarily be requiring the next chunk in the snapshot from the api server, but rather the existing chunk (as there may be more results in the existing chunk after we filter). As an example, say

the UX requests a 20 items and sends a filter query of "server",
the only enabled plugin may then request 20 items from the k8s API, receiving the next snapshot token
the plugin then runs the items through the filter, which filters out 2 of the 20 items,
the plugin fetches the next 20 items from the k8s API using the snapshot token from 2, receiving the next snapshot token
the plugin then runs the items through the filter and adds the first two to the result set to return to the user

At this point, there may be more results in the set fetched at 4, but we would have no way of including those in the next paginated request when using the snapshot token.

This is the similar situation mentioned on the issue linked above regarding ordering and pagination. For us the ordering isn't an issue (since we don't provide any ordering functionality other than the default name order returned by the k8s API), but the filtering that we do has the same effect: we can't use the snapshot token and would need to request the whole set to avoid inconsistent snapshots.

As it is, I think we'd only hit the inconsistent snapshot issue if the data is updated in the datastore between paginated requests, which will happen at times (eg. paginated requests while kapp-controller is populating the PackageMetadata or similar for Helm while the sync job is running), but not sure if it will affect users enough to warrant raising the priority. I'll focus on #3399 for now and we can see if this related issue here becomes a serious issue for users. Let me know if I'm missing something.

absoludity commented 2 years ago

Closing based on the above. If we start experiencing issues related to this affecting users, I'm happy to attempt sending the continue token back from the plugin's calls to the k8s API through to the aggregated core API. I've verified that it is possible to send the k8s API server's continue token back multiple times, so we can theoretically store the token and an offset relative to the beginning of that chunk rather than relative to the result set, for example. But this is only relevant for the installed package summaries for flux, helm and carvel. For the available package summaries it'd only affect carvel (as flux and helm cache the data in a separate datastore), afaict.

vmware-tanzu / kubeapps

Pagination issues in plugins #4241