kubernetes / kubectl

Issue tracker and mirror of kubectl code
Apache License 2.0

Support for setting resourceVersion in kubectl get #965

Open lbernail opened 4 years ago

lbernail commented 4 years ago

What would you like to be added:

Support for --resource-version=X for kubectl get (or maybe a simpler flag that just forces a read from the cache by setting RV=0, such as --cache?)

Why is this needed: Today, kubectl get only supports LISTing with RV="", which on large clusters can be slow for users and can impact the control plane, and etcd in particular.

I did a few tests on a large cluster with curl and the difference can be very significant:

time curl  -H "Accept: application/json;as=Table;v=v1beta1;g=meta.k8s.io, application/json" -H "Authorization: Bearer X" 'https://cluster.dog/api/v1/pods?labelSelector=app=A'
real    0m3.658s

time curl -H "Accept: application/json;as=Table;v=v1beta1;g=meta.k8s.io, application/json" -H "Authorization: Bearer X" 'https://cluster.dog/api/v1/pods?labelSelector=app=A&resourceVersion=0'
real    0m0.079s

Of course this is an extreme example.

I'd be more than happy to provide a PR, but I'm not familiar with the codebase, so any (even small) guidance would be appreciated. Here is what I think so far, but it's very likely I am missing/misunderstanding more than a few things:

cc @wojtek-t because we discussed this on slack earlier this week

wojtek-t commented 4 years ago

@kubernetes/sig-cli-feature-requests @soltysh - for thoughts [I don't have strong preference about how much details we want to expose, but at the very high-level I'm supportive for this, given we have the ability to set RV in the k8s API.]

soltysh commented 4 years ago

@lbernail your example assumes you are getting all 30k pods at once, which kubectl get does not do; by default it returns elements in chunks of 500 (see https://github.com/kubernetes/kubernetes/blob/17312ea4a92a0bba31272a6709b37a88aa383b2d/staging/src/k8s.io/kubectl/pkg/cmd/get/get.go#L473), the flag being:

--chunk-size=500: Return large lists in chunks rather than all at once. Pass 0 to disable. This flag is beta and
may change in the future.

I don't have any strong opinions one way or the other, similarly to Wojtek :wink:, but I could see such an addition being useful with the --watch capability, which currently hard-codes 0 (see https://github.com/kubernetes/kubernetes/blob/17312ea4a92a0bba31272a6709b37a88aa383b2d/staging/src/k8s.io/kubectl/pkg/cmd/get/get.go#L664). You'd need to add it here: https://github.com/kubernetes/kubernetes/blob/17312ea4a92a0bba31272a6709b37a88aa383b2d/staging/src/k8s.io/kubectl/pkg/cmd/get/get.go#L441, similarly to how we add the full object when sorting is requested.
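
Very roughly, and purely as a sketch, the change being suggested might look like this, assuming a hypothetical ResourceVersion field on GetOptions wired to a new --resource-version flag (neither exists in kubectl today):

```go
// Sketch only: o.ResourceVersion is a hypothetical field that a new
// --resource-version flag would populate; this mirrors how kubectl already
// adds includeObject=Object to the request when sorting is requested.
func (o *GetOptions) transformRequests(req *rest.Request) {
	// ... existing logic that asks for server-side printed Tables ...

	if o.ResourceVersion != "" {
		// Equivalent to curl's ?resourceVersion=... query parameter.
		req.Param("resourceVersion", o.ResourceVersion)
	}
}
```

With the flag set to 0, this would let the apiserver answer from its watch cache rather than going to etcd.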

lbernail commented 4 years ago

Thanks a lot @soltysh. Even if the calls are paginated, they all make it to etcd, right? So the total query time can probably only be higher (and the impact on etcd greater)? Can the apiserver serve RV=0 paginated queries from the cache? (I think I read somewhere that paginated calls bypass the apiserver cache.)

I'm discovering the includeObject option. How is it related to getting data from the apiserver cache? (sorry for the probably very basic question)

wojtek-t commented 4 years ago

but I could see such an addition which might be useful with --watch capability which currently hard-codes 0,

I like this argument.

Can apiserver server RV=0 paginated queries from the cache?

The watch cache doesn't support pagination, so the limit param is ignored with RV=0.
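
Outside of kubectl, this behaviour is visible with a plain client-go list; a minimal sketch, assuming a reachable cluster and a kubeconfig path:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/home/user/.kube/config") // path is an assumption
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// ResourceVersion "0" lets the apiserver answer from its watch cache
	// instead of etcd; because the watch cache is not paginated, Limit is
	// effectively ignored and the full (filtered) list comes back at once.
	pods, err := cs.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{
		LabelSelector:   "app=A",
		ResourceVersion: "0",
		Limit:           500,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d pods at list RV %s\n", len(pods.Items), pods.ResourceVersion)
}
```

With ResourceVersion left as "" instead, the same call becomes a consistent read served via etcd, which is the slow path measured in the curl examples above.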

soltysh commented 3 years ago

I'm discovering the includeObject option. How is it related to getting data from the apiserver cache? (sorry for the probably very basic question)

By default kubectl get retrieves server-side printed information about a resource. It's a table with pre-defined columns that is then printed to stdout. When you request -o yaml or anything else that requires more data, that table also contains the full definition of the object.
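
As a rough illustration (a sketch only, assuming a *kubernetes.Clientset named cs; the exact Accept header kubectl sends may differ slightly), the server-side printing request looks roughly like:

```go
// Ask for a server-side printed Table. Without includeObject=Object the rows
// only carry the pre-defined columns; with it, each row also embeds the full
// object, which is what sorting and -o yaml ultimately need.
res := cs.CoreV1().RESTClient().Get().
	Resource("pods").
	Namespace("default").
	SetHeader("Accept", "application/json;as=Table;v=v1;g=meta.k8s.io, application/json").
	Param("includeObject", "Object").
	Do(context.TODO())
raw, err := res.Raw()
if err != nil {
	panic(err)
}
fmt.Println(string(raw))
```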

eddiezane commented 3 years ago

/triage accepted
/priority backlog
/help
/good-first-issue

@lbernail were you still interested in working on this?

rudeigerc commented 3 years ago

Hi, may I have a try?

lauchokyip commented 3 years ago

@rudeigerc , yea go ahead, type /assign so k8s-ci-robot will assign you

rudeigerc commented 3 years ago

@lauchokyip Thanks a lot! 😃

/assign

rudeigerc commented 3 years ago

I think that I could follow https://github.com/kubernetes/kubectl/issues/965#issuecomment-718070378 to make some updates.

My question is: should --resource-version only be supported when --watch is active, or should it be supported for kubectl get as a whole? (Since when the resource version is specified with --watch, it is handled separately in func (o *GetOptions) watch(f cmdutil.Factory, cmd *cobra.Command, args []string) error, as mentioned above.)

logicalhan commented 3 years ago

My question is: should --resource-version only be supported when --watch is active, or should it be supported for kubectl get as a whole?

It should be exposed to kubectl get in general. It basically enables stale reads at exact timestamps; we'd likely want this for both list and individual object get operations.
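
A sketch of the single-object case at the client-go level (pod name and RV are placeholders; cs is assumed to be a *kubernetes.Clientset):

```go
// GET of one object with a resourceVersion hint. The apiserver may serve it
// from its cache as long as the cached copy is not older than this RV, so it
// avoids a quorum read from etcd. Pinning a LIST to an exact RV additionally
// needs resourceVersionMatch=Exact (see further down in this thread).
pod, err := cs.CoreV1().Pods("default").Get(context.TODO(), "my-pod", metav1.GetOptions{
	ResourceVersion: "123456",
})
if err != nil {
	panic(err)
}
fmt.Println(pod.Name, pod.ResourceVersion)
```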

rudeigerc commented 3 years ago

@logicalhan Got it. Thanks. 😃

howardshaw commented 3 years ago

is there any update?

lauchokyip commented 3 years ago

@howardshaw I guess you can go ahead and give it a try

rudeigerc commented 3 years ago

@howardshaw Sorry I ran out of time recently, please go ahead if you would like to.

lauchokyip commented 3 years ago

I guess I will try tackling this one

lauchokyip commented 3 years ago

/assign

lauchokyip commented 3 years ago

I tried adding req.Param("resourceVersion", "x") here (https://github.com/kubernetes/kubernetes/blob/17312ea4a92a0bba31272a6709b37a88aa383b2d/staging/src/k8s.io/kubectl/pkg/cmd/get/get.go#L441), where x is any number > 0, to test. However, when I run kubectl get pods it always outputs The resourceVersion for the provided list is too old. If I do curl http://localhost:8080/api/v1/pods?resourceVersion="x" it is able to output the PodList. Is this behaviour expected? I am not sure how changing the req.Param is different from using curl with resourceVersion in the query string.

howardshaw commented 3 years ago

ok, I will try

lauchokyip commented 3 years ago

@howardshaw, I am working on this one now, but I will let you know if I can't solve it. Thank you.

lauchokyip commented 3 years ago

I did some digging; the error seems to be coming from etcd (https://github.com/kubernetes/autoscaler/blob/57be08dbdcd4cd92e43f89310f2114e365d02624/cluster-autoscaler/vendor/k8s.io/apiserver/pkg/storage/etcd3/errors.go#L28). This is where v3rpc.ErrCompacted (https://github.com/kubernetes/autoscaler/blob/57be08dbdcd4cd92e43f89310f2114e365d02624/cluster-autoscaler/vendor/go.etcd.io/etcd/clientv3/watch.go#L119) is returned.

I am assuming that when the watch channel is run (https://github.com/kubernetes/autoscaler/blob/57be08dbdcd4cd92e43f89310f2114e365d02624/cluster-autoscaler/vendor/k8s.io/apiserver/pkg/storage/etcd3/watcher.go#L154), this line (https://github.com/kubernetes/autoscaler/blob/57be08dbdcd4cd92e43f89310f2114e365d02624/cluster-autoscaler/vendor/k8s.io/apiserver/pkg/storage/etcd3/watcher.go#L242) is returning the error.

logicalhan commented 3 years ago

You have to use a recent RV. You can't start at 1; it'll almost never work. What I would do is make a list call against any arbitrary resource. Doing so returns a list RV, which is equivalent to etcd's current internal global counter. You can then use that RV to make subsequent requests. That RV will work for X minutes, where X is equal to the compaction interval set on the kube-apiserver. Every X minutes, the apiserver will make a call to compact etcd's database, which truncates old RVs. Eventually every RV will be compacted, so no RV is usable permanently (unless you disable compaction, but then you will run into other issues).
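
In client-go terms, that recipe might look roughly like this sketch (again assuming a *kubernetes.Clientset named cs):

```go
// Cheap list just to learn the current list RV (etcd's global revision counter).
head, err := cs.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{Limit: 1})
if err != nil {
	panic(err)
}
rv := head.ResourceVersion

// Subsequent reads pinned to that RV. This keeps working until the
// apiserver's periodic compaction of etcd truncates the revision, after which
// the server returns "The resourceVersion for the provided list is too old".
pods, err := cs.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{
	ResourceVersion:      rv,
	ResourceVersionMatch: metav1.ResourceVersionMatchExact,
})
if err != nil {
	panic(err)
}
fmt.Println(len(pods.Items), "pods at RV", rv)
```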

lauchokyip commented 3 years ago

Will start working on it soon. Reading the Kubernetes API docs now :)

eddiezane commented 3 years ago

I synced up with @lauchokyip and have a few thoughts/questions.

Given this ResourceVersion Parameter table:

/remove-help
/remove-good-first-issue

logicalhan commented 3 years ago
  • Some sort of --cached flag (better name?) makes sense to me for kubectl get.

Personally I would find that pretty confusing. It is possible to issue a read at an exact RV and get back a strongly consistent result, if nothing has changed. Specifying a specific resourceVersion is tantamount to saying "I want this object at that revision", regardless of what has or hasn't happened in the storage layer.

eddiezane commented 3 years ago

I don't really have an opinion on whether you'd want to pass --resourceVersion=0 or --cached. I'm more interested in the usability of RVs > 0.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 3 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes/kubectl/issues/965#issuecomment-899225939):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

logicalhan commented 3 years ago

/remove-lifecycle rotten

logicalhan commented 3 years ago

/reopen

k8s-ci-robot commented 3 years ago

@logicalhan: Reopened this issue.

In response to [this](https://github.com/kubernetes/kubectl/issues/965#issuecomment-899550582):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

logicalhan commented 3 years ago

Personally, I would like this feature.

lauchokyip commented 3 years ago

@logicalhan, would it make sense to just implement this on kubectl get --watch, based on this comment (https://github.com/kubernetes/kubectl/issues/965#issuecomment-718070378)?

I discussed with @eddiezane; we actually have some questions which are stopping us from moving forward.

1) For a normal user, there is no obvious way to get an RV from kubectl, so what value would the user have to supply to make kubectl get --watch <RV> useful?
2) If we implement the RV outside of kubectl get --watch, we would need an exact RV based on the RV Parameter Table, because kubectl will by default append the chunk size/limit of 500 to the query string.

lau@debian:~ $ kubectl get pods -v=6
I0816 23:11:24.760352   19097 loader.go:372] Config loaded from file:  /home/lau/.kube/config
I0816 23:11:25.799065   19097 round_trippers.go:454] GET https://127.0.0.1:6443/api/v1/namespaces/default/pods?limit=500 200 OK in 998 milliseconds
NAME                                           READY   STATUS    RESTARTS   AGE
personal-website-deployment-798f5b7c8c-lkkgv   1/1     Running   0          46d
personal-website-deployment-798f5b7c8c-bfb95   1/1     Running   0          46d
personal-website-deployment-798f5b7c8c-bsrcx   1/1     Running   0          46d

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes/kubectl/issues/965#issuecomment-1012783091):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

heipepper commented 2 years ago

/reopen

k8s-ci-robot commented 2 years ago

@heipepper: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes/kubectl/issues/965#issuecomment-1090144315):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

Sakuralbj commented 2 years ago

Is there any update? Can you reopen this issue? @logicalhan

lauchokyip commented 2 years ago

We decided there is no helpful use case for this issue, so it remains closed for this reason.

Sakuralbj commented 2 years ago

We decided there is no helpful use case for this issue, so it remains closed for this reason.

I think this function will be helpful when using kubectl get on resources in large clusters. In that case, the list action puts a huge load on etcd and the apiserver; etcd will return all the data even if we use a labelSelector or fieldSelector. Maybe we just need a simple flag to make the action read from the cache by setting RV=0.

logicalhan commented 2 years ago

/reopen
/remove-lifecycle rotten

k8s-ci-robot commented 2 years ago

@logicalhan: Reopened this issue.

In response to [this](https://github.com/kubernetes/kubectl/issues/965#issuecomment-1224212809):

> /reopen
> /remove-lifecycle rotten

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

logicalhan commented 2 years ago

This addition would be helpful if you are watching a very small set of known pods which you can GET by RV and then immediately start watching from that RV, since you don't have to do the entire list.
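
A sketch of that get-then-watch pattern (pod name, namespace, and the *kubernetes.Clientset cs are assumptions):

```go
// Read one known pod, then start watching from exactly the state we just saw,
// without listing the whole namespace first.
pod, err := cs.CoreV1().Pods("default").Get(context.TODO(), "my-pod", metav1.GetOptions{})
if err != nil {
	panic(err)
}
w, err := cs.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{
	FieldSelector:   "metadata.name=my-pod",
	ResourceVersion: pod.ResourceVersion, // resume from the object we just read
})
if err != nil {
	panic(err)
}
defer w.Stop()
for ev := range w.ResultChan() {
	fmt.Println(ev.Type)
}
```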

Sakuralbj commented 2 years ago

This addition would be helpful if you are watching a very small set of known pods which you can GET by RV and then immediately start watching from that RV, since you don't have to do the entire list.

I'll try it. Since I'm not familiar with this code base, it may take some time

lauchokyip commented 2 years ago

I would suggest bringing up this issue during the SIG CLI meeting, or you may end up doing the work and not getting your PR accepted. Thanks.

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale