RFE: use learner mode for joining etcd members

fabriziopandini commented 5 years ago

Growing a local etcd cluster is a complex operation, and we have already faced issues with it in the past, e.g. https://github.com/kubernetes-sigs/kind/issues/588

Now that the implementation of the etcd learner mode is progressing, we should start considering whether to use it in kubeadm to make the join --control-plane implementation more robust.

at a high level what we would like to achieve is:

* a new etcd member should be created as a learner and become a voting member only after the etcd data are fully aligned.
* ideally we should also prevent the api-server from reading from a learner node
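
To illustrate why joining as a learner is safer, here is a minimal sketch of the quorum arithmetic. It assumes the usual Raft majority rule and that learners do not count as voting members (as described in the etcd learner design doc); the function names are made up for this example.

```python
def quorum(voting_members: int) -> int:
    """Raft majority: more than half of the voting members."""
    return voting_members // 2 + 1


def has_quorum(voting_members: int, healthy_voting_members: int) -> bool:
    """True if enough voting members are up to commit writes."""
    return healthy_voting_members >= quorum(voting_members)


# Joining a second control-plane node to a 1-member cluster.
# Direct voting add: 2 voters, but the new etcd is not serving yet.
print(has_quorum(voting_members=2, healthy_voting_members=1))  # False
# Learner add: still 1 voter, so the existing member keeps quorum alone.
print(has_quorum(voting_members=1, healthy_voting_members=1))  # True
```

In this model the cluster stays writable during the join only in the learner case, which matches the first goal in the list above.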

Ref docs:

(edit by neolit123)

1.26:

initial KEP draft https://hackmd.io/@DAKGcrh_RpC5vlt8w5bf8A/r1qoLh9zj
KEP PR + tracking issue at k/e: https://github.com/kubernetes/enhancements/pull/3615 https://github.com/kubernetes/enhancements/issues/3614

1.27(alpha):

k/k alpha feature gate code https://github.com/kubernetes/kubernetes/pull/113318 https://github.com/kubernetes/kubernetes/pull/115038
CI added https://github.com/kubernetes/kubeadm/pull/2807 https://github.com/kubernetes/test-infra/pull/28574

1.29(beta):

1.32(GA):

1.33:

TODO: remove the FG from kubeadm code
TODO: update the k/website "kubeadm init" page

SataQiu commented 5 years ago

/cc

rosti commented 5 years ago

We have to be careful, but we certainly need to act upon it. The plan is that from etcd 3.5 new members will be joined only as learners. In etcd 3.5 it will be possible to use a learner node for reading, but the problem with writing remains. And, as LBs are out of the scope of kubeadm, things might become a bit difficult. We probably need to direct API servers to healthy leaders and possibly do that via an etcd LB. Another possibility is to not expose API servers that have a local learner etcd node through the API LB (not sure if this would actually work though). In short, we need to experiment a bit with this to find what's viable and easy to use.

RA489 commented 5 years ago

/assign

neolit123 commented 4 years ago

xref https://github.com/kubernetes/kubeadm/issues/2005#issuecomment-575931176

https://github.com/etcd-io/etcd/blob/master/Documentation/op-guide/runtime-configuration.md

prksu commented 4 years ago

/cc

ereslibre commented 4 years ago

Some context on this: https://github.com/etcd-io/etcd/pull/11640, we might want to wait for an etcd version that includes this patch.

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

neolit123 commented 4 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

/lifecycle stale

neolit123 commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

/lifecycle stale

invidian commented 3 years ago

/remove-lifecycle stale

wangyysde commented 3 years ago

/cc

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

pacoxu commented 2 years ago

Kubernetes has been using etcd 3.5.0 since 1.23. https://etcd.io/docs/v3.5/learning/design-learner/ This seems to be a nice candidate for 1.24.

pacoxu commented 2 years ago

What is the suggested way to use the learner mode?

1. join member as a learner
2. wait for the data to align and then promote it to a voting member (kubeadm needs to wait for this to happen.)

Also, in the etcd future plans there is a pull request, https://github.com/etcd-io/etcd/pull/10887, that may add the ability to auto-promote the learner. (Will we wait for the auto-promote feature?)

neolit123 commented 2 years ago

I am not up to date on the learner mode support in etcd but auto promotion sounds better. The change in kubeadm will need a new KEP.

https://github.com/kubernetes/enhancements/tree/master/keps

k8s-triage-robot commented 2 years ago

/lifecycle stale

RA489 commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

/lifecycle stale

RA489 commented 2 years ago

/remove-lifecycle stale

neolit123 commented 2 years ago

we had a discussion about this today with @fabriziopandini. learner mode is used in Talos https://www.talos.dev/v0.12/introduction/what-is-new/#etcd and we might want to have it in kubeadm.

i can start writing a doc / KEP for this, but if someone would like to work on the code / unit tests / e2e tests that would be great.

neolit123 commented 2 years ago

@fabriziopandini @pacoxu and others watching the ticket.

i've created a hack MD draft for the learner mode proposal: https://hackmd.io/@DAKGcrh_RpC5vlt8w5bf8A/r1qoLh9zj

comments are welcome. we can notify kubespray and Cluster API for feedback too.

neolit123 commented 2 years ago

once we have some agreement, i can PR the KEP in a raw / draft stage and keep it provisional. if we get a contributor who wants to work on this we can start tracking the feature for a kubeadm / k8s release milestone. (e.g. this work is not planned for 1.26 unless we get a contributor to sign up)

sbueringer commented 2 years ago

Thx! I've added it to the ClusterAPI office hours agenda for today.

I'll also try to take a look, but it could take a bit, just not enough bandwidth right now.

neolit123 commented 2 years ago

updated link https://hackmd.io/@DAKGcrh_RpC5vlt8w5bf8A/r1qoLh9zj

neolit123 commented 2 years ago

i can PR the proposal around end of next week in k/enhancements

pacoxu commented 2 years ago

The KEP looks good. The detailed steps, according to the etcd docs:

1. add as a learner: member add --learner.
2. promote it once canPromote==true: call the member promote API. There would be a new timeout here.
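
The two steps above, including the new timeout, can be sketched roughly as follows. This is only an illustration against a hypothetical in-memory `FakeCluster`, not the etcd client API or the kubeadm implementation: the class, its methods, and the "catch-up countdown" are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field


class NotInSync(Exception):
    """Stand-in for etcd rejecting promotion of a learner that is not in sync."""


@dataclass
class FakeCluster:
    """Hypothetical in-memory stand-in for an etcd cluster (not a real client)."""
    syncs_needed: int = 3  # promote attempts rejected before the learner catches up
    learners: dict = field(default_factory=dict)
    voters: set = field(default_factory=set)

    def member_add_as_learner(self, peer_url: str) -> str:
        member_id = f"learner-{len(self.learners) + 1}"
        self.learners[member_id] = self.syncs_needed
        return member_id

    def member_promote(self, member_id: str) -> None:
        if self.learners[member_id] > 0:
            self.learners[member_id] -= 1
            raise NotInSync(member_id)
        del self.learners[member_id]
        self.voters.add(member_id)


def join_as_learner(cluster, peer_url, timeout=5.0, interval=0.01):
    """Step 1: add as a learner. Step 2: retry promotion until success or timeout."""
    member_id = cluster.member_add_as_learner(peer_url)
    deadline = time.monotonic() + timeout
    while True:
        try:
            cluster.member_promote(member_id)
            return member_id
        except NotInSync:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"learner {member_id} not promoted within {timeout}s")
            time.sleep(interval)
```

A call such as `join_as_learner(FakeCluster(), "https://10.0.0.2:2380")` returns once the fake learner has been promoted; with a learner that never catches up it raises `TimeoutError` instead of waiting forever.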

My only question is that:

Can we skip step 2 above so that we use an etcd learner as a standby node? I mean we don't promote it and leave it to users. Users can promote it in case the master has some problems. Or keep it as a learner/standby node for failover. Should we add a flag for it?

BTW, there are no new learner-mode-related bugs in the etcd repo. (I am not sure whether that is because the feature has too few users in the etcd community or because it is very stable.)

neolit123 commented 2 years ago

> Can we skip step 2 above so that we use an etcd learner as a standby node? I mean we don't promote it and leave it to users. Users can promote it in case the master has some problems. Or keep it as a learner/standby node for failover. Should we add a flag for it?

my vote would be to preserve the current behavior and not have a flag - i.e. always try to promote.

> BTW, there are no new learner-mode-related bugs in the etcd repo. (I am not sure whether that is because the feature has too few users in the etcd community or because it is very stable.)

that may be a concern, but the Talos project is using it too and maybe it just works fine.

we can ask etcd maintainers later.

neolit123 commented 2 years ago

KEP PR is here: https://github.com/kubernetes/enhancements/pull/3615

k/e tracking issue: https://github.com/kubernetes/enhancements/issues/3614

k8s-triage-robot commented 1 year ago

/lifecycle stale

invidian commented 1 year ago

/remove-lifecycle stale

pacoxu commented 1 year ago

The alpha-level code is merged for v1.27 in https://github.com/kubernetes/kubernetes/pull/113318

juliantaylor commented 1 year ago

How was the issue with the apiserver failing when requesting against a learner node addressed? https://github.com/kubernetes/kubeadm/issues/1793#issuecomment-532630048 https://github.com/etcd-io/etcd/issues/12789

As of etcd 3.5.6 and kubernetes 1.23.14, a write request will fail when it hits a learner node:

 etcdserver: rpc not supported for learner

Is this handled in newer k8s-apiserver versions?

neolit123 commented 1 year ago

> Is this handled in newer k8s-apiserver versions?

@juliantaylor TMK, no. also this is the first time i see the issue. EDIT: NVM, i recall the k/kubeadm comment, but given some k8s distros exclusively use learners nowadays, this should no longer be an issue

^ @pacoxu @ahrtr

tobiasgiese commented 1 year ago

I think there is a chicken-and-egg issue with the implementation of https://github.com/kubernetes/kubernetes/pull/113318.

When a new member is added as a learner, kubeadm immediately tries to promote it and waits for the promotion to succeed. The problem is that the static pod manifest for etcd is written only after the promotion has succeeded, but without the manifest no etcd container will ever be running, so the learner can never get in sync. See cmd/kubeadm/app/phases/etcd/local.go#L152-L166.

Or am I missing something here?

Edit:

Adding some logs of the failed promotion

I0112 15:44:34.662336    8944 etcd.go:123] update etcd endpoints: https://10.6.0.167:2379
I0112 15:44:34.662480    8944 local.go:151] [etcd] Adding etcd member: https://10.6.0.133:2380
I0112 15:44:34.668641    8944 etcd.go:394] [etcd] Adding etcd member as learner: 68747470733a2f2f31302e362e302e3133333a32333830
I0112 15:44:34.674268    8944 etcd.go:463] [etcd] Promoting a learner as a voting member: 5747ee42cd477756
{"level":"warn","ts":"2023-01-12T15:44:34.674Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0007e6c40/10.6.0.167:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: can only promote a learner member which is in sync with leader"}
I0112 15:44:34.674985    8944 etcd.go:475] [etcd] Promoting the learner 5747ee42cd477756 failed: etcdserver: can only promote a learner member which is in sync with leader
I0112 15:44:34.799658    8944 etcd.go:463] [etcd] Promoting a learner as a voting member: 5747ee42cd477756
{"level":"warn","ts":"2023-01-12T15:44:34.800Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0007e7180/10.6.0.167:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: can only promote a learner member which is in sync with leader"}
I0112 15:44:34.800447    8944 etcd.go:475] [etcd] Promoting the learner 5747ee42cd477756 failed: etcdserver: can only promote a learner member which is in sync with leader
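
A side note on reading these logs: the value printed on the "Adding etcd member as learner" line looks like hex-encoded ASCII. Decoding it, as in the small sketch below, gives the same peer URL as the "Adding etcd member" line; why it is printed in that form is not stated in the thread.

```python
# Value copied from the "Adding etcd member as learner" log line above.
logged_value = "68747470733a2f2f31302e362e302e3133333a32333830"

# Interpret the string as hex bytes and decode them as ASCII text.
decoded = bytes.fromhex(logged_value).decode("ascii")
print(decoded)  # https://10.6.0.133:2380
```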

tobiasgiese commented 1 year ago

I added a WIP PR with a tested implementation. With this PR the static pod manifests will be created before the member is added to the cluster.
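
The ordering problem and the fix can be illustrated with a toy model. This is a sketch under stated assumptions, not kubeadm code: `JoiningNode`, `join`, and the single-step "replication" are hypothetical, and the only rules taken from the thread are that etcd runs only once the static pod manifest exists and that a learner can be promoted only when it is in sync with the leader.

```python
from dataclasses import dataclass


@dataclass
class JoiningNode:
    """Hypothetical model of a joining control-plane node (not kubeadm code)."""
    manifest_written: bool = False
    added_as_learner: bool = False
    raft_index: int = 0
    promoted: bool = False

    def write_static_pod_manifest(self):
        self.manifest_written = True

    def add_as_learner(self):
        self.added_as_learner = True

    def tick(self, leader_index):
        # etcd only runs once the manifest exists, and only a running
        # learner can replicate the leader's log.
        if self.manifest_written and self.added_as_learner:
            self.raft_index = leader_index

    def try_promote(self, leader_index):
        # Promotion is accepted only for a learner that is in sync.
        if self.added_as_learner and self.raft_index >= leader_index:
            self.promoted = True
        return self.promoted


def join(node, manifest_first, leader_index=100, attempts=5):
    """Run one join attempt; return True if the learner ends up promoted."""
    if manifest_first:
        node.write_static_pod_manifest()  # ordering described for the WIP PR
    node.add_as_learner()
    for _ in range(attempts):
        node.tick(leader_index)
        if node.try_promote(leader_index):
            break
    if node.promoted and not node.manifest_written:
        node.write_static_pod_manifest()  # ordering reported as problematic
    return node.promoted


print(join(JoiningNode(), manifest_first=False))  # False
print(join(JoiningNode(), manifest_first=True))   # True
```

In this model, writing the manifest only after promotion never succeeds, because the learner never starts and so never catches up, while writing it first lets promotion go through.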

ahrtr commented 1 year ago

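> @juliantaylor TMK, no. also this is the first time i see the issue.

Basically there is no change to the logic in etcdserver in 3.5:

* [interceptor.go#L53-L55](https://github.com/etcd-io/etcd/blob/715a0047faba060577841b13c87e9b6a1269eaa0/server/etcdserver/api/v3rpc/interceptor.go#L53-L55)
* [interceptor.go#L222-L224](https://github.com/etcd-io/etcd/blob/715a0047faba060577841b13c87e9b6a1269eaa0/server/etcdserver/api/v3rpc/interceptor.go#L222-L224)

So the error message below should NOT be a new issue in etcd 3.5.x.

> As of etcd 3.5.6 and kubernetes 1.23.14, a write request will fail when it hits a learner node:
>
>  etcdserver: rpc not supported for learner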

pacoxu commented 1 year ago

I only tested it in my own clusters, where the etcd manifest was probably left over from before I rejoined. That is why I did not hit this bug.

TODO: we may add some e2e test case for this mode.

Logically, this would be a problem. Thanks for the fix and I will test and review it later.

juliantaylor commented 1 year ago

> > @juliantaylor TMK, no. also this is the first time i see the issue.
>
> Basically there is no change to the logic in etcdserver in 3.5:
>
> * [interceptor.go#L53-L55](https://github.com/etcd-io/etcd/blob/715a0047faba060577841b13c87e9b6a1269eaa0/server/etcdserver/api/v3rpc/interceptor.go#L53-L55)
> * [interceptor.go#L222-L224](https://github.com/etcd-io/etcd/blob/715a0047faba060577841b13c87e9b6a1269eaa0/server/etcdserver/api/v3rpc/interceptor.go#L222-L224)
>
> So the error message below should NOT be a new issue in etcd 3.5.x.
>
> > As of etcd 3.5.6 and kubernetes 1.23.14, a write request will fail when it hits a learner node:
> >
> >  etcdserver: rpc not supported for learner

yes, it is not a new problem; it has existed for as long as the learner mode has. It does imply you need to reconfigure and restart the apiservers every time before an etcd instance is restarted (or after one is added); if that is handled by kubeadm there is no issue.

juliantaylor commented 1 year ago

ah, restarting is not a problem as it does not put the node back into learner mode. So from kubeadm's perspective everything should be ok if it adds the node to the apiserver after promotion.
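
As a rough illustration of "adds the node to the apiserver after promotion", the sketch below builds an etcd endpoint list from voting members only. It assumes the endpoint list (for example the value passed to kube-apiserver's --etcd-servers flag) is derived from the member list; the `Member` type and the addresses are hypothetical, and this is not how kubeadm is confirmed to do it.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Minimal stand-in for an etcd member list entry (hypothetical type)."""
    name: str
    client_url: str
    is_learner: bool


def apiserver_etcd_endpoints(members):
    """Return client URLs of voting members only, skipping learners."""
    return [m.client_url for m in members if not m.is_learner]


members = [
    Member("cp-1", "https://10.0.0.1:2379", is_learner=False),
    Member("cp-2", "https://10.0.0.2:2379", is_learner=True),  # still catching up
]
print(",".join(apiserver_etcd_endpoints(members)))  # https://10.0.0.1:2379
```

Once the second member is promoted, the same function would include its client URL, so requests are never routed to a member that rejects them with "rpc not supported for learner".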

ahrtr commented 1 year ago

FYI. https://github.com/etcd-io/etcd/issues/15107

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

chendave commented 1 year ago

/remove-lifecycle stale

Looks like this has been implemented, but should stay open until this is graduated.

k8s-triage-robot commented 1 year ago

/lifecycle stale

pacoxu commented 1 year ago

Updated the issue description for the implementation history:

The CI job: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-learner-mode-latest. This feature has been alpha since 1.27. I wonder if we could promote it to beta in the v1.29 release cycle.

Is there anything further that we should do to promote it? I opened https://github.com/kubernetes/kubernetes/pull/120228.

chrischdi commented 1 year ago

> Updated the issue description for the implementation history:
>
> The CI job: https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-learner-mode-latest. This feature has been alpha since 1.27. I wonder if we could promote it to beta in the v1.29 release cycle.
>
> Is there anything further that we should do to promote it? I opened kubernetes/kubernetes#120228.

@tobiasgiese : maybe you could provide some feedback, I think you folks still use it right?

tobiasgiese commented 1 year ago

> maybe you could provide some feedback, I think you folks still use it right?

We (Mercedes-Benz) are using it already, yes. We have also backported it to v1.2[4-6] (since https://github.com/kubernetes/kubernetes/pull/115038) and it is working quite well. We have never had any problems with the learner mode, and we have a lot of nightly builds (about 50 periodic nightly builds and 40 Prow trigger builds/jobs).

neolit123 commented 1 year ago

great feedback. thank you @tobiasgiese

> > maybe you could provide some feedback, I think you folks still use it right?
>
> We (Mercedes-Benz) are using it already, yes. We have also backported it to v1.2[4-6] (since kubernetes/kubernetes#115038) and it is working quite well. We have never had any problems with the learner mode, and we have a lot of nightly builds (about 50 periodic nightly builds and 40 Prow trigger builds/jobs).

pacoxu commented 10 months ago

I suppose that we should graduate this feature later in v1.31+ and get more feedback before GA.

So no action item for v1.30.

kubernetes / kubeadm

RFE: use learner mode for joining etcd members #1793