jetstack / navigator

Managed Database-as-a-Service (DBaaS) on Kubernetes
Apache License 2.0

Mark the first pod in each failure zone as a seed node #252

Closed. kragniz closed this pull request 6 years ago

kragniz commented 6 years ago

This labels a single Cassandra node per failure zone as a seed node, allowing a seed provider to pick up this information (by querying the endpoints directly, or via SRV records) and pass it to Cassandra.

This is currently based on the node's `failure-domain.beta.kubernetes.io/zone` label, but could be made configurable in the future (probably in the same way as #247).
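
For illustration only, here is a minimal sketch of the selection logic described above, written in plain Go rather than against Navigator's real types. The `Node`/`Pod` structs and the `navigator.jetstack.io/cassandra-seed` label key are made-up stand-ins; the actual controller would use client-go listers and Navigator's own label conventions.

```go
package main

import "fmt"

// Simplified stand-ins for the Kubernetes objects involved; the real
// controller would use client-go types and informers instead.
type Node struct {
	Name string
	// Value of the failure-domain.beta.kubernetes.io/zone label, if set.
	Zone string
}

type Pod struct {
	Name     string
	NodeName string
	Labels   map[string]string
}

// seedLabel is a hypothetical label key, used here for illustration only.
const seedLabel = "navigator.jetstack.io/cassandra-seed"

// markSeedsPerZone labels the first pod seen in each failure zone as a
// seed, mirroring the approach described in this PR. Pods whose node has
// no zone label are grouped together under the empty zone "".
func markSeedsPerZone(pods []Pod, nodes map[string]Node) []Pod {
	seen := map[string]bool{}
	for i := range pods {
		zone := nodes[pods[i].NodeName].Zone
		if seen[zone] {
			continue
		}
		seen[zone] = true
		if pods[i].Labels == nil {
			pods[i].Labels = map[string]string{}
		}
		pods[i].Labels[seedLabel] = "true"
	}
	return pods
}

func main() {
	nodes := map[string]Node{
		"node-a": {Name: "node-a", Zone: "europe-west1-b"},
		"node-b": {Name: "node-b", Zone: "europe-west1-c"},
	}
	pods := []Pod{
		{Name: "cass-0", NodeName: "node-a"},
		{Name: "cass-1", NodeName: "node-a"},
		{Name: "cass-2", NodeName: "node-b"},
	}
	for _, p := range markSeedsPerZone(pods, nodes) {
		fmt.Println(p.Name, "seed:", p.Labels[seedLabel] == "true")
	}
}
```

Note that "first" here just means first in the iteration order of the pod list, so a stable ordering (for example by StatefulSet ordinal) would be needed for deterministic seed assignment.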

/area cassandra

Designate one node per failure zone as a seed
jetstack-bot commented 6 years ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

We suggest the following additional approver: **wallrj**

Assign the PR to them by writing /assign @wallrj in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS files:

- **[OWNERS](https://github.com/jetstack/navigator/blob/master/OWNERS)**

You can indicate your approval by writing `/approve` in a comment. You can cancel your approval by writing `/approve cancel` in a comment.
munnerz commented 6 years ago

I'm not sure this is the right way to decide on seed nodes. I've a few questions, and a few concerns:

1. Is it likely a user would ever want to change a seed node, or perhaps would we ever want to? (e.g. during a maintenance event or upgrade)
2. How many seeds are usually run in a cluster? Is one per zone usual?
3. What if a user has a single-zone cluster, but wants more than one seed? (From what I understand, a minimum of one seed is required in order to interact with the cluster?)
4. How dynamic is changing seed nodes? Is it an expensive operation?

I'm concerned we are suddenly jumping to a new grouping mechanism (zone) that no part of Navigator currently acknowledges. We do need to introduce a grouping mechanism, but I had rather hoped it could be based on labels, which could be arbitrarily defined by the user.

These failure-domain labels are quite standard in cloud environments, but much less so elsewhere. I have a feeling our e2e testing environment (minikube), for example, does not set them. We can't require that certain labels are set on the nodes that users' pods run on.

In terms of moving this PR forward, I think changing it to simply make the first pilot in a node pool a seed would suffice. That only addresses points (2) and (3) above, but it does remove the binding to zones/fixed labels.

We do need to work out how this is properly expressed, though. I think it's a more complex problem than this: we need to allow it to be highly configurable by the user, while still being validated by Navigator to ensure the desired configuration is valid.
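
As a rough sketch of that simpler rule, assuming pods follow the usual StatefulSet `<name>-<ordinal>` naming; none of these identifiers come from the Navigator codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// isFirstInNodePool reports whether a pod would be treated as a seed under
// the simpler rule suggested above: the first pilot (StatefulSet ordinal 0)
// in each node pool. Deriving this from a "-0" pod-name suffix is an
// assumption for illustration; Navigator's controller may track ordinals
// differently.
func isFirstInNodePool(podName string) bool {
	return strings.HasSuffix(podName, "-0")
}

func main() {
	for _, name := range []string{"cass-np1-0", "cass-np1-1", "cass-np2-0"} {
		fmt.Printf("%s seed=%v\n", name, isFirstInNodePool(name))
	}
}
```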

kragniz commented 6 years ago

> Is it likely a user would ever want to change a seed node, or perhaps would we ever want to? (e.g. during a maintenance event or upgrade)

I can't think of a good reason to manually change the seed nodes.

> How many seeds are usually run in a cluster? Is one per zone usual?

A small number per zone, typically 1 to 3. Netflix runs 1 per AWS availability zone; others recommend 2 or 3.

> What if a user has a single-zone cluster, but wants more than one seed? (from what I understand, a minimum of one seed is required in order to interact with the cluster?)

We could add an option for the number of seeds per zone. A minimum of one live seed is required in order to add new nodes to the cluster.

> How dynamic is changing seed nodes? Is it an expensive operation?

Very cheap: any node can become a seed node at any point, and gossip traffic will simply be directed at other nodes.

I like the idea of Navigator working correctly on cloud providers by default, so I'm keen to implement something that doesn't require a lot of thought from an end user. How would you feel if the node labels were configurable, with the current labels as defaults?
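
To make that configurability concrete, here is a hypothetical sketch of what such options might look like as spec fields. These field names, the package name, and the defaults are invented for illustration and do not exist in Navigator's API.

```go
// Package v1alpha1 is used here only as a placeholder; the fields below
// are hypothetical and not part of Navigator's real CassandraCluster spec.
package v1alpha1

// CassandraClusterSeedConfig sketches how seed selection could be made
// configurable while keeping the current behaviour as the default.
type CassandraClusterSeedConfig struct {
	// ZoneLabel is the node label used to group pods into failure zones.
	// Hypothetical default: "failure-domain.beta.kubernetes.io/zone".
	ZoneLabel string `json:"zoneLabel,omitempty"`

	// SeedsPerZone is the number of pods to mark as seeds in each zone.
	// Hypothetical default: 1.
	SeedsPerZone int32 `json:"seedsPerZone,omitempty"`
}
```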

kragniz commented 6 years ago

I made an alternative PR for selecting the first pod in a nodepool as a seed here: #254

jetstack-ci-bot commented 6 years ago

@kragniz PR needs rebase

jetstack-bot commented 6 years ago

@kragniz: The following test failed; say `/retest` to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| navigator-quick-verify | 45cb65b82f55f366d9065e2e2f57431b15977449 | link | `/test verify` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/devel/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
kragniz commented 6 years ago

Closing in favour of #254