jetstack / navigator

Managed Database-as-a-Service (DBaaS) on Kubernetes
Apache License 2.0
271 stars, 31 forks

WIP: Use hostnames rather than IP addresses for cassandra nodes #330

Closed: wallrj closed this 6 years ago

wallrj commented 6 years ago

Fixes: #319

Release note:

NONE
jetstack-bot commented 6 years ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To fully approve this pull request, please assign additional approvers. We suggest the following additional approver: munnerz

Assign the PR to them by writing /assign @munnerz in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:
- **[OWNERS](https://github.com/jetstack/navigator/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.
wallrj commented 6 years ago

Ok. This looks promising:

INFO  [main] 2018-04-12 12:59:15,418 MessagingService.java:753 - Starting Messaging Service on cass-test-np-region-1-zone-a-0/10.192.2.5:7000 (eth0)
WARN  [main] 2018-04-12 12:59:15,423 SystemKeyspace.java:1089 - No host ID found, created 26dc77f9-9057-4151-a391-c8192f412fe4 (Note: This should happen exactly once per node).
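The promising part of the log above is the Messaging Service address, which carries the pod hostname as well as its IP, in the `hostname/IP:port` form that Java prints for a socket address. As a minimal sketch (in Go, to match the project), the parts can be pulled apart like this. `parseJavaSocketAddr` and `endpoint` are hypothetical names, not part of navigator or Cassandra, and the sketch assumes an IPv4 address as in this log.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// endpoint holds the parts of a Java-style socket address string such as
// "cass-test-np-region-1-zone-a-0/10.192.2.5:7000".
// Hypothetical type, for illustration only.
type endpoint struct {
	Hostname string // empty when no hostname was printed ("/10.0.0.1:7000")
	IP       net.IP
	Port     string
}

// parseJavaSocketAddr splits "hostname/ip:port" into its parts.
// Assumes an IPv4 address, as in the log above.
func parseJavaSocketAddr(s string) (endpoint, error) {
	slash := strings.LastIndex(s, "/")
	if slash < 0 {
		return endpoint{}, fmt.Errorf("no '/' separator in %q", s)
	}
	host, port, err := net.SplitHostPort(s[slash+1:])
	if err != nil {
		return endpoint{}, err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return endpoint{}, fmt.Errorf("invalid IP %q", host)
	}
	return endpoint{Hostname: s[:slash], IP: ip, Port: port}, nil
}

func main() {
	ep, err := parseJavaSocketAddr("cass-test-np-region-1-zone-a-0/10.192.2.5:7000")
	if err != nil {
		panic(err)
	}
	fmt.Println(ep.Hostname, ep.IP, ep.Port)
}
```

Running it prints `cass-test-np-region-1-zone-a-0 10.192.2.5 7000`.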
wallrj commented 6 years ago

Using fully qualified domain names for nodes does work, up to a point. But when a node's IP address changes, the new address is not gossiped around the cluster, which is what I'd hoped would happen.
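For reference, a StatefulSet pod gets a stable DNS name through its governing headless Service, of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`; the name stays the same when the pod is rescheduled, even though the IP behind it can change. A minimal Go sketch of building those names follows. The helper `podFQDN` is hypothetical, the Service name `cass-test-nodes` is a made-up placeholder, `cluster.local` is the default cluster domain, and the StatefulSet name is inferred from the pod names above; none of these are taken from this PR.

```go
package main

import "fmt"

// podFQDN returns the DNS name Kubernetes gives a StatefulSet pod through
// its governing headless Service:
//   <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>
// Hypothetical helper, for illustration only.
func podFQDN(statefulSet string, ordinal int, service, namespace, clusterDomain string) string {
	return fmt.Sprintf("%s-%d.%s.%s.svc.%s", statefulSet, ordinal, service, namespace, clusterDomain)
}

func main() {
	// Assumptions: "cass-test-nodes" is a placeholder Service name and
	// "cluster.local" is the default cluster domain.
	for i := 0; i < 5; i++ {
		fmt.Println(podFQDN("cass-test-np-region-1-zone-a", i, "cass-test-nodes",
			"test-cassandra-1523564296-10880", "cluster.local"))
	}
}
```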

cass-test-np-region-1-zone-a-2   0/1       Terminating   0          8m
cass-test-np-region-1-zone-a-2   0/1       Init:0/1   0          0s
cass-test-np-region-1-zone-a-0: 10.192.2.44
cass-test-np-region-1-zone-a-1: 10.192.3.14
cass-test-np-region-1-zone-a-2: 10.192.2.46
cass-test-np-region-1-zone-a-3: 10.192.2.45
cass-test-np-region-1-zone-a-4: 10.192.3.16
INFO  [main] 2018-04-12 20:43:50,310 StorageService.java:1442 - JOINING: Starting to bootstrap...
Exception (java.lang.RuntimeException) encountered during startup: A node required to move the data consistently is down (/10.192.3.15). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
java.lang.RuntimeException: A node required to move the data consistently is down (/10.192.3.15). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
        at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:294)
        at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:177)
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:84)
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1491)
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:966)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:681)
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393)
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689)
richard@pet-instance-1:~/go/src/github.com/jetstack/navigator$ kubectl -n test-cassandra-1523564296-10880 exec cass-test-np-region-1-zone-a-0 -- /bin/sh -c 'JVM_OPTS="" nodetool status'
Datacenter: region-1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.192.3.16  98.17 KiB  256          41.0%             9982acb2-84c1-48b9-ad9b-192559a71525  zone-a
UN  10.192.2.44  108.59 KiB  256          38.2%             9e5fc55f-15fc-4083-955b-3e5257cbc806  zone-a
UN  10.192.2.45  89.5 KiB   256          40.4%             be98aa64-3777-4018-bee0-7fc26fd9109b  zone-a
UJ  10.192.2.46  115.16 KiB  256          ?                 1bcd0ce9-9c2e-4cee-8b07-179c2ea63323  zone-a
UN  10.192.3.14  108.36 KiB  256          41.8%             0069445d-ad81-4513-a040-1982cdd6a279  zone-a
DN  10.192.3.15  127.33 KiB  256          38.6%             7e0dc352-eddc-416c-a773-df5cf940f089  zone-a
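The status output still lists `10.192.3.15` as down (`DN`), which is the node the bootstrap error above complains about, yet none of the current pods has that address. A minimal Go sketch of that cross-check, using the output and pod IPs copied from above. `parseNodetoolStatus` and `staleDownNodes` are hypothetical helpers, and the parser assumes the column layout shown here (Host ID second-to-last, Rack last).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nodeRow is one node line of `nodetool status` output.
type nodeRow struct {
	State   string // two-letter code such as "UN", "UJ" or "DN"
	Address string
	HostID  string
}

// parseNodetoolStatus keeps only node rows: lines whose first field is
// U or D (up/down) followed by N, L, J or M (normal/leaving/joining/moving).
// Assumes Host ID is the second-to-last field and Rack the last.
func parseNodetoolStatus(out string) []nodeRow {
	var rows []nodeRow
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || len(f[0]) != 2 ||
			!strings.ContainsRune("UD", rune(f[0][0])) ||
			!strings.ContainsRune("NLJM", rune(f[0][1])) {
			continue // header, legend or separator line
		}
		rows = append(rows, nodeRow{State: f[0], Address: f[1], HostID: f[len(f)-2]})
	}
	return rows
}

// staleDownNodes returns down nodes whose address matches no current pod IP.
func staleDownNodes(rows []nodeRow, podIPs []string) []nodeRow {
	current := make(map[string]bool, len(podIPs))
	for _, ip := range podIPs {
		current[ip] = true
	}
	var stale []nodeRow
	for _, r := range rows {
		if r.State[0] == 'D' && !current[r.Address] {
			stale = append(stale, r)
		}
	}
	return stale
}

// Sample data copied from the output above.
const sampleStatus = `Datacenter: region-1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.192.3.16  98.17 KiB  256          41.0%             9982acb2-84c1-48b9-ad9b-192559a71525  zone-a
UN  10.192.2.44  108.59 KiB  256          38.2%             9e5fc55f-15fc-4083-955b-3e5257cbc806  zone-a
UN  10.192.2.45  89.5 KiB   256          40.4%             be98aa64-3777-4018-bee0-7fc26fd9109b  zone-a
UJ  10.192.2.46  115.16 KiB  256          ?                 1bcd0ce9-9c2e-4cee-8b07-179c2ea63323  zone-a
UN  10.192.3.14  108.36 KiB  256          41.8%             0069445d-ad81-4513-a040-1982cdd6a279  zone-a
DN  10.192.3.15  127.33 KiB  256          38.6%             7e0dc352-eddc-416c-a773-df5cf940f089  zone-a`

// Current pod IPs, copied from the listing above.
var currentPodIPs = []string{"10.192.2.44", "10.192.3.14", "10.192.2.46", "10.192.2.45", "10.192.3.16"}

func main() {
	for _, r := range staleDownNodes(parseNodetoolStatus(sampleStatus), currentPodIPs) {
		fmt.Println(r.State, r.Address, r.HostID)
	}
}
```

Running it prints the single stale entry, `DN 10.192.3.15 7e0dc352-eddc-416c-a773-df5cf940f089`.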
jetstack-bot commented 6 years ago

@wallrj: The following tests failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| navigator-e2e-v1-8 | 246f25c483b446d9a004f16fa5d0264794c60957 | link | `/test e2e v1.8` |
| navigator-e2e-v1-9 | 246f25c483b446d9a004f16fa5d0264794c60957 | link | `/test e2e v1.9` |
| navigator-e2e-v1-7 | 246f25c483b446d9a004f16fa5d0264794c60957 | link | `/test e2e v1.7` |

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/devel/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
munnerz commented 6 years ago

Can this be closed in favour of #334?

jetstack-bot commented 6 years ago

@wallrj: PR needs rebase.

wallrj commented 6 years ago

This branch has some useful code for the issues linked above, but I'm closing the PR in favour of #334, which implements the minimum changes needed.