screeley44 commented 8 years ago

@vyshane - hello, I'm experimenting with your examples and everything seems to run fine. I have an OpenShift 3.1 cluster running a master plus one node, with a Gluster cluster on the backend for Persistent Volume support.

I created the peer service, service and rc, and my pods run. I'm using a GlusterFS volume for data persistence, and the data persists across multiple restarts of the pods/rc. But when I scale, I'm not seeing the pods join the C* ring, and I'm not sure what I'm missing. I don't have a ton of experience with k8s or Cassandra, but from each container I can ping cassandra-peer (the peer service), so I know they are able to connect.

It's unclear to me right now whether I need to change my PEER_DISCOVERY_DOMAIN or something else.

      env:
        # Feel free to change the following:
        - name: CASSANDRA_CLUSTER_NAME
          value: Test Cluster
        - name: CASSANDRA_DC
          value: datacenter1
        - name: CASSANDRA_RACK
          value: rack1
        - name: CASSANDRA_ENDPOINT_SNITCH
          value: GossipingPropertyFileSnitch

        # The peer discovery domain needs to point to the Cassandra peer service
        - name: PEER_DISCOVERY_DOMAIN
          value: cassandra-peers.default.cluster.local.

Some output from oc (kubectl for OpenShift):

[root@ose1 cassandra-custom]# oc get pods
NAME              READY     STATUS    RESTARTS   AGE
cassandra-vfujv   1/1       Running   0          59s
cassandra-x36ay   1/1       Running   0          1m
[root@ose1 cassandra-custom]# oc exec -it cassandra-x36ay -- nodetool status testspace

Datacenter: datacenter1

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.1.0.32  176.43 KB  256     100.0%            03b19bd1-ce65-4525-89e7-b23c9b3f0a92  rack1

vyshane commented 8 years ago

@screeley44 do you get a list of IP addresses when you do a dig cassandra-peers.default.cluster.local from a Cassandra container?

What's the output of oc get namespaces? It's possible that Openshift uses a different namespace from default.cluster.local.
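If dig is not available in the container, roughly the same check can be sketched in Python. This is only an illustration: `resolve_peers` is a hypothetical helper name, not something shipped in the cassandra-kubernetes image, and it assumes the peer service is headless so that the name resolves to one A record per pod.

```python
import socket


def resolve_peers(domain, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses that `domain` resolves to.

    `resolver` is injectable so the function can be exercised without a
    live cluster; by default it uses the system resolver.
    """
    try:
        infos = resolver(domain, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # Name did not resolve (for example NXDOMAIN): report no peers.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    import os

    # PEER_DISCOVERY_DOMAIN is the variable from the env block in this thread.
    domain = os.environ.get("PEER_DISCOVERY_DOMAIN", "cassandra-peers.default.cluster.local.")
    print(resolve_peers(domain))
```

An empty list here corresponds to a dig answer with no A records.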

screeley44 commented 8 years ago

@vyshane - I'm using the default namespace (also referred to as project for OSE):

[root@ose1 usr_configs]# oc get namespaces
NAME              LABELS    STATUS    AGE
default                     Active    18d
openshift                   Active    18d
openshift-infra             Active    18d

My dig is not returning the IP addresses of the containers:

root@cassandra-vfujv:/etc# dig $PEER_DISCOVERY_DOMAIN

; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> cassandra-peers.default.cluster.local.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 32805
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;cassandra-peers.default.cluster.local. IN A

;; AUTHORITY SECTION:
cluster.local. 60 IN SOA ns.dns.cluster.local. hostmaster.cluster.local. 1449158400 28800 7200 604800 60

;; Query time: 1 msec
;; SERVER: 192.168.122.251#53(192.168.122.251)
;; WHEN: Thu Dec 03 16:49:22 UTC 2015
;; MSG SIZE rcvd: 109

root@cassandra-vfujv:/etc# dig cassandra-peers

; <<>> DiG 9.9.5-9+deb8u3-Debian <<>> cassandra-peers
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 8607
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;cassandra-peers. IN A

;; Query time: 0 msec
;; SERVER: 192.168.122.251#53(192.168.122.251)
;; WHEN: Thu Dec 03 16:50:23 UTC 2015
;; MSG SIZE rcvd: 33

The services (cassandra-peers and cassandra-service) look good to me based on get services and get endpoints:

[root@ose1 cassandra-custom]# oc get services
NAME                       CLUSTER_IP       EXTERNAL_IP   PORT(S)                 SELECTOR                 AGE
cassandra-peers            None                           7000/TCP,7001/TCP       name=cassandra-cluster   1d
cassandra-service          172.30.209.133                 9042/TCP                name=cassandra-cluster   1d
cassandra-service-google   172.30.207.210                 9042/TCP                app=cassandra            1h
kubernetes                 172.30.0.1                     443/TCP,53/UDP,53/TCP                            14d
[root@ose1 cassandra-custom]# oc get endpoints
NAME                       ENDPOINTS                                                    AGE
cassandra-peers            10.1.0.46:7001,10.1.0.47:7001,10.1.0.46:7000 + 1 more...     1d
cassandra-service          10.1.0.46:9042,10.1.0.47:9042                                1d
cassandra-service-google   10.1.0.54:9042,10.1.0.55:9042                                1h
glusterfs-cluster          192.168.122.221:1,192.168.122.222:1                          3d
kubernetes                 192.168.122.251:53,192.168.122.251:53,192.168.122.251:8443   14d

vyshane commented 8 years ago

It looks like DNS is not working for services. Is the DNS addon enabled for the Kubernetes cluster?
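One way to narrow this down is to try more than one spelling of the service name. The name queried above, cassandra-peers.default.cluster.local., has no `svc` label, while a dig later in this thread against cassandra-peers.default.svc.cluster.local does return A records. Below is a rough sketch that builds both candidates. `candidate_service_names` is a hypothetical helper, and which form a given cluster actually serves depends on its DNS setup, so treat the ordering as an assumption.

```python
def candidate_service_names(service, namespace="default", cluster_domain="cluster.local"):
    """Build the DNS names worth trying for a Kubernetes service.

    The first entry is the <service>.<namespace>.svc.<cluster-domain> form;
    the second omits the `svc` label, as in the PEER_DISCOVERY_DOMAIN
    value quoted in this thread.
    """
    return [
        f"{service}.{namespace}.svc.{cluster_domain}",
        f"{service}.{namespace}.{cluster_domain}",
    ]


if __name__ == "__main__":
    # Print the names so each can be passed to dig from inside a pod.
    for name in candidate_service_names("cassandra-peers"):
        print(name)
```

If only the `svc` form resolves, PEER_DISCOVERY_DOMAIN would need to be set to that name.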

ScubaDrew commented 8 years ago

Hello, I am having a similar problem. DNS seems to respond, but the nodes are not joining:

root@cassandra-c7wdb:/# dig $PEER_DISCOVERY_DOMAIN

; <<>> DiG 9.9.5-9+deb8u6-Debian <<>> cassandra-peers.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63289
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;cassandra-peers.default.svc.cluster.local. IN A

;; ANSWER SECTION:
cassandra-peers.default.svc.cluster.local. 30 IN A 10.244.0.4
cassandra-peers.default.svc.cluster.local. 30 IN A 10.244.0.3

;; Query time: 1 msec
;; SERVER: 10.0.0.10#53(10.0.0.10)
;; WHEN: Wed May 04 17:32:47 UTC 2016
;; MSG SIZE  rcvd: 91

root@cassandra-c7wdb:/# nodetool status           
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.244.0.3  111.87 KB  256          100.0%            058c38c7-0f79-419c-b42a-a7c2f9782110  Kubernetes Cluster
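The output above shows the mismatch: DNS returns two addresses, but the ring lists only one. A small sketch for spotting that automatically follows. `parse_nodetool_status` and `missing_from_ring` are hypothetical helper names, and the parser assumes the column layout shown in this thread, where each node line starts with a two-letter status/state code followed by the address.

```python
import re

# Two-letter code: U/D (up/down) followed by N/L/J/M (normal/leaving/joining/moving),
# then whitespace and the node address.
STATUS_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)")


def parse_nodetool_status(output):
    """Return a list of (status_code, address) tuples from `nodetool status` text."""
    nodes = []
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            nodes.append((match.group(1), match.group(2)))
    return nodes


def missing_from_ring(dns_addresses, nodes):
    """Return the addresses DNS knows about that do not appear in the ring."""
    in_ring = {address for _code, address in nodes}
    return sorted(set(dns_addresses) - in_ring)
```

Fed the dig answers and the nodetool output from this comment, `missing_from_ring` would report 10.244.0.4 as the pod that has not joined.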

vyshane commented 8 years ago

Do you see any log errors when you tail the cassandra pods?

ScubaDrew commented 8 years ago

I think what happened was the DNS did not populate in time for the second node. I've got a cluster up now that has two nodes by waiting some time after the first node was up.
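If the timing theory is right, one possible workaround is to wait until the peer name resolves before starting a new node. The sketch below is an assumption about how that could look, not a description of what the image's startup script does. `wait_for_peers` is a hypothetical name, and the attempt count and delay are arbitrary.

```python
import socket
import time


def wait_for_peers(domain, minimum=1, attempts=30, delay=2.0,
                   resolver=socket.gethostbyname_ex, sleep=time.sleep):
    """Poll DNS until `domain` resolves to at least `minimum` addresses.

    Returns the sorted addresses on success, or an empty list if the
    name never resolved to enough addresses within `attempts` tries.
    """
    for attempt in range(attempts):
        try:
            _name, _aliases, addresses = resolver(domain)
        except OSError:
            # Covers socket.gaierror and socket.herror: not resolvable yet.
            addresses = []
        if len(addresses) >= minimum:
            return sorted(addresses)
        if attempt < attempts - 1:
            sleep(delay)
    return []
```

A second pod could call this with `minimum=1` before launching Cassandra, so that at least the first node is discoverable as a seed.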

Have you used this setup very extensively @vyshane ?

vyshane / cassandra-kubernetes

nodetool status not showing dynamic pods joining #2
