k8s-for-greeks / gpmr

Greek Pet Monster Race - K8s and Cassandra at scale
Apache License 2.0

Cassandra docker stop issues #27

Closed: chrislovecnm closed this issue 8 years ago

chrislovecnm commented 8 years ago

Per @paralin -

Also, we need to mint a new Cassandra image that uses dumb-init. The current one does not handle the terminate signal properly, which means Cassandra will always wait out the 30-second grace period and then be killed ungracefully.
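
For context, the job an init shim such as dumb-init does can be sketched in a few lines: start the real process as a child and pass termination signals through to it, so the child gets a chance to shut down before the grace period runs out. The Python below only illustrates that idea and is not dumb-init's actual implementation; the function name `run_with_signal_forwarding` is made up for this sketch.

```python
import signal
import subprocess


def run_with_signal_forwarding(cmd):
    """Run cmd as a child process and forward SIGTERM/SIGINT to it.

    Returns the child's return code (negative if a signal killed it).
    Hypothetical helper for illustration; call it from the main thread,
    because Python only allows signal handlers to be installed there.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Pass the signal on instead of letting it stop at the wrapper.
        if child.poll() is None:
            child.send_signal(signum)

    forwarded = (signal.SIGTERM, signal.SIGINT)
    previous = {sig: signal.signal(sig, forward) for sig in forwarded}
    try:
        return child.wait()
    finally:
        # Restore the handlers that were installed before the call.
        for sig, handler in previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
```

With a wrapper like this in front of the real process, a SIGTERM sent to the wrapper reaches the child right away, so the child can exit on its own before anything escalates to SIGKILL.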

paralin commented 8 years ago

+1 for switching to Alpine. It should not be an issue at all; the only tough part will be getting Cassandra and all its dependencies set up properly inside Alpine.

chrislovecnm commented 8 years ago

@paralin BTW, there is a work in progress at https://github.com/mward29/gpmr/tree/master/pet-race-devops/docker/cassandra. The seed provider jar there is most likely not up to date (I need to get the seed provider into the C* source base). You can get the latest seed provider jar out of the k8s examples.

But if you have a chance to get that image working, please submit a PR! Not sure when @mward29 will have a chance to get to it. Until Alpine gets out of edge and ships a release, I do not want to move the K8s example Cassandra docker image to Alpine. You probably know that the latest Alpine docker image does not support the DNS prefixing that k8s uses.

paralin commented 8 years ago

@chrislovecnm What's the difference with that version? Seems like it supports a different seeding mechanism and a config file?

chrislovecnm commented 8 years ago

@paralin are you on slack?

We need to combine the latest changes in the k8s example and the alpine image that @mward29 built.

chrislovecnm commented 8 years ago

BTW https://github.com/k8s-for-greeks/gpmr/pull/28 just merged ... Still needs some work, but a good start.

paralin commented 8 years ago

I have an account on the slack, I'll log on around 1 PST if you want to discuss.

chrislovecnm commented 8 years ago

I am actually on now for a bit.

chrislovecnm commented 8 years ago

Either that, or just lay out more information here on what your questions are.

chrislovecnm commented 8 years ago

https://github.com/k8s-for-greeks/gpmr/tree/master/pet-race-devops/docker/cassandra-debian provides a new debian image with dumb-init. Not going to use alpine yet, since cqlsh requires python.

Also running into issues with two nodes trying to join the ring at the same time.

chrislovecnm commented 8 years ago

Found a couple of other things as well: the SeedProvider initial IP was not correct.

chrislovecnm commented 8 years ago

@paralin this should be fixed. Made more tweaks to the debian docker image. Do you have a chance to test?

paralin commented 8 years ago

@chrislovecnm Just tested the debian one, seems the nodes all start up properly, then this happens:

/run.sh: line 84:    18 Killed                  cassandra -R -f

Starting Cassandra on 10.244.1.23
CASSANDRA_RPC_ADDRESS 0.0.0.0
CASSANDRA_NUM_TOKENS 32
CASSANDRA_CLUSTER_NAME 'Test Cluster'
CASSANDRA_LISTEN_ADDRESS 10.244.1.23
CASSANDRA_BROADCAST_ADDRESS 10.244.1.23
CASSANDRA_BROADCAST_RPC_ADDRESS 10.244.1.23

[hang]

After this, I assume it's just a persistent loop of them getting restarted by the liveness check.
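
The `Killed` message from `run.sh` means the shell saw its `cassandra` child die from SIGKILL, which a process cannot catch or handle. A small helper like the hypothetical `describe_exit` below shows how to read that from an exit status, covering both Python's negative return codes and the shell's 128 + N convention. It does not say who sent the signal; that could be the kernel's OOM killer or the container runtime after the grace period expired.

```python
import signal


def describe_exit(returncode):
    """Explain a process exit status in words (hypothetical helper)."""
    if returncode < 0:
        # Python's subprocess reports death-by-signal as the negated signal number.
        signum = -returncode
    elif returncode > 128:
        # Shells report death-by-signal as 128 + signal number. A program can
        # also exit with such a status by itself, so treat this as a hint.
        signum = returncode - 128
    else:
        return f"exited normally with status {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        name = f"signal {signum}"
    return f"terminated by {name}"


print(describe_exit(137))  # the status a shell reports after SIGKILL (128 + 9)
```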

paralin commented 8 years ago

And then after some time...

INFO  07:08:55 Using Netty Version: [netty-buffer=netty-buffer-4.0.23.Final.208198c, netty-codec=netty-codec-4.0.23.Final.208198c, netty-codec-http=netty-codec-http-4.0.23.Final.208198c, netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c, netty-common=netty-common-4.0.23.Final.208198c, netty-handler=netty-handler-4.0.23.Final.208198c, netty-transport=netty-transport-4.0.23.Final.208198c, netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c, netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c, netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]
INFO  07:08:55 Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
INFO  07:08:55 Binding thrift service to /0.0.0.0:9160
INFO  07:08:55 Listening for thrift clients...
INFO  07:08:57 Node /10.244.1.22 is now part of the cluster
INFO  07:08:57 InetAddress /10.244.1.22 is now UP
INFO  07:08:57 Created default superuser role 'cassandra'
WARN  07:08:57 Not marking nodes down due to local pause of 23703253181 > 5000000000
/run.sh: line 84:    15 Killed                  cassandra -R -f

chrislovecnm commented 8 years ago

You on slack?

paralin commented 8 years ago

What's happening here is it's maxing out the memory on the host and crashing everything.
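
The WARN line in the log above is consistent with this. Assuming both numbers are nanoseconds (which matches a 5-second threshold), the node reported a local pause of well over 20 seconds. The sketch below pulls the two values out of the line and converts them; `parse_local_pause` is a hypothetical helper name, not part of Cassandra.

```python
import re

# Matches the two integers in "local pause of <pause> > <threshold>".
PAUSE_RE = re.compile(r"local pause of (\d+) > (\d+)")


def parse_local_pause(line):
    """Return (pause_seconds, threshold_seconds) from the WARN line, or None."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    pause_ns, threshold_ns = (int(group) for group in match.groups())
    # Assumption: both values are nanoseconds.
    return pause_ns / 1e9, threshold_ns / 1e9


line = "WARN  07:08:57 Not marking nodes down due to local pause of 23703253181 > 5000000000"
pause_s, threshold_s = parse_local_pause(line)
print(f"paused {pause_s:.1f}s, threshold {threshold_s:.1f}s")
```

A pause of roughly 23.7 seconds against a 5-second threshold fits a process that was stalled for a long time, for example by swapping or long GC pauses under memory pressure, though the log alone does not prove the cause.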

chrislovecnm commented 8 years ago

Confused a bit ... is it working?

chrislovecnm commented 8 years ago

@paralin can we close this?

paralin commented 8 years ago

Yes

chrislovecnm commented 8 years ago

Fixed in latest debian docker