k8s-for-greeks / gpmr

Greek Pet Monster Race - K8s and Cassandra at scale
Apache License 2.0

Cassandra basic docs? #26

Open paralin opened 8 years ago

paralin commented 8 years ago

Hey,

Could you potentially write a short README on how to set up the Cassandra PetSet? Just to get me up to speed? I'm reading through it right now, and it seems to make sense except for:

$ kubectl create -f cassandra-petset-local.yaml
unable to decode "cassandra-petset-local.yaml": quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

Just a short doc with how you usually go about testing it would be nice. Nothing fancy or polished.

Thanks.

paralin commented 8 years ago

I fixed the document by putting quotes around any value that starts with a number but ends with (or contains) a letter.
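For example, the heap settings parse fine once quoted:

  env:
    - name: MAX_HEAP_SIZE
      value: "512M"   # quoted, so the YAML parser keeps it a string
    - name: HEAP_NEWSIZE
      value: "100M"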

The basic idea is:

$ kubectl create -f cassandra-service.yaml
$ kubectl create -f cassandra-petset-local.yaml

Wait for the first node, "cassandra-0", to be ready, then change "replicas" to 3 or so on the PetSet. As the nodes become stable, edit each of their pod definitions and flip "initialized" to false, which triggers the node to enter the quorum and go from "UP" to "NORMAL".
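Roughly, with kubectl (the app=cassandra label and the PetSet/pod names are whatever your manifests use, so adjust to taste):

$ kubectl get pods -l app=cassandra -w     # wait for cassandra-0 to be Running and Ready
$ kubectl edit petset cassandra            # bump spec.replicas to 3
$ kubectl annotate pod cassandra-1 pod.alpha.kubernetes.io/initialized=false --overwrite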

Works very well. Great work!

chrislovecnm commented 8 years ago

Thanks @paralin, that was actually an issue I was working through; it was introduced in v1.3.0-alpha.5. I'm going to file a bug with the Kubernetes folks, and I will update either the Cassandra example in the Kubernetes project or the documentation here.

chrislovecnm commented 8 years ago

@paralin I don't think you have to wait for a node to be ready. I now have an environment where I can test more, and I have launched 3 nodes at the same time.

paralin commented 8 years ago

I used a linter to check the YAML and it's actually valid, but for some reason Kubernetes doesn't read it properly. That makes some sense: a value starting with an integer looks like a number, so the parser reads it as one. Either way, it's not a big deal to put quotes around the values.

@chrislovecnm What does setting initialized to false actually do? How do you control what it does?

Also, we need to mint a new image for Cassandra that uses dumb-init. The current one does not handle the terminate signal (SIGTERM) properly, which means Cassandra always waits out the 30-second grace period and then gets killed ungracefully.

chrislovecnm commented 8 years ago

@chrislovecnm What does setting initialized to false actually do? How do you control what it does?

I am guessing that you are referring to

pod.alpha.kubernetes.io/initialized: "true"

It is my understanding that changing it to false means only one pod will launch. Something else that you will want in your documentation @bprashanth :)
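For reference, the annotation lives on the pod template in the PetSet spec; trimmed down, it looks something like this (apiVersion per the 1.3 alpha):

apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  template:
    metadata:
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
    # containers, volumes, etc. elided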

chrislovecnm commented 8 years ago

Thanks for the catch regarding stopping. Would we have the same behavior in Alpine? I have been wanting to move this over to a small image, minimal Ubuntu or Alpine. @mward29 has done some work with Alpine already, but will we have the same issue?

paralin commented 8 years ago

Alpine is fine; you can just download the precompiled dumb-init binary in a RUN step. But yes, you would still need dumb-init.
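A sketch of the image (the dumb-init version and release URL are from memory, so double-check them against the Yelp releases page):

FROM alpine:3.4
# the release binary is statically linked, so it should run fine on musl
RUN wget -O /usr/local/bin/dumb-init \
      https://github.com/Yelp/dumb-init/releases/download/v1.0.2/dumb-init_1.0.2_amd64 \
 && chmod +x /usr/local/bin/dumb-init
# PID 1 is now dumb-init, which forwards SIGTERM on to run.sh and its children
ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "/run.sh"]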

chrislovecnm commented 8 years ago

Opened another issue about the init problem.

chrislovecnm commented 8 years ago

@paralin What did you do to fix the YAML? We need to file this as a bug with k8s.

paralin commented 8 years ago

Actually, there's no bug; it's just a problem with your "resources.limits". It should be something like "1m", not just "1". I removed the section entirely.
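If you'd rather keep the limit than delete it, quoting the quantity (or using a milli-CPU value) should also get past the parser:

  resources:
    limits:
      cpu: "1"      # quoted, so it survives the YAML-to-JSON round trip
      # or unambiguously a string already:
      # cpu: 500m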

paralin commented 8 years ago

Patch here:

diff --git a/pet-race-devops/k8s/cassandra/cassandra-petset-local.yaml b/pet-race-devops/k8s/cassandra/cassandra-petset-local.yaml
index d65f79e..c651116 100644
--- a/pet-race-devops/k8s/cassandra/cassandra-petset-local.yaml
+++ b/pet-race-devops/k8s/cassandra/cassandra-petset-local.yaml
@@ -17,7 +17,6 @@ spec:
       containers:
       - name: cassandra
         image: gcr.io/google-samples/cassandra:v9
-        #image: 10.100.179.231:5000/cassandra
         imagePullPolicy: Always
         command:
           - /run.sh
@@ -33,14 +32,11 @@ spec:
         # If you need it it is going away in C* 4.0
         #- containerPort: 9160
         #  name: thrift
-        resources:
-          limits:
-            cpu: 1
         env:
           - name: MAX_HEAP_SIZE
-            value: 512M
+            value: "512M"
           - name: HEAP_NEWSIZE
-            value: 100M
+            value: "100M"
           - name: POD_NAMESPACE
             valueFrom:
               fieldRef:
chrislovecnm commented 8 years ago

yah ... that is a bug ...

chrislovecnm commented 8 years ago

The issue with the quoting is with the cpu limit. See https://github.com/kubernetes/kubernetes/issues/26898

paralin commented 8 years ago

@chrislovecnm Can we maybe start to outline how to set up a multi-datacenter cluster with PetSets?

chrislovecnm commented 8 years ago

@paralin I was just thinking about that, actually ... literally. Maybe you can help with the design. Spanning more than one Kubernetes instance does not work: all DCs, racks, and the cluster need to be in the same K8s instance, and your clients need to be inside that same instance as well. The reason it does not work is a proxying problem. You cannot use load balancers, every C* node needs to talk to every other C* node, and only nodes inside a single K8s cluster can communicate with each other.

There are a few different challenges:

  1. The seed provider is not DC-aware. You need a seed in every DC, and all nodes need to see the other seeds. That is the fun one.
  2. Do you know if a Service can talk to multiple applications? I probably need to file a bug to see if a service selector can be set up for multiple apps to talk to it (rough sketch after this list). If we had multiple services and modified the seed provider, we could do that as well.
  3. Setting up a snitch can depend on your cloud deployment. With affinity rules or node labels you can pretty much have a balanced deployment across multiple zones in AWS or GCE.
  4. We could hack in a SimpleSnitch by accessing node labels from the pod itself.
  5. Rolling restarts are not supported yet. Will be soon.
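On 2, a selector just matches labels, so a single headless Service should already be able to front pods from several PetSets, as long as they share a label. A sketch, where the cluster: gpmr label is a made-up convention we would have to apply to every DC's pods:

apiVersion: v1
kind: Service
metadata:
  name: cassandra
spec:
  clusterIP: None       # headless: DNS returns the pod IPs directly
  selector:
    cluster: gpmr       # hypothetical label shared across all DCs' PetSets
  ports:
    - port: 9042
      name: cql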

Probably will open an issue in the kubernetes project and continue the discussion there.

tl;dr: DCs across multiple K8s instances are not supported, and you cannot use load balancers. Need to file an issue about this with the Kubernetes project. The seed provider needs some design work to support multiple DCs. Multiple racks will probably work, but that is not tested.

@mward29 you want to pipe in here??

chrislovecnm commented 8 years ago

Btw, the issue you found with the numbers in YAML is fixed at HEAD of k8s: https://github.com/kubernetes/kubernetes/pull/26907

paralin commented 8 years ago

The way I'm going to do it is:

A service can talk to multiple things, yes.

I don't see why you wouldn't be able to do multiple DCs with this in mind.

paralin commented 8 years ago

Whoops... didn't mean to close

chrislovecnm commented 8 years ago

@paralin Please document how you set up the VPN. What cloud provider are you in? How are you going to know what your seed IPs are?

paralin commented 8 years ago

GCE, and it is actually an extremely complex setup I ended up building for this... I will document it eventually. The nice thing is I am now bridging a couple of mesh networks with the GCE network, and everything can talk to everything else.

The only thing I didn't figure out is hitting Services from outside the cluster, which I filed an issue about here:

https://github.com/kubernetes/kubernetes/issues/27161

The basic components of the setup are babeld and OpenVPN, plus some iptables rules.

paralin commented 8 years ago

... and I accidentally closed it again...

chrislovecnm commented 8 years ago

@paralin I looped you and @mward29 in on a couple of issues on the K8s project. Working through these issues with the K8s team will further strengthen the capability of C* on K8s.