xmidt-org / xmidt

Highly scalable pipes for communicating with devices all over the place.
Apache License 2.0

K8s #13

Closed Equanox closed 4 years ago

Equanox commented 4 years ago

This Pull Request is WIP and will hopefully prevent someone from doing duplicated work. It should also start a discussion about possible scaling implications when deploying xmidt with k8s here at DTAG. The added k8s deployment is heavily based on the provided docker-compose deployment. A helm chart is added to express dependencies on Consul and Prometheus (WIP). Most config files (from ./deploy/docker-compose/docFiles) are added to ConfigMaps in their respective files under ./deploy/kubernetes/xmidt-cloud/templates.

As k8s needs access to a docker image registry, it would be nice to add public repositories on hub.docker.com for each service (petasos, talaria, etc.). Take a look at ./deploy/kubernetes/xmidt-cloud/values.yaml. It would be even better to automate this with a CI process. Maybe GitHub Actions?

Scaling: Right now, only one talaria instance is deployed, due to implications with service discovery. The issue is that each talaria instance registers itself in Consul with a hardcoded address on which it is reachable by others.

Excerpt from talaria config (./deploy/kubernetes/xmidt-cloud/templates/talaria.yaml)

registrations:
  - 
    id: "talaria"
    name: "talaria"
    tags:
      - "dev"
      - "docker"
      - "stage=dev"
      - "flavor=docker"
    address: "http://talaria"
    scheme: "http"
    port: 6200
    checks:
      -
        checkID: "talaria:http"
        http: "http://talaria:6201/health"
        interval: "30s"
        deregisterCriticalServiceAfter: "70s"

This doesn't play well with the K8s scaling feature (replicas), as traffic is normally load balanced across a ReplicaSet. In our case it is important to route requests to the correct talaria instance, as only one specific talaria instance holds the websocket connection to a given device. One solution could be to use the k8s StatefulSet promise of "Stable, unique network identifiers" (https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id), but I'm not yet sure how to pass this information down to talaria's config. This needs some further investigation on my side. Or maybe you have a better idea how to tackle this.
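
To sketch what the StatefulSet approach could look like: each pod gets a stable name (talaria-0, talaria-1, ...) that could be fed into the registration address via the Downward API. The image name and the POD_NAME variable here are assumptions for illustration, not something talaria supports today:

```yaml
# Sketch: a StatefulSet gives each pod a stable, ordered name which,
# combined with a headless Service, yields a stable DNS name per pod.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: talaria
spec:
  serviceName: talaria          # must match a headless Service
  replicas: 3
  selector:
    matchLabels:
      app: talaria
  template:
    metadata:
      labels:
        app: talaria
    spec:
      containers:
        - name: talaria
          image: xmidt/talaria  # hypothetical image name
          env:
            # Downward API: expose the pod's own name (e.g. "talaria-1")
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # Hypothetical: if the Consul registration address can be set
            # from an environment variable, it could be derived from the
            # pod's stable DNS name, e.g. talaria-1.talaria.default.svc
```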

I don't have much insight into the other services, so there might be scaling implications there as well. That said, for experimenting with a k8s deployment, one talaria instance seems to be enough.

CLAassistant commented 4 years ago

CLA assistant check
Thank you for your submission, we really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Nösner, Matthias (ext) seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

kcajmagic commented 4 years ago

Awesome, thanks for submitting a PR on K8s. It has been something that I have been wanting to get working for a while now.

Overall this PR looks good, and I absolutely love that you added the README.md file for it.

Scaling XMiDT inside of K8s is in general a hard problem that I haven't found a good solution for, because an outside client needs to be able to talk to any talaria server at any given time. It appears that StatefulSets solve this problem. As for the other services, using round robin between the pods will be totally fine.
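
For context, the stable per-pod identities a StatefulSet provides rely on a headless Service, which creates a DNS record per pod instead of load balancing. A minimal sketch (service and port names assumed):

```yaml
# A headless Service (clusterIP: None) yields per-pod DNS names
# (talaria-0.talaria, talaria-1.talaria, ...) rather than one
# load-balanced virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: talaria
spec:
  clusterIP: None
  selector:
    app: talaria
  ports:
    - name: primary
      port: 6200
```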

The config file can be overridden with environment variables. For example, if you want to change the primary address port of talaria to 7777, you can set the environment variable TALARIA_PRIMARY_ADDRESS=":7777".
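
In a k8s pod spec that override could look like this (image name assumed for illustration):

```yaml
containers:
  - name: talaria
    image: xmidt/talaria        # hypothetical image name
    env:
      # Overrides the primary address port from the config file
      - name: TALARIA_PRIMARY_ADDRESS
        value: ":7777"
```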

Equanox commented 4 years ago

Sorry for the ambiguous pull requests. Closing this in favor of #14. @kcajmagic can you repost your comment at #14?