xetys / hetzner-kube

A CLI tool for provisioning kubernetes clusters on Hetzner Cloud
Apache License 2.0

feature request: add support for floating ips #58

Open wiemann opened 6 years ago

wiemann commented 6 years ago

Add support for binding "floating IPs" to specific nodes (worker/master) so they keep a "static IP" in case an instance reboots and gets assigned a new IP.

pierreozoux commented 6 years ago

duplicate of https://github.com/xetys/hetzner-kube/issues/13

xetys commented 6 years ago

Not exactly. I think that comes from mixing up "failover IPs" and "floating IPs". The first issue was about failover IPs. But I believe you really mean a floating IP, controlled by some script or component in k8s that assigns the floating IP to specific servers. The answer in https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/6 stated that you cannot assign Hetzner's existing failover IPs (which have existed for bare metal for a long time) to the new cloud. But that is not what we are looking for. We need a script that handles floating IPs.

This could possibly be achieved with a plain k8s setup, if assigning an IP to a node over the API is enough. More likely, though, it will be an on-host keepalived solution, which involves some firewall updates as well. However, this is a notable piece of work, and I'm still convinced it is actually a ticket for the hcloud-cloud-controller-manager, or for a distinct project that gets deployed as a k8s Deployment.

An alternative approach would be adding an add floating-ip -n <cluster-name> <ip> command that sets up a keepalived configuration on all worker nodes. In that case, you only add the floating IP to your domain, and the IP is assigned to the first healthy node in the priority list. That is better for use with ingress, as it doesn't bring a LoadBalancer service type into the cluster.
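
Purely as an illustration of that idea (not something hetzner-kube implements today), a keepalived setup on each worker might look roughly like the sketch below. The floating IP, interface name and notify script path are placeholders, each node would get a different priority, and a unicast peer list would most likely be needed between cloud nodes:

#!/bin/bash
# hypothetical helper, run once per worker: write a keepalived config for the floating IP
PRIORITY=${1:-100}            # the highest-priority healthy node ends up holding the IP
FLOATING_IP="203.0.113.10"    # placeholder floating IP
cat >/etc/keepalived/keepalived.conf <<EOF
vrrp_instance VI_1 {
  state BACKUP
  interface eth0
  virtual_router_id 51
  priority ${PRIORITY}
  advert_int 1
  # unicast_peer { <other worker IPs> } is likely required between cloud nodes
  virtual_ipaddress {
    ${FLOATING_IP}/32
  }
  # on becoming MASTER, a notify script would also reassign the IP via the Hetzner Cloud API
  notify_master /usr/local/bin/reassign-floating-ip.sh   # hypothetical script name
}
EOF
systemctl restart keepalived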

JohnnyQQQQ commented 6 years ago

if assigning an IP over API to a node is enough

It's not; you need to configure it on your host as well. It would be too easy otherwise :smile:

https://wiki.hetzner.de/index.php/CloudServer/en#What_are_floating_IPs_and_how_do_they_work.3F
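
In practice that means that, in addition to the assignment via the API, something like the following has to be run on the node that should receive the traffic (placeholder IP; to survive reboots the address also has to go into the persistent network configuration, e.g. netplan or /etc/network/interfaces):

ip addr add 203.0.113.10/32 dev eth0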

xetys commented 6 years ago

yeah, it was a poor guess. I would prefer the keepalived solution anyway.

JohnnyQQQQ commented 6 years ago

I think this is quite an important feature. As it stands, the cluster isn't really HA, because you can't control it from the outside when the master node that is configured in the kubectl config dies, or am I missing a part?

It would also solve the single point of failure with ingress for people who don't want to use DNS load balancing/failover.

xetys commented 6 years ago

That's a bit "fifty-fifty". If you install k8s with HA mode enabled, all the master nodes are load balanced on the client side. This means that if the main master fails, the cluster itself keeps running, and that matters more for the HA experience than whether you have remote access with kubectl. I verified this feature before merging it.

On the other hand, the kubeconfig is currently generated with the first master's IP address. So if this one fails, you won't get any response when using kubectl, and if you just change the IP to the second master in "~/.kube/config", it will fail, because the other masters are not part of the cert SAN. This is more a UX issue than something that harms HA. If you log in to master 2 and point kubectl at "127.0.0.1:6443", you will get a valid response and can use your cluster.

If you add floating IPs, you have to distinguish between a master floating IP (and the reason you might need it) and a worker floating IP. You could assign a floating IP to the masters (and add it to the SAN before creating the cluster) to make kubectl work even if masters fail. The motivation for a worker floating IP is to have one IP that switches between edge nodes for ingress.
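
For the master part, a hedged example of what "adding it to the SAN before you create the cluster" could look like on a kubeadm-based install (the floating IP is a placeholder):

kubeadm init --apiserver-cert-extra-sans=203.0.113.10 ...

With the floating IP in the cert SANs, the generated kubeconfig could then point at the floating IP instead of the first master's address.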

JohnnyQQQQ commented 6 years ago

Valid points. I think we should start with a concept on how we can integrate floating IPs in general. Then we can take a look at the different use cases, but ingress is very neat indeed.

lakano commented 6 years ago

@xetys I would like to use this feature when it's ready, but until then I'll buy a floating IP right now, update the configuration on each master, and point our DNS at this floating IP. Will your feature be able to use a floating IP that was already created?

JohnnyQQQQ commented 6 years ago

I did a few tests with keepalived, and it works really well 🚀

Here is my failover script:

#!/bin/bash
# quit on error
set -e
# Get all information from the API
# Adjust the description filter in the FLOATING_IP_ID and FLOATING_IP lines to match your floating IP's description
export SERVER_ID=$(curl -H "Authorization: Bearer YOURTOKEN" "https://api.hetzner.cloud/v1/servers?name=$HOSTNAME" | jq '.servers[0].id')
export FLOATING_IP_ID=$(curl -H "Authorization: Bearer YOURTOKEN" "https://api.hetzner.cloud/v1/floating_ips" | jq '.floating_ips[] | select(.description=="keepalived")' | jq '.id')
export FLOATING_IP=$(curl -H "Authorization: Bearer YOURTOKEN" "https://api.hetzner.cloud/v1/floating_ips" | jq '.floating_ips[] | select(.description=="keepalived")' | jq -r '.ip')
# Assign the floating IP to this server in the Hetzner backend
curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer YOURTOKEN" -d "{\"server\":$SERVER_ID}" "https://api.hetzner.cloud/v1/floating_ips/$FLOATING_IP_ID/actions/assign"
# Add the IP address to the default network interface
ip addr add $FLOATING_IP/32 dev eth0 || echo "IP was already added to the interface"

The only new dependency is jq, which you can install with apt install jq. I will try to wrap this up for hetzner-kube. Obviously you don't need to export the variables, but I did it for debugging.
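
A quick way to check that the reassignment worked, using the same API endpoint and token as in the script above (jq remains the only dependency):

curl -s -H "Authorization: Bearer YOURTOKEN" "https://api.hetzner.cloud/v1/floating_ips" \
  | jq '.floating_ips[] | {ip: .ip, assigned_server: .server}'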

YoSev commented 6 years ago

Hello,

just an idea for a workaround until Hetzner provides failover targets for floating IPs (requires Docker Swarm). I am using Hetzner Cloud to build an X-manager Docker Swarm (currently 3 managers). I created a service that scales to a single replica and updates the floating IP via the Hetzner API, using a placement constraint that binds this service to the node with node.ManagerStatus.Leader=true only. The script checks the floating IP's target every x seconds and only updates it once it points to a different IP than its own.

Summarized: the Docker Swarm leader, and only the leader, automatically updates the floating IP target if needed.

cornelius-keller commented 6 years ago

Hi all, I created a fork of https://github.com/kubernetes/contrib/tree/master/keepalived-vip and modified it so that it can use notify scripts. I did this to solve exactly this problem. I already tested it in one of my development clusters at Hetzner and it seems to be working. I committed the example resources I used so you can try it. This is work in progress, and I would appreciate any feedback and testing. Resource files are here: https://github.com/cornelius-keller/contrib/tree/master/keepalived-vip/notify-example-hetzner. I also commented on https://github.com/hetznercloud/hcloud-cloud-controller-manager/issues/6, but I guess here is more appropriate.

cornelius-keller commented 6 years ago

Hmmm. I expected a little more excitement; maybe it is not clear what this thing does. It does keepalived the Kubernetes way, and you don't need anything but the Kubernetes API to implement it. All you need to do is fill in your failover IP and credentials in a Kubernetes secret:

apiVersion: v1
data:
  failover-ip: # echo <your failover ip> | base64
  hetzner-pass: # echo < your hetzner api pass > | base64
  hetzner-user: # echo < your hetzner api user > | base64
kind: Secret
metadata:
  name: hetzner-secret-failoverip-1
type: Opaque
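
One detail when generating those values: a plain echo appends a trailing newline, which then ends up inside the secret and in the environment variables, so echo -n is safer:

echo -n '<your failover ip>' | base64
echo -n '<your hetzner api user>' | base64
echo -n '<your hetzner api pass>' | base64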

Then you put your notification script in a ConfigMap (or use mine):

apiVersion: v1
kind: ConfigMap
metadata:
  name: vip-notify
data:
  notify.sh: | 
    #!/bin/bash
    ENDSTATE=$3
    NAME=$2
    TYPE=$1
    if [ "$ENDSTATE" == "MASTER" ] ; then
        HOST_IP=$(ip route get 8.8.8.8 | awk '{print $7 }')
        echo "setting Failover IP:  $FAILOVER_IP to Server IP:  $HOST_IP"
        curl -k -u "$HETZNER_USER:$HETZNER_PASS" https://robot-ws.your-server.de/failover/$FAILOVER_IP -d active_server_ip=$HOST_IP
    fi

You also need to put your failover IP again in the original keepalived-vip ConfigMap. For now this is a bit of duplication, but it was the "minimal working" way to implement this. Of course, in the final scenario this should point to your nginx-ingress service, not to echoheaders as in the example.

apiVersion: v1
kind: ConfigMap
metadata:
  name: vip-configmap
data:
  138.201.14.20: default/echoheaders # add your config map here. must map the base64 encoded IP in secrets.yaml

Finally you need to deploy the keepalived controller. In the example I used a ReplicationController, but you can use a Deployment or a DaemonSet to have it on all nodes as well:

apiVersion: v1
kind: ReplicationController
metadata:
  name: kube-keepalived-vip
  labels:
    k8s-app: kube-keepalived-vip
spec:
  replicas: 1
  selector:
    k8s-app: kube-keepalived-vip
  template:
    metadata:
      labels:
        k8s-app: kube-keepalived-vip
        name: kube-keepalived-vip
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - ingress-nginx
              topologyKey: kubernetes.io/hostname
      hostNetwork: true
      serviceAccount: kube-keepalived-vip
      containers:
      - image: quay.io/cornelius/keepalived-vip:0.11_notify
        name: kube-keepalived-vip
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /lib/modules
          name: modules
          readOnly: true
        - mountPath: /opt/notify
          name: notify 
        # use downward API
        env:
          - name: HETZNER_USER
            valueFrom:
              secretKeyRef:
                key: hetzner-user
                name: hetzner-secret-failoverip-1
          - name: HETZNER_PASS
            valueFrom:
              secretKeyRef:
                key: hetzner-pass
                name: hetzner-secret-failoverip-1
          - name: FAILOVER_IP
            valueFrom:
              secretKeyRef:
                key: failover-ip
                name: hetzner-secret-failoverip-1

          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: KEEPALIVED_NOTIFY
            value: /opt/notify/notify.sh
        # to use unicast
        args:
        - --services-configmap=default/vip-configmap
        - --watch-all-namespaces=true
        - --use-unicast=true
        # unicast uses the ip of the nodes instead of multicast
        # this is useful if running in cloud providers (like AWS)
        #- --use-unicast=true
      volumes:
      - hostPath:
          path: /lib/modules
        name: modules
      - configMap:
          name: vip-notify
          defaultMode: 0744 
        name: notify

The pod anti-affinity is needed in my case because I still have an nginx ingress running with a hostPort mapping. But this will be replaced by the keepalived version soon.

So what happens: I tested the degree of HA this provides by scaling the ReplicationController to 3 replicas. If I enter one of the pods and look at the keepalived config, I see something like this:

root@devcluster06:/# cat /etc/keepalived/keepalived.conf 

global_defs {
  vrrp_version 3
  vrrp_iptables KUBE-KEEPALIVED-VIP
}

vrrp_instance vips {
  state BACKUP
  interface enp4s0
  virtual_router_id 50
  priority 106
  nopreempt
  advert_int 1

  track_interface {
    enp4s0
  }
   notify /opt/notify/notify.sh 

  unicast_src_ip 94.130.34.213
  unicast_peer { 
    138.201.37.92
    138.201.52.38
    144.76.138.212
    144.76.223.202
    144.76.223.203
    46.4.114.60
  }

  virtual_ipaddress { 
    138.201.14.20
  }
}

# Service: default/echoheaders
virtual_server 138.201.14.20 80 {
  delay_loop 5
  lvs_sched wlc
  lvs_method NAT
  persistence_timeout 1800
  protocol TCP

  real_server 10.233.88.58 8080 {
    weight 1
    TCP_CHECK {
      connect_port 8080
      connect_timeout 3
    }
  }

}

Now I had a curl request to the IP running in an endless loop, printing the echoed headers to the console. When I kill one of the pods in the replica set, there is a pause of 15-20 seconds in the output. Then the IP is switched to another node and it continues. The same should happen if the node currently holding the virtual IP dies or reboots.

I think this is the fastest failover you can get with Hetzner and k8s; in particular it is much faster than what I used before. Previously I had a single-pod failover IP controller, scheduled via affinity to one node where nginx was also running with a hostPort mapping. So when the node with this pod died, it took a while for k8s to reschedule it to another node, up to 5 minutes.

I hope this more detailed explanation helps you understand what I wanted to achieve, and maybe you give it a try. Please note that I am neither a user of hetzner-kube nor an expert with keepalived. I have been running k8s clusters on Hetzner bare metal for more than two years, and the failover IP problem was one I had not solved to my satisfaction until now. HTH
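
The endless-loop check described above is simply something like this (the IP is the example VIP from the config):

while true; do curl -s --max-time 2 http://138.201.14.20/ | head -n 3; sleep 1; done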

exocode commented 6 years ago

@cornelius-keller so, I am very excited 🥇 I can't wait to see it in my production env wrapped in a Helm chart. :-) Thank you!

exocode commented 6 years ago

@cornelius-keller It's me again. I made a Helm chart from the information you provided, but I'm stuck on some (newbie) problems. Maybe you (or someone else) want to help get it running?

Simply fork it, change the values.yaml to your needs, and run helm install --name hetzner-failover hetzner-failover.

https://github.com/exocode/helm-charts/

This is where I struggle at the moment:

kubectl describe replicationcontroller/kube-keepalived-vip
Name:         kube-keepalived-vip
Namespace:    default
Selector:     k8s-app=kube-keepalived-vip
Labels:       k8s-app=kube-keepalived-vip
Annotations:  <none>
Replicas:     0 current / 1 desired
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           k8s-app=kube-keepalived-vip
                    name=kube-keepalived-vip
  Service Account:  kube-keepalived-vip
  Containers:
   kube-keepalived-vip:
    Image:      quay.io/cornelius/keepalived-vip:0.11_notify
    Port:       <none>
    Host Port:  <none>
    Args:
      --services-configmap=default/vip-configmap
      --watch-all-namespaces=true
      --use-unicast=true
    Environment:
      HETZNER_USER:       <set to the key 'hetzner-user' in secret 'hetzner-secret-failoverip-1'>  Optional: false
      HETZNER_PASS:       <set to the key 'hetzner-pass' in secret 'hetzner-secret-failoverip-1'>  Optional: false
      FAILOVER_IP:        <set to the key 'failover-ip' in secret 'hetzner-secret-failoverip-1'>   Optional: false
      POD_NAME:            (v1:metadata.name)
      POD_NAMESPACE:       (v1:metadata.namespace)
      KEEPALIVED_NOTIFY:  /opt/notify/notify.sh
    Mounts:
      /lib/modules from modules (ro)
      /opt/notify from notify (rw)
  Volumes:
   modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
   notify:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      vip-notify
    Optional:  false
Conditions:
  Type             Status  Reason
  ----             ------  ------
  ReplicaFailure   True    FailedCreate
Events:
  Type     Reason        Age                From                    Message
  ----     ------        ----               ----                    -------
  Warning  FailedCreate  5m (x18 over 15m)  replication-controller  Error creating: pods "kube-keepalived-vip-" is forbidden: error looking up service account default/kube-keepalived-vip: serviceaccount "kube-keepalived-vip" not found
cornelius-keller commented 6 years ago

Hi @exocode, the ServiceAccount used for RBAC was missing in my quick how-to, apologies. I created a PR for you that includes a service account.
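
For reference, a minimal sketch of what was missing, matching the names from the error above (the actual PR may differ; the cluster-admin binding below is deliberately broad and only meant to get things running):

kubectl create serviceaccount kube-keepalived-vip
kubectl create clusterrolebinding kube-keepalived-vip \
  --clusterrole=cluster-admin \
  --serviceaccount=default:kube-keepalived-vip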

exocode commented 6 years ago

Hi @cornelius-keller

I am still stuck, but I made some slight progress... Maybe you could help me finish that chart?

CrashLoopBackOff: Back-off 1m20s restarting failed container=kube-keepalived-vip pod=kube-keepalived-vip-42rjk_default(275cade9-6192-11e8-9a63-9600000ae6f9)

The container crashes. When I act quickly after installing with Helm, I can see the keepalived.conf, which doesn't look as complete as your example:

kubectl exec kube-keepalived-vip-42rjk -it -- cat /etc/keepalived/keepalived.conf

global_defs {
  vrrp_version 3
  vrrp_iptables KUBE-KEEPALIVED-VIP
}

vrrp_instance vips {
  state BACKUP
  interface eth0
  virtual_router_id 50
  priority 109
  nopreempt
  advert_int 1

  track_interface {
    eth0
  }
   notify /opt/notify/notify.sh

  unicast_src_ip 88.99.15.132
  unicast_peer {
    138.201.152.58
    138.201.155.184
    138.201.188.206
    138.201.188.50
    78.46.152.230
    78.47.135.218
    78.47.197.112
    88.198.148.214
    88.198.150.193
  }

  virtual_ipaddress {
  }
}

this is the pod log:

kubectl logs kube-keepalived-vip-42rjk kube-keepalived-vip

Sun May 27 09:45:22 2018: Starting Keepalived v1.4.2 (unknown)
Sun May 27 09:45:22 2018: WARNING - keepalived was build for newer Linux 4.4.117, running on Linux 4.4.0-127-generic #153-Ubuntu SMP Sat May 19 10:58:46 UTC 2018
Sun May 27 09:45:22 2018: Opening file '/etc/keepalived/keepalived.conf'.
Sun May 27 09:45:22 2018: Starting Healthcheck child process, pid=21
Sun May 27 09:45:22 2018: Starting VRRP child process, pid=22
Sun May 27 09:45:22 2018: Opening file '/etc/keepalived/keepalived.conf'.
Sun May 27 09:45:22 2018: Netlink: error: message truncated
Sun May 27 09:45:22 2018: Registering Kernel netlink reflector
Sun May 27 09:45:22 2018: Registering Kernel netlink command channel
Sun May 27 09:45:22 2018: Registering gratuitous ARP shared channel
Sun May 27 09:45:22 2018: Opening file '/etc/keepalived/keepalived.conf'.
Sun May 27 09:45:22 2018: Using LinkWatch kernel netlink reflector...
Sun May 27 09:45:23 2018: Opening file '/etc/keepalived/keepalived.conf'.
Sun May 27 09:45:23 2018: Got SIGHUP, reloading checker configuration
Sun May 27 09:45:23 2018: Opening file '/etc/keepalived/keepalived.conf'.
Sun May 27 09:45:23 2018: Netlink: error: message truncated
Sun May 27 09:45:23 2018: Registering Kernel netlink reflector
Sun May 27 09:45:23 2018: Registering Kernel netlink command channel
Sun May 27 09:45:23 2018: Registering gratuitous ARP shared channel
Sun May 27 09:45:23 2018: Opening file '/etc/keepalived/keepalived.conf'.
Sun May 27 09:45:23 2018: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Sun May 27 09:45:23 2018: (vips): No VIP specified; at least one is required
Sun May 27 09:45:24 2018: Stopped
Sun May 27 09:45:24 2018: Keepalived_vrrp exited with permanent error CONFIG. Terminating
Sun May 27 09:45:24 2018: Stopping
Sun May 27 09:45:24 2018: Stopped
Sun May 27 09:45:29 2018: Stopped Keepalived v1.4.2 (unknown)
kubectl describe pod kube-keepalived-vip-42rjk kube-keepalived-vip

Name:           kube-keepalived-vip-42rjk
Namespace:      default
Node:           cluster-worker-06/88.99.15.132
Start Time:     Sun, 27 May 2018 11:41:38 +0200
Labels:         k8s-app=kube-keepalived-vip
                name=kube-keepalived-vip
Annotations:    <none>
Status:         Running
IP:             88.99.15.132
Controlled By:  ReplicationController/kube-keepalived-vip
Containers:
  kube-keepalived-vip:
    Container ID:  docker://ba60194851d51c02f22fe311e284164af32135a67b71b00f5781d50b03f47616
    Image:         quay.io/cornelius/keepalived-vip:0.11_notify
    Image ID:      docker-pullable://quay.io/cornelius/keepalived-vip@sha256:3fea1c570775366dee56f0da6acdf412f257ee9c521069e7e0fc9a49256949e3
    Port:          <none>
    Host Port:     <none>
    Args:
      --services-configmap=default/vip-configmap
      --watch-all-namespaces=true
      --use-unicast=true
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 27 May 2018 11:48:16 +0200
      Finished:     Sun, 27 May 2018 11:48:23 +0200
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 27 May 2018 11:45:22 +0200
      Finished:     Sun, 27 May 2018 11:45:29 +0200
    Ready:          False
    Restart Count:  6
    Environment:
      HETZNER_USER:       <set to the key 'hetzner-user' in secret 'hetzner-secret-failoverip-1'>   Optional: false
      HETZNER_PASS:       <set to the key 'hetzner-pass' in secret 'hetzner-secret-failoverip-1'>   Optional: false
      HETZNER_TOKEN:      <set to the key 'hetzner-token' in secret 'hetzner-secret-failoverip-1'>  Optional: false
      FLOATING_IP:        <set to the key 'floating-ip' in secret 'hetzner-secret-failoverip-1'>    Optional: false
      POD_NAME:           kube-keepalived-vip-42rjk (v1:metadata.name)
      POD_NAMESPACE:      default (v1:metadata.namespace)
      KEEPALIVED_NOTIFY:  /opt/notify/notify.sh
    Mounts:
      /lib/modules from modules (ro)
      /opt/notify from notify (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-keepalived-vip-token-hkl5m (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  notify:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      vip-notify
    Optional:  false
  kube-keepalived-vip-token-hkl5m:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kube-keepalived-vip-token-hkl5m
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age               From                                      Message
  ----     ------                 ----              ----                                      -------
  Normal   Scheduled              6m                default-scheduler                         Successfully assigned kube-keepalived-vip-42rjk to cluster-worker-06
  Normal   SuccessfulMountVolume  6m                kubelet, cluster-worker-06  MountVolume.SetUp succeeded for volume "modules"
  Normal   SuccessfulMountVolume  6m                kubelet, cluster-worker-06  MountVolume.SetUp succeeded for volume "notify"
  Normal   SuccessfulMountVolume  6m                kubelet, cluster-worker-06  MountVolume.SetUp succeeded for volume "kube-keepalived-vip-token-hkl5m"
  Normal   Pulling                5m (x4 over 6m)   kubelet, cluster-worker-06  pulling image "quay.io/cornelius/keepalived-vip:0.11_notify"
  Normal   Pulled                 5m (x4 over 6m)   kubelet, cluster-worker-06  Successfully pulled image "quay.io/cornelius/keepalived-vip:0.11_notify"
  Normal   Created                5m (x4 over 6m)   kubelet, cluster-worker-06  Created container
  Normal   Started                5m (x4 over 6m)   kubelet, cluster-worker-06  Started container
  Warning  BackOff                1m (x19 over 6m)  kubelet, cluster-worker-06  Back-off restarting failed container
Error from server (NotFound): pods "kube-keepalived-vip" not found

And sorry, what do you mean by this? # add your config map here. must map the base64 encoded IP in secrets.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: vip-configmap
data:
  138.201.14.20: default/echoheaders # add your config map here. must map the base64 encoded IP in secrets.yaml

The most recent helm chart is here:

https://github.com/exocode/helm-charts/tree/master/hetzner-failover

cornelius-keller commented 6 years ago

Hi @exocode
The comment 138.201.14.20: default/echoheaders # add your config map here. must map the base64 encoded IP in secrets.yaml contains a typo. It should mean: put your failover IP here.

Basically, you put the failover IP there as the key, and the k8s service it should point to as the value. In this case it is the echoheaders service in the default namespace.
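
As a hedged equivalent of the vip-configmap YAML above, the same mapping can also be created imperatively; the key is the failover/floating IP and the value is the <namespace>/<service> it should point to:

kubectl create configmap vip-configmap \
  --from-literal=138.201.14.20=default/echoheaders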

I will set up my hetzner-kube cluster tonight so I can really test your chart and provide better feedback.

JohnnyQQQQ commented 6 years ago

I'm interested in how to add the floating IP to the default network interface of the node. I think the best way would be during setup, as I'm not sure there is any way to do it from inside a container.

Never mind, I found out about it in the Docker documentation.

exocode commented 6 years ago

I got it working, thank you for your patience! My fault: I did not point to the correct ingress. I had "nginx", but the real one was "nginx-demo". So I changed that and it worked, yay! A big step for me in my journey through the awesome world of k8s and Hetzner.

Everything looks nice. Maybe someone (@cornelius-keller :-D) can add the "filter option for the description" field, like in the example in https://github.com/xetys/hetzner-kube/issues/58#issuecomment-375089121?

In the meantime I found out that there are already charts out there. This is an interesting one: https://github.com/munnerz/keepalived-cloud-provider, because it implements the Kubernetes cloud-controller-manager. Hetzner also has an hcloud project, which may also be a way to achieve this task (https://github.com/hetznercloud/hcloud-cloud-controller-manager). I hope I understood all that correctly...

JohnnyQQQQ commented 6 years ago

I tried it today; it seems to work, but somehow I can't route any traffic, as the requests time out.

The IP is assigned to the right node in the Hetzner backend. On the node I can also find the IP in the network configuration:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether xx:00:00:xx:xx:bb brd ff:ff:ff:ff:ff:ff
    inet 88.99.xx.xx/32 brd 88.99.xx.xx scope global eth0
       valid_lft forever preferred_lft forever
   <!-- Floating IP -->
    inet 78.46.231.xxx/32 scope global eth0
       valid_lft forever preferred_lft forever
   <!-- / Floating IP -->

Any hints? I'm using floating IPs.

JohnnyQQQQ commented 6 years ago

I could narrow the problem down to the ingress-nginx Helm chart.

JohnnyQQQQ commented 6 years ago

I got it working with the ingress-nginx Helm chart: instead of using the host network, one can pass an external IP:

helm install stable/nginx-ingress --name ingress --namespace ingress --set rbac.create=true,controller.kind=DaemonSet,controller.service.type=ClusterIP,controller.service.externalIPs.[0]=YOUR_FLOATING_IP,controller.stats.enabled=true,controller.metrics.enabled=true 

Then ingress will be bound to this IP. Otherwise the firewall will not allow connections.
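
If the chart applied the value correctly, the floating IP should show up in the EXTERNAL-IP column of the controller service (namespace as in the helm command above):

kubectl -n ingress get svc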

voron commented 6 years ago

@JohnnyQQQQ I got ingress working via

helm install stable/nginx-ingress --name ingress --namespace ingress --set rbac.create=true,controller.kind=DaemonSet,controller.service.type=ClusterIP,controller.service.externalIPs='{1.2.3.4,5.6.7.8}',controller.stats.enabled=true,controller.metrics.enabled=true

not with .[0].

@cornelius-keller what about using aledbf/kube-keepalived-vip as the base image? It looks like it supports multiple IPs per vip-configmap.

vitalka200 commented 6 years ago

@exocode @cornelius-keller After deploying hetzner-keepalived-vip, the containers crash for some reason with the following error:

F0806 11:55:18.222653       1 controller.go:314] Error getting POD information: timed out waiting to observe own status as Running
goroutine 1 [running]:
k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog.stacks(0xc4202a0100, 0xc42012e1c0, 0x83, 0xd1)
    /home/jck/go/src/k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog/glog.go:766 +0xcf
k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog.(*loggingT).output(0x1d5bc80, 0xc400000003, 0xc420422a50, 0x1cdb4fa, 0xd, 0x13a, 0x0)
    /home/jck/go/src/k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog/glog.go:717 +0x30f
k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog.(*loggingT).printf(0x1d5bc80, 0x3, 0x1479b89, 0x21, 0xc4206bdcd8, 0x1, 0x1)
    /home/jck/go/src/k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog/glog.go:655 +0x14b
k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog.Fatalf(0x1479b89, 0x21, 0xc4206bdcd8, 0x1, 0x1)
    /home/jck/go/src/k8s.io/contrib/keepalived-vip/vendor/github.com/golang/glog/glog.go:1145 +0x67
main.newIPVSController(0xc4202024e0, 0x0, 0x0, 0x1, 0x7ffd7dbb4c5b, 0xd, 0x32, 0x3, 0x6)
    /home/jck/go/src/k8s.io/contrib/keepalived-vip/controller.go:314 +0x229
main.main()
    /home/jck/go/src/k8s.io/contrib/keepalived-vip/main.go:127 +0x468

Does anybody know how to solve this?

kaosmonk commented 5 years ago

@JohnnyQQQQ referring to your comment https://github.com/xetys/hetzner-kube/issues/58#issuecomment-396896815, how did you do it, given that you're running the ingress as a DaemonSet? Did you assign the floating IP beforehand to all the workers where ingress is run (I'm not even sure the same floating IP can be assigned to different nodes at the same time)? Or how should this be done?

And when adding a new worker node to an existing cluster, would one need to manually assign the floating IP as well? Would you use your keepalived script as shown in https://github.com/xetys/hetzner-kube/issues/58#issuecomment-375089121 in case the same floating IP address can't be assigned to multiple nodes at the same time?

voron commented 5 years ago

@kaosmonk You can try the hetzner-failover-ip Helm chart; there is an example with a NodeSelector, multiple IPs and nginx-ingress.

kaosmonk commented 5 years ago

@voron can you share some more insight into the replicaCount parameter? It depends on the number of edge routers, but it's not clear to me what these are. Are these all worker nodes? Since nginx-ingress is run as a DaemonSet, I'd suspect so. Or am I on the wrong track here?

In a case where I have 3 worker nodes in my cluster and ingress is running on each, I'd say I'd need to set replicaCount to 3?

voron commented 5 years ago

@kaosmonk it's the number of keepalived pods to be spawned across k8s nodes. I don't see a reason to use any value other than 2. If you need a faster floating IP switch in the case of simultaneous downtime of the 2 k8s nodes that both run a keepalived pod, there may be a reason to increase this number. But in this rare case I can just wait until k8s detects the node downtime and re-schedules both keepalived pods to live nodes, and then one of these keepalived pods will move the floating IP to its node.
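
Illustrated with the chart mentioned above (the chart path and release name are assumptions, and Helm 2 syntax is used as elsewhere in this thread):

helm install --name hetzner-failover ./hetzner-failover-ip --set replicaCount=2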

kaosmonk commented 5 years ago

Gotcha, I appreciate the explanation!

Nomuas commented 5 years ago

@cornelius-keller Good job !

Did you try to PR the addition of the notify script to the original repo (kubernetes/contrib)?

cornelius-keller commented 5 years ago

@Nomuas Thank you, yes I did. It is here: https://github.com/kubernetes/contrib/pull/2912

md2k commented 5 years ago

Hi all, while keepalived works as expected with this script, traffic cannot be routed through wg0. For example, we have an HA k8s cluster:

IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  FloatingIPHere:443 wlc persistent 1800
  -> 10.0.1.1:6443                Masq    1      0          0
  -> 10.0.1.2:6443                Masq    1      0          0
  -> 10.0.1.3:6443                Masq    1      0          0

and the current keepalived master is on the server where the API endpoint 10.0.1.1 lives. Everything is OK because the keepalived master and the service endpoint are on the same server. But if I kill the keepalived pod, the master moves to another one (for example 10.0.1.3). While the floating IP moves successfully, keepalived keeps trying to send traffic to 10.0.1.1, so it tries to send it over wg0, and as a result nothing works.

IMHO the best option is to use this keepalived setup as a failover solution for the API, for example: a single keepalived master, with affinity on the master nodes and short thresholds so Kubernetes respawns the pod on another master if the current one dies, and have it pick up not a pod IP but the node's original IP, I think. About using it with ingress I'm not sure, but I think the situation would be the same as with pod IPs (endpoints).

If anyone knows how to solve the issue of traffic for the floating IP being routed over wg0, that would be nice, because I don't have much time to dive into this.

md2k commented 5 years ago

Another possibility is to use the ClusterIP, because a cluster IP is just an internal firewall rule (DNAT), but I'm not sure how to retrieve it with keepalived (I need to dive into their code).

gentios commented 5 years ago

@exocode what is the status of your chart? I cannot download it and it's not in the hub; are you still maintaining it?

exocode commented 5 years ago

@gentios I have been back on track for a week (sadly I had other projects on my task list). I saw that Cornelius's script was implemented in keepalived-vip. I have to spin up a cluster and reinvestigate (I barely remember what I changed 10 months ago). If you have suggestions or ideas to move the chart forward, simply drop me a few lines.

Russell-IO commented 3 years ago

This is interesting, I will try to implement it

KlavsKlavsen commented 3 years ago

https://github.com/schemen/kubernetes-hetzner-keepalived <- I found this, which should automatically "point the floating IP" to a k8s server that is "UP". I'm thinking this could perhaps be "tied" to MetalLB somehow, so this code could work with MetalLB?