ori-edge / k8s_gateway

A CoreDNS plugin to resolve all types of external Kubernetes resources
Apache License 2.0

Continue serving DNS even when cluster is offline #117

Closed onedr0p closed 11 months ago

onedr0p commented 2 years ago

Hi 👋🏼

I am using k8s_gateway with the following config on OPNsense, in place of the Unbound and dnsmasq services that OPNsense provides. The problem is that if my cluster is offline, k8s_gateway won't start.

I hope it's possible to change this behavior, but maybe this already works and my configuration is wrong?

(common) {
  bind 127.0.0.1 ::1
  errors
  log
  reload
  loadbalance
  cache 300
  loop
  local
  prometheus 192.168.1.1:9153
}

. {
  import common
  k8s_gateway cluster-domain.com {
    resources Ingress
    ttl 1
    kubeconfig /usr/local/etc/coredns/kubeconfig
    fallthrough
  }
  forward . tls://1.1.1.1 tls://1.0.0.1 {
    tls_servername cloudflare-dns.com
  }
}

non-cluster-domain.com {
  import common
  k8s_gateway . {
    resources Ingress
    ttl 30
    kubeconfig /usr/local/etc/coredns/kubeconfig
  }
}

networkop commented 2 years ago

Hey @onedr0p, have you considered CoreDNS's cache plugin with a high TTL value?

onedr0p commented 2 years ago

As you can see in my config, I am already using the cache plugin. tl;dr: the issue is that if the cluster is offline, k8s_gateway fails to start. To reproduce, start k8s_gateway with a kubeconfig pointing to an IP:port that is not serving a cluster.

networkop commented 2 years ago

ah yeah, I think failing to start is kinda expected. If the plugin can't reach the API server, it can't discover k8s resources, so there's no point in coming up. This way kubelet will continue to restart it until the connectivity to the API server is restored. I understand you run it outside of kubelet. What's your expected mode of operation?

onedr0p commented 2 years ago

I would hope it could start and warn about not reaching the cluster while still serving DNS for everything else in the config, and hopefully start working if the cluster did come online without restarting k8s_gateway.

networkop commented 2 years ago

I think it should be possible. What do you see happening now? Can you collect the logs with the debug plugin enabled? https://coredns.io/plugins/debug/

onedr0p commented 2 years ago

This is the error I get on k8s_gateway startup; it's very easy to reproduce.

[INFO] plugin/k8s_gateway: Building k8s_gateway controller
panic: Get "https://192.168.1.2:6443/apis/gateway.networking.k8s.io/v1alpha2/gateways": dial tcp 192.168.1.2:6443: connect: connection refused

goroutine 1 [running]:
github.com/ori-edge/k8s_gateway.handleCRDCheckError({0x22b3ce0, 0xc0007800f0}, {0x1eb2d04, 0xa}, {0x1ecdfc5, 0x19})
    /home/runner/work/k8s_gateway/k8s_gateway/kubernetes.go:208 +0x2b3
github.com/ori-edge/k8s_gateway.existGatewayCRDs({0x22cb590, 0xc00019c0c8}, 0x1eaea37?)
    /home/runner/work/k8s_gateway/k8s_gateway/kubernetes.go:190 +0xac
github.com/ori-edge/k8s_gateway.newKubeController({0x22cb590, 0xc00019c0c8}, 0xc000454300, 0xc0000936c0, 0xc00019f8d8)
    /home/runner/work/k8s_gateway/k8s_gateway/kubernetes.go:57 +0x14d
github.com/ori-edge/k8s_gateway.(*Gateway).RunKubeController(0xc0000fcc00, {0x22cb590, 0xc00019c0c8})
    /home/runner/work/k8s_gateway/k8s_gateway/kubernetes.go:180 +0xa5
github.com/ori-edge/k8s_gateway.setup(0x1eaa956?)
    /home/runner/work/k8s_gateway/k8s_gateway/setup.go:29 +0x10d
github.com/coredns/caddy.executeDirectives(0xc0003c5200, {0x7fffffffed0e, 0x1f}, {0xc000508c00, 0x30, 0x203000?}, {0xc000471a00, 0x2, 0x8?}, 0x0)
    /home/runner/go/pkg/mod/github.com/coredns/caddy@v1.1.1/caddy.go:661 +0x5d6
github.com/coredns/caddy.ValidateAndExecuteDirectives({0x22caa68?, 0xc0004719c0}, 0x8?, 0x0)
    /home/runner/go/pkg/mod/github.com/coredns/caddy@v1.1.1/caddy.go:612 +0x3e5
github.com/coredns/caddy.startWithListenerFds({0x22caa68, 0xc0004719c0}, 0xc0003c5200, 0x0)
    /home/runner/go/pkg/mod/github.com/coredns/caddy@v1.1.1/caddy.go:515 +0x26f
github.com/coredns/caddy.Start({0x22caa68, 0xc0004719c0})
    /home/runner/go/pkg/mod/github.com/coredns/caddy@v1.1.1/caddy.go:472 +0xe5
github.com/coredns/coredns/coremain.Run()
    /home/runner/go/pkg/mod/github.com/coredns/coredns@v1.9.1/coremain/run.go:63 +0x1cd
main.main()
    /home/runner/work/k8s_gateway/k8s_gateway/cmd/coredns.go:44 +0x95

networkop commented 2 years ago

Thanks, this looks like a bug; I wasn't expecting this kind of behaviour. Anyhow, I'll try to cook something up over the weekend.

onedr0p commented 11 months ago

I don't know if this is an issue anymore, but in any case I no longer use k8s_gateway. Closing issue...