travisghansen / kubernetes-pfsense-controller

Integrate Kubernetes and pfSense
Apache License 2.0

pfsense getting constant updates #17

Closed hansaya closed 1 year ago

hansaya commented 2 years ago

HAProxy on pfSense keeps getting reloaded, so HAProxy is not able to hold a connection.

As you can see from the logs below, every second or so it goes and updates pfSense. What could I be doing wrong?

2022-02-23T22:40:28+00:00 plugin (haproxy-declarative): successfully reloaded HAProxy service
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071070
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071071
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071096
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070722
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070723
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070726
2022-02-23T22:40:31+00:00 plugin (haproxy-declarative): successfully reloaded HAProxy service
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071097
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071129
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071130
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070728
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070731
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070755
2022-02-23T22:40:31+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070756
2022-02-23T22:40:33+00:00 plugin (haproxy-declarative): successfully reloaded HAProxy service
2022-02-23T22:40:33+00:00 plugin (haproxy-ingress-proxy): successfully reloaded HAProxy service
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071153
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071155
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071192
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070757
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070758
2022-02-23T22:40:33+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070760

I currently have a "simple" setup with a Rancher service and one test service. I'm running k3s v1.22.3+k3s1 with 3 servers and 3 agents in an HA config. For HA I'm using kube-vip, with metallb for service load balancing and traefik for ingress. HAProxy on pfSense currently handles certificate management and SSL offloading. This issue seems to be caused by k3s thinking there is a change and then triggering this project to update pfSense.

Let me know if you need more details about my setup.

travisghansen commented 2 years ago

Welcome! Something is triggering constant updates on the ingress and service, which is pretty abnormal. I'm not familiar with kube-vip, but based on a quick look my guess is that kube-vip and metallb are 'fighting' each other over who 'owns' the LoadBalancer IP address.

I would suggest running something like kubectl get svc -A | grep LoadBalancer under watch (or something like kubectl get svc -A --watch) and seeing if the IP is flapping constantly.
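The churn can also be quantified from the controller log in the original report. The following is a minimal Python sketch, not part of this project, that counts watch events per object. The log lines are copied from the first batch in the report; the regular expression and the `count_events` helper are illustrative assumptions, and `ADDED`/`DELETED` are included only because they are the other standard Kubernetes watch event types.

```python
import re
from collections import Counter

# Lines copied from the 22:40:28 batch of the controller log in the report.
LOG = """
2022-02-23T22:40:28+00:00 plugin (haproxy-declarative): successfully reloaded HAProxy service
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071070
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071071
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-services): /v1/namespaces/kube-system/Service/traefik MODIFIED - 8071096
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070722
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/cattle-system/Ingress/rancher MODIFIED - 8070723
2022-02-23T22:40:28+00:00 plugin (pfsense-dns-haproxy-ingress-proxy): /networking.k8s.io/v1/namespaces/test/Ingress/mysite-ingress MODIFIED - 8070726
"""

# Hypothetical pattern for the "<timestamp> plugin (<name>): <path> <event> - <resourceVersion>" lines.
EVENT = re.compile(
    r"^(?P<ts>\S+) plugin \((?P<plugin>[^)]+)\): "
    r"(?P<path>\S+) (?P<kind>ADDED|MODIFIED|DELETED) - (?P<rv>\d+)$"
)


def count_events(log_text):
    """Count watch events per (object path, event type); other lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT.match(line.strip())
        if match:
            counts[(match["path"], match["kind"])] += 1
    return counts


if __name__ == "__main__":
    for (path, kind), n in count_events(LOG).most_common():
        print(n, kind, path)
```

A Service or Ingress that shows several MODIFIED events within the same second, each with a new resource version, is being rewritten by something in the cluster, which is consistent with two controllers competing for the same object.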

hansaya commented 2 years ago

You are correct, I misread the documentation: https://kube-vip.chipzoller.dev/docs/installation/daemonset/ After removing metallb, there is no more DDoSing of pfSense. However, I might have configured something wrong: k3s is not picking up the VIP, and this project uses one of the control nodes as the entry point. This defeats the purpose of using a VIP for redundancy. I might try turning off the service option for kube-vip and running metallb again on top of kube-vip.

travisghansen commented 2 years ago

Sounds good. Let me know how it goes. We’ll leave this open until we know everything is good.

hansaya commented 2 years ago

Update: I played with this a lot more and removed kube-vip from the equation altogether. It looks like traefik is causing metallb to hand out an IP constantly. I haven't figured out why yet.

{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.060373128Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.066429678Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.06652935Z"}
{"caller":"level.go:63","error":"Operation cannot be fulfilled on services \"traefik\": the object has been modified; please apply your changes to the latest version and try again","level":"error","msg":"failed to update service status","op":"updateServiceStatus","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.071830824Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:25.038477461Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:25.050305976Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:27.038608675Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:27.045857899Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:29.039585392Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:29.05151219Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:31.036852784Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:31.048244625Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:33.045624429Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:33.057657115Z"} 
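The metallb log lines above can be summarized with a short script. This is a minimal Python sketch using the first six lines of that log; the `summarize` helper and its labels are illustrative assumptions. The "object has been modified" error is the Kubernetes API server rejecting a write made against a stale version of the object, which usually means something else updated the same Service in between.

```python
import json
from collections import Counter

# First six lines of the metallb controller log quoted above (raw string keeps the \" escapes).
METALLB_LOG = r"""
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.060373128Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.066429678Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.06652935Z"}
{"caller":"level.go:63","error":"Operation cannot be fulfilled on services \"traefik\": the object has been modified; please apply your changes to the latest version and try again","level":"error","msg":"failed to update service status","op":"updateServiceStatus","service":"kube-system/traefik","ts":"2022-02-25T16:05:23.071830824Z"}
{"caller":"level.go:63","event":"ipAllocated","ip":"172.16.2.30","level":"info","msg":"IP address assigned by controller","service":"kube-system/traefik","ts":"2022-02-25T16:05:25.038477461Z"}
{"caller":"level.go:63","event":"serviceUpdated","level":"info","msg":"updated service object","service":"kube-system/traefik","ts":"2022-02-25T16:05:25.050305976Z"}
"""


def summarize(log_text):
    """Count log entries per (service, label), where the label is the event or the failed operation."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("level") == "error":
            # Error entries carry an "op" key instead of an "event" key.
            label = "error:" + entry.get("op", "unknown")
        else:
            label = entry.get("event", "unknown")
        counts[(entry["service"], label)] += 1
    return counts


if __name__ == "__main__":
    for (service, label), n in summarize(METALLB_LOG).items():
        print(service, label, n)
```

The same IP being allocated to the same service again and again, together with update conflicts, suggests that another component keeps resetting the Service status between metallb's writes.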

What about using kube-vip for service load balancing as well? I managed to get this working, but for whatever reason kube-vip is not updating k3s with the IP address it handed out, so I have to manually read the logs and update pfSense.

travisghansen commented 2 years ago

A kube-vip plugin likely would not be very difficult to add. If you get it up and running and working as expected, I'm happy to take a look.

hansaya commented 2 years ago

Found this: https://rancher.com/docs/k3s/latest/en/networking/#disabling-the-service-lb So that was my issue: after I disabled servicelb, everything worked as expected. Now it's time to dig into kube-vip.

travisghansen commented 2 years ago

Ah! So you had 3 things fighting over the ip…that’s a disaster for sure. Are you going to reinstall metallb now or stick with just kube-vip?

hansaya commented 2 years ago

Thanks for the help. I did mess around with kube-vip more, but I cannot get it to play well, so I'm going to leave it at that and use metallb. One last question: I have not seen any configuration option for using two or more shared frontends. I have two shared frontends configured in HAProxy, one for the WAN side and one for the LAN side. This lets me do proper SSL without needing to expose all of the services to the public. Any suggestions for doing this without running two instances of this project?

travisghansen commented 2 years ago

Give me a bit more detail on the setup and desired outcome if you don’t mind and I’ll see if it’s possible.

hansaya commented 2 years ago

Sure, I have two frontends bound to two interfaces, WAN and LAN. This helps with separating rules for internal and public services.

Similar to this:

frontend shared-https-merged
    bind            xxx.xxx.xxx.131:443 name xxx.xxx.xxx.131:443   ssl crt-list /var/etc/haproxy/shared-https.crt_list  
    mode            http
    log         global
    option          httpclose
    timeout client      30000
    rspidel ^Server:.*$
    acl         aclcrt_shared-https var(txn.txnhost) -m reg -i ^([^\.]*)\.example\.com(:([0-9]){1,5})?$
    acl         ACL1    var(txn.txnhost) -m str -i nextcloud.example.com
    acl         ACL11   var(txn.txnhost) -m str -i pass.example.com
    acl         ACL20   var(txn.txnhost) -m str -i ha.example.com
    use_backend nextcloud.example.com_ipvANY  if  ACL1 
    use_backend pass.example.com_ipvANY  if  ACL11 
    use_backend ha.example.com_ipvANY  if  ACL20 
    use_backend bad_backend_ipvANY  if   aclcrt_shared-https

frontend shared-https-local-merged
    bind            172.16.1.1:443 name 172.16.1.1:443   ssl crt-list /var/etc/haproxy/shared-https-local.crt_list  
    mode            http
    log         global
    option          httpclose
    option          forwardfor
    acl https ssl_fc
    http-request set-header     X-Forwarded-Proto http if !https
    http-request set-header     X-Forwarded-Proto https if https
    timeout client      30000
    rspidel ^Server:.*$
    acl         aclcrt_shared-https-local   var(txn.txnhost) -m reg -i ^([^\.]*)\.example\.com(:([0-9]){1,5})?$
    acl         ACL4    var(txn.txnhost) -m str -i home.example.com
    acl         ACL5    var(txn.txnhost) -m str -i unifi.example.com
    acl         ACL2    var(txn.txnhost) -m beg -i nextcloud.example.com
    acl         ACL14   var(txn.txnhost) -m str -i bi.example.com
    acl         ACL15   var(txn.txnhost) -m str -i primary.example.com
    acl         ACL16   var(txn.txnhost) -m str -i secondary.example.com
    acl         ACL17   var(txn.txnhost) -m str -i plex.example.com
    acl         ACL17   var(txn.txnhost) -m str -i plex.direct
    acl         ACL18   var(txn.txnhost) -m str -i syno.example.com
    acl         ACL19   var(txn.txnhost) -m str -i ha.example.com
    acl         ACLGRAFANA  var(txn.txnhost) -m str -i grafana.example.com
    acl         ACLESPHOME  var(txn.txnhost) -m str -i esphome.example.com
    acl         PASS_LOCAL_ACL  var(txn.txnhost) -m str -i pass.example.com
    use_backend home.example.com_ipvANY  if  ACL4 
    use_backend unifi.example.com_ipvANY  if  ACL5 
    use_backend nextcloud.example.com_ipvANY  if  ACL2 
    use_backend bi.example.com_ipvANY  if  ACL14 
    use_backend primary.example.com_ipvANY  if  ACL15 
    use_backend secondary.example.com_ipvANY  if  ACL16 
    use_backend plex.example.com_ipvANY  if  ACL17 
    use_backend synology.example.com_ipvANY  if  ACL18 
    use_backend ha.example.com_ipvANY  if  ACL19 
    use_backend grafana.example.com_ipvANY  if  ACLGRAFANA 
    use_backend esphome.example.com_ipvANY  if  ACLESPHOME 
    use_backend pass.example.com_ipvANY  if  PASS_LOCAL_ACL 
    use_backend bad_backend_ipvANY  if   aclcrt_shared-https-local

I do not want everything to be exposed to the public, especially projects I'm currently working on. However, it would be nice to have them work with proper SSL certificates. So I just need that particular project to use shared-https-local.

travisghansen commented 2 years ago

I'm guessing you're referring to the haproxy-ingress-proxy feature. If so this bit from the README is probably what you're after:

Optionally, on the ingress resources you can set the following annotations: haproxy-ingress-proxy.pfsense.org/frontend and haproxy-ingress-proxy.pfsense.org/backend to respectively set the frontend and backend to override the defaults.

hansaya commented 2 years ago

That's not going to solve the issue; it replaces the default. As you can see, my HAProxy has duplicate entries for local and public. Why? This allows me to apply different rules for local vs. public. On top of that, my public DNS points to Cloudflare, which has limits on bandwidth, single file size, etc. Having local traffic go directly to the proxy entry point bypasses all of that.

travisghansen commented 2 years ago

You want the same ingress to be added to 2 frontends?

hansaya commented 2 years ago

Yes

travisghansen commented 1 year ago

I have this implemented. Anything else needed?

hansaya commented 1 year ago

How did you go about doing it? By setting a default and an annotation (haproxy-ingress-proxy.pfsense.org/frontend) at the same time?

travisghansen commented 1 year ago

I haven't committed it yet, but it just supports a comma-separated list instead of a single entry.
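To illustrate what a comma-separated annotation value means in practice, here is a minimal Python sketch of how such a value could be resolved into a list of frontends. It is not the controller's actual code. The annotation key comes from the README excerpt quoted earlier in this thread; the `frontends_for` helper, the whitespace trimming, the de-duplication, and the fallback to a default frontend are all assumptions, and the frontend names are only examples based on the setup described above.

```python
# Annotation key quoted from the README excerpt earlier in this thread.
FRONTEND_ANNOTATION = "haproxy-ingress-proxy.pfsense.org/frontend"


def frontends_for(annotations, default_frontend):
    """Return the frontends an ingress should be attached to.

    Hypothetical behavior: split the annotation on commas, trim whitespace,
    drop empty and duplicate names, and fall back to the default frontend
    when the annotation is missing or empty.
    """
    raw = annotations.get(FRONTEND_ANNOTATION, "")
    names = [name.strip() for name in raw.split(",") if name.strip()]
    unique = list(dict.fromkeys(names))  # keeps first-seen order
    return unique or [default_frontend]


if __name__ == "__main__":
    # Example: one ingress attached to both the WAN and LAN shared frontends.
    both = {FRONTEND_ANNOTATION: "shared-https,shared-https-local"}
    print(frontends_for(both, "shared-https"))
    # Example: no annotation set, so only the default applies.
    print(frontends_for({}, "shared-https"))
```

With a single name the value behaves as before, so existing ingresses that set only one frontend would not need to change under this reading.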

hansaya commented 1 year ago

Thank you so much. Let me know if you want me to test it.

travisghansen commented 1 year ago

Released in v0.5.12.

travisghansen commented 1 year ago

Any luck testing this out?

hansaya commented 1 year ago

I just tested it, works as expected. Thank you so much!

ashtonian commented 1 year ago

Hey, just peeping this convo. I was wondering: were you able to get this to work with kube-vip without metallb? Should it just work with this controller, or is some additional integration required?

travisghansen commented 1 year ago

Let’s open another issue for kube-vip support. Currently I think the only real dependency on metallb is it checks for the configmap (which is arbitrary, it doesn’t use any data from it). Some minor adjustments can be made and it should work fine with kube-vip as well.

travisghansen commented 1 year ago

v0.5.13 has removed any need for metallb at all. The plugin is still named metallb (for now), but it simply manages BGP peers by pushing cluster nodes to pfSense.