hetznercloud / hcloud-cloud-controller-manager

Kubernetes cloud-controller-manager for Hetzner Cloud
Apache License 2.0

Conflicting behavior between the Hetzner load balancer's hostname annotation and Node.js Neo4j driver's connection URL #485

Closed vinnytwice closed 10 months ago

vinnytwice commented 1 year ago

I'm creating a Kubernetes cluster on Hetzner Cloud with the same configuration I use on Azure AKS, but I'm facing connection problems with Neo4j. On the Hetzner cluster I can access Neo4j Browser from the path I defined in my Ingress, but I can't connect to the Neo4j server using the bolt+s connection URL server.mydomain.com:7687, and neither can the Neo4j driver in my Node.js server pod (this second connection is kinda solved, see the update at the end). This is not the case with the AKS cluster.

From the Neo4j Browser connection debug output I see that the handshake fails:

Browser will attempt to open a websocket connection to bolt+s://server.mydomain.com:7687 and do an encrypted and an unencrypted bolt handshake.
bolt handshake
Status: Error
encrypted bolt handshake
Status: Error
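
To separate a Browser/WebSocket problem from a problem on the TCP path itself, the same handshake can be attempted outside the browser. The following is a minimal sketch, not part of the original report: it builds the Bolt handshake (the fixed 4-byte preamble followed by four version slots) and sends it over a plain TLS socket, the way a driver would rather than over a WebSocket. The function names and the proposed versions 5.0 and 4.4 are assumptions chosen for illustration.

```python
import socket
import ssl
import struct

BOLT_MAGIC = b"\x60\x60\xb0\x17"  # fixed 4-byte Bolt preamble


def build_handshake(versions):
    """Return the 20-byte Bolt handshake: preamble plus four version slots."""
    if len(versions) > 4:
        raise ValueError("Bolt allows at most four version proposals")
    # Unused slots are filled with zeros.
    slots = list(versions) + [(0, 0)] * (4 - len(versions))
    # Each slot is 4 bytes: 0x00, range (0 here), minor, major.
    body = b"".join(
        struct.pack(">BBBB", 0, 0, minor, major) for major, minor in slots
    )
    return BOLT_MAGIC + body


def parse_agreed_version(reply):
    """Decode the server's 4-byte answer into (major, minor), or None."""
    if len(reply) != 4:
        raise ValueError("expected exactly 4 bytes from the server")
    if reply == b"\x00\x00\x00\x00":
        return None  # the server accepted none of the proposals
    return (reply[3], reply[2])


def probe(host, port=7687, timeout=5.0):
    """Attempt a Bolt handshake over TLS (hypothetical helper, needs network)."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(build_handshake([(5, 0), (4, 4)]))
            reply = b""
            while len(reply) < 4:
                chunk = tls.recv(4 - len(reply))
                if not chunk:
                    break  # connection closed before a full answer arrived
                reply += chunk
            return parse_agreed_version(reply)
```

Calling probe("server.mydomain.com") from a machine outside the cluster would either return an agreed version or raise. A TLS error or a short reply points at something between the client and Neo4j rather than at Neo4j Browser.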

In the Chrome console I see two errors:

Mixed Content: The page at 'https://server.mydomain.com/neo4j/browser/' was loaded over HTTPS, but requested an insecure resource 'http://server.mydomain.com:7687/'. This request has been blocked; the content must be served over HTTPS.

WebSocket connection to 'wss://server.mydomain.com:7687/' failed:
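
The first of these two messages is probably a side effect of the unencrypted handshake attempt: a page loaded over HTTPS may not request http:// or ws:// resources, so the browser blocks it before it reaches the network. The second message, the wss:// failure, is the one that actually went out to port 7687. The sketch below is a deliberately simplified version of that rule. Real browsers also exempt some origins such as localhost, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

SECURE_SCHEMES = {"https", "wss"}


def is_blocked_as_mixed_content(page_url, resource_url):
    """Simplified check: a secure page may only request secure resources."""
    page_scheme = urlsplit(page_url).scheme.lower()
    resource_scheme = urlsplit(resource_url).scheme.lower()
    return page_scheme == "https" and resource_scheme not in SECURE_SCHEMES


page = "https://server.mydomain.com/neo4j/browser/"
# Blocked by the browser itself, never sent to the server.
print(is_blocked_as_mixed_content(page, "http://server.mydomain.com:7687/"))
# Allowed by the browser, so a failure here comes from the connection.
print(is_blocked_as_mixed_content(page, "wss://server.mydomain.com:7687/"))
```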

The one difference between the two clusters is the ingress controller's Load Balancer configuration, for which on Hetzner I set annotations in the ingress-nginx Helm chart like so:

nginx:
  controller:
    watchIngressWithoutClass: true
    kind: DaemonSet
    config:
      use-forwarded-headers: "true"
      compute-full-forwarded-for: "true"
      use-proxy-protocol: "true"
    service:
      annotations:
        load-balancer.hetzner.cloud/name: server-lb
        load-balancer.hetzner.cloud/use-private-ip: "true"
        load-balancer.hetzner.cloud/disable-private-ingress: "true"
        load-balancer.hetzner.cloud/location: fsn1
        load-balancer.hetzner.cloud/type: lb11
        load-balancer.hetzner.cloud/uses-proxyprotocol: "true"
        load-balancer.hetzner.cloud/http-redirect-https: "true"
        load-balancer.hetzner.cloud/hostname: server.mydomain.com
        # nginx.ingress.kubernetes.io/websocket-services: neo4j

    extraArgs:
      default-ssl-certificate: "default/tls-secret"  

    # nodeSelector:
    #   server-type: server  
  tcp:
    7687: "default/neo4j:7687" 
    7474: "default/neo4j:7474"
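
For reference, the ingress-nginx documentation describes each tcp entry as namespace/service:port followed by two optional PROXY flags: the first asks nginx to decode a PROXY protocol header on the incoming side, the second to send one on to the backend. The sketch below parses entries in that format to show which flags the two mappings above carry. It is an illustration only: the helper and type names are hypothetical, and whether the absent flags are related to the failure on port 7687 is not established in this thread.

```python
from typing import NamedTuple


class TcpBackend(NamedTuple):
    namespace: str
    service: str
    port: str
    decode_proxy: bool  # expect a PROXY header from the load balancer
    encode_proxy: bool  # send a PROXY header on to the backend


def parse_tcp_entry(value):
    """Parse 'namespace/service:port[:PROXY[:PROXY]]' from the tcp map."""
    parts = value.strip().split(":")
    if len(parts) < 2 or len(parts) > 4 or "/" not in parts[0]:
        raise ValueError(f"unexpected tcp entry: {value!r}")
    namespace, service = parts[0].split("/", 1)
    # Missing flags default to False.
    flags = [p.upper() == "PROXY" for p in parts[2:]] + [False, False]
    return TcpBackend(namespace, service, parts[1], flags[0], flags[1])


for entry in ("default/neo4j:7687", "default/neo4j:7474"):
    print(parse_tcp_entry(entry))
```

Both entries from the values above parse with decode_proxy and encode_proxy set to False.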

AFAIK the ingress-nginx controller (which I'm using) handles WebSockets automatically, unlike nginx-ingress, where they have to be mapped to a service using an annotation like nginx.ingress.kubernetes.io/websocket-services: neo4j. I tried using the annotation anyway, but it didn't make a difference.

The complete procedure I used for the Hetzner cluster is: I created a single-node Kubernetes cluster on Hetzner Cloud using k3s v1.27.4+k3s1, then installed the ingress-nginx v4.7.1 Helm chart, exposing TCP ports 7474 and 7687 to the Neo4j service as you can see above (the Load Balancer TCP ports are exposed and healthy), and the cert-manager v1.12.3 Helm chart.

In my domain's DNS manager I created an A record pointing to the load balancer's IPv4 address, with the host set to server, so I can use it in my Certificate and Ingress manifests as server.mydomain.com. The tls-secret gets created correctly.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-service
  annotations:
    nginx.ingress.kubernetes.io/use-regex: 'true'
    nginx.ingress.kubernetes.io/rewrite-target: /$2$3$4
    ingress.kubernetes.io/ssl-redirect: 'true'
    nginx.ingress.kubernetes/cluster-issuer: letsencrypt-issuer

spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - server.mydomain.com
      secretName: tls-secret
  rules:

    ### Node.js server
    - http:
        paths:
          - path: /(/|$)(.*)
            # pathType: Prefix
            pathType: ImplementationSpecific
            backend:
              service:
                name: server-clusterip-service
                port:
                  number: 80
    - http:
        paths:
          - path: /server(/|$)(.*)
            # pathType: Prefix
            pathType: ImplementationSpecific
            backend:
              service:
                name: server-clusterip-service
                port:
                  number: 80

    ##### Neo4j

    - http:
        paths:
          - path: /bolt(/|$)(.*)
            # pathType: Prefix
            pathType: ImplementationSpecific
            backend:
              service:
                name: neo4j
                port:
                  number: 7687
    - http:
        paths:
          # show browser
          - path: /neo4j(/|$)(.*)
            # pathType: Prefix
            pathType: ImplementationSpecific
            backend:
              service:
                name: neo4j
                port:
                  number: 7474
    - http:
        paths:
          - path: /neo4j-admin(/|$)(.*)
            # pathType: Prefix
            pathType: ImplementationSpecific
            backend:
              service:
                name: neo4j-admin
                port:
                  number: 7474

To install the Neo4j chart I'm setting these values for the Neo4j configuration:

  config:
    server.bolt.enabled: 'true'
    server.bolt.tls_level: 'REQUIRED'
    server.bolt.listen_address: '0.0.0.0:7687'
    dbms.ssl.policy.bolt.client_auth: 'NONE'
    dbms.ssl.policy.bolt.enabled: 'true'

    # dbms.connector.bolt.advertised_address: '0.0.0.0:7687' #server.mydomain.com:7687 # new for hetzner (no connection still)

    ## apoc
    server.directories.plugins: '/var/lib/neo4j/labs'
    dbms.security.procedures.unrestricted: 'apoc.*'
    server.config.strict_validation.enabled: 'false'
    dbms.security.procedures.allowlist: 'gds.*,apoc.*'

    ### apoc config
    dbms.directories.plugins: "/var/lib/neo4j/labs"
    dbms.config.strict_validation: "false"

  apoc_config:
    apoc.trigger.enabled: "true"
    apoc.jdbc.neo4j.url: "jdbc:foo:bar"
    apoc.import.file.enabled: "true"

  startupProbe:
    failureThreshold: 1000
    periodSeconds: 50

  ssl:
    # setting per "connector" matching neo4j config
    bolt:
      privateKey:
        secretName: tls-secret
        subPath: tls.key
      publicCertificate:
        secretName: tls-secret
        subPath: tls.crt
      trustedCerts:
        sources: []
      revokedCerts:
        sources: []

I tried setting dbms.connector.bolt.advertised_address (though on Azure it is not set), using both the any-IP value 0.0.0.0:7687 and the specific DNS name server.mydomain.com:7687, but that didn't make a difference either. In the Hetzner Firewall I created rules for ports 80 (HTTP) and 443 (HTTPS), and to allow ports 7474 and 7687. I also tried disabling the Firewall as a test, but I still can't reach the Neo4j server.

I noticed that the nginx-ingress-controller External IP on Azure was actually showing the load balancer's IPv4 address, while on Hetzner it was showing the DNS name server.mydomain.com. So I removed the load-balancer.hetzner.cloud/hostname: server.mydomain.com annotation from the ingress-nginx service annotations in the Helm chart, and without it the Neo4j driver in my Node.js server pod succeeds in connecting to Neo4j.
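
The External IP column reflects the Service's status.loadBalancer.ingress entries, each of which carries either an ip or a hostname. A minimal sketch of how the two cases differ, assuming the Service has been exported with kubectl get svc -o json and loaded into a dictionary: the sample dictionaries, the function name, and the address 203.0.113.10 (a documentation-only address) are all hypothetical.

```python
def external_addresses(service):
    """List the addresses from a Service's status.loadBalancer.ingress."""
    status = service.get("status", {})
    ingress = status.get("loadBalancer", {}).get("ingress") or []
    addresses = []
    for entry in ingress:
        # Show the IP when one is published, otherwise the hostname.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            addresses.append(address)
    return addresses


# Hypothetical status without the hostname annotation: an IP is published.
by_ip = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
# Hypothetical status with the hostname annotation: only a name is published.
by_name = {
    "status": {"loadBalancer": {"ingress": [{"hostname": "server.mydomain.com"}]}}
}

print(external_addresses(by_ip))
print(external_addresses(by_name))
```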

Unfortunately I still get the two errors when connecting from the Neo4j Browser app in the web browser:

Mixed Content: The page at 'https://server.mydomain.com/neo4j/browser/' was loaded over HTTPS, but requested an insecure resource 'http://server.mydomain.com:7687/'. This request has been blocked; the content must be served over HTTPS.

WebSocket connection to 'wss://server.mydomain.com:7687/' failed:

I started a fresh server, and while issuing the Let's Encrypt certificate I found that if I don't use the annotation load-balancer.hetzner.cloud/hostname: server.mydomain.com, certificate issuance hangs, while with it issuance completes as expected.

I'm completely going in circles here. Can you spot some other configuration I need to add or change for this setup? Many thanks.

github-actions[bot] commented 11 months ago

This issue has been marked as stale because it has not had recent activity. The bot will close the issue if no further action occurs.

apricote commented 11 months ago

Hey @vinnytwice,

we do not provide support for your own software you are running on our servers. If you can point out where exactly hcloud-cloud-controller-manager is not setting up the Load Balancer as you would expect, I will gladly help you figure out the annotations you need instead.

vinnytwice commented 10 months ago

@apricote I sorted it out. The guys at Neo4j told me that my problem could have to do with your load balancer or network, which may differ from Azure's, but in the meantime they released a reverse proxy that I'll use to solve this issue. Thank you very much again. Cheers