JensRantil opened this issue 4 years ago
Also, should a TCP ping (from a load balancer) really be an ERR? In my case, I'd love to suppress the message as it really doesn't require any action from me.
The load balancer is most likely a layer 7 router with health checks. These are protocol aware and will not work with the NATS protocol.
If you run the server temporarily with the -DV flag you will see more information.
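If you'd rather not restart with flags each time, the same output can be enabled permanently in the server configuration file (these are standard nats-server options):

```conf
# Equivalent of the -D and -V flags in nats-server.conf
debug: true   # debug logging
trace: true   # protocol tracing; shows what the health check actually sends
```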
Oh thanks for pointing out it could be the load balancer, that's exactly my issue as well. I removed the target groups and have no more issues. Not sure how to fix this, or if there is an alternative way to expose it, since NATS is running as containers in my private Kubernetes cluster.
If you can configure the load balancer for its health checks, run the nats-server with monitoring turned on and have the health check hit that http endpoint instead.
https://docs.nats.io/nats-server/configuration/monitoring#enabling-monitoring-from-the-command-line
These can be plain HTTP or TLS, up to you.
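A minimal config fragment for this, assuming the conventional monitoring port (any free port works):

```conf
# nats-server.conf: expose the HTTP monitoring endpoint on port 8222
http: 8222
# or, to serve it over TLS instead:
# https: 8222
```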
But again, in general you want to use host port and allow direct access to the NATS servers and avoid ingress proxies etc.
Thanks for the quick reply.
So you would suggest running a layer 7 load balancer (AWS ALB) instead of layer 4 (AWS NLB)? Because I read somewhere else that NATS had issues with layer 7 and that it would be preferable to run with layer 4 (https://github.com/nats-io/nats-server/issues/291). But from what I know, layer 4 health checks cannot hit a specific route (what you suggest).
I feel like NATS has been developed with deployment on auto-scaling groups in mind, not really targeting private Kubernetes cluster hosting.
I still have time to switch back to that route and discover bosh cli, if you think it would be less burdensome. But if plenty of people have success running it as container in private cluster, I might as well just continue that way.
Just wondering what's your thought on this.
(Edit: Sorry if that's a bit far from the original thread subject here, I think somehow that it's kinda related anyway hehe )
I would not put any LB in between NATS clients and servers. I would just use DNS with multiple A records or a list of servers in the client. NATS handles all of that for you, and better than the LBs do.
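As a sketch of what the client-side server-list handling looks like (pure Python, hypothetical hostnames; a real application would hand this list to its NATS client library's connect call):

```python
import random

# Hypothetical seed list of cluster members.
servers = [
    "nats://nats-1.example.com:4222",
    "nats://nats-2.example.com:4222",
    "nats://nats-3.example.com:4222",
]

def connection_order(pool):
    """Return the order in which a client would try the servers.

    NATS clients shuffle the pool by default, so connections spread
    evenly across the cluster without any load balancer in between;
    on failure the client simply dials the next entry.
    """
    shuffled = list(pool)
    random.shuffle(shuffled)
    return shuffled

order = connection_order(servers)
print(order[0])  # the first server this particular client will dial
```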
NATS protocol was designed ~10yrs ago, way before k8s was on the scene ;)
@wallyqs may have some helpful hints too.
So deploying NATS on dedicated VMs would be the way to go, as I don't plan to expose my Kubernetes nodes publicly. Just installed the BOSH CLI, we'll see how it goes :)
We install NATS servers in k8s all the time, we just allow direct access via host port config and avoid clients going through the ingress controller or any other proxy/LB.
@wallyqs can show you some more details.
@JnMik currently the most reliable way to deploy NATS on K8S is to use host ports and then expose the public IP from the Kubelets for external access. The external-dns component can then be helpful to dynamically register the public IPs in the DNS records, based on the headless service that represents the NATS Server nodes.
Hello @wallyqs
Ok, I'll look up external-dns and the Kubelet public IP thing and see if I can figure it out. Thanks!
Hi @wallyqs
When you use external-dns in this scenario, do you hook it directly to the nats server, or do you need some kind of nginx-ingress-controller between the nats service and external-dns? (i.e. this example: https://github.com/kubernetes-sigs/external-dns/blob/master/docs/tutorials/public-private-route53.md , with nginx-ingress then pointing to NATS?)
Considering a Kubernetes cluster where all nodes have private IPs, is there really a way to expose a service with a public IP without using type=LoadBalancer? I tried some stuff and it feels like no.
The ingress-controller seems like an alternative to the AWS LB Layer 4, so maybe we will be able to specify a different healthcheck in the ingress controller that won't pollute the logs. But the ingress controller will be exposed via a LoadBalancer anyway.
External-DNS could then update the DNS record using the ingress controller load balancer IP.
That's the only way I can think of.
I tried exposing the nats monitor port like this (an attempt to have a public ip directly on a service, in a private kubernetes cluster), but it's just unreachable.
```hcl
resource "kubernetes_service" "nats-expose-monitor-public" {
  metadata {
    name      = "nats-expose-monitor-public"
    namespace = "default"

    labels = {
      app = "nats"
    }
  }

  spec {
    selector = {
      env = var.env
      app = "nats"
    }

    external_ips = [
      aws_eip.nats-0-ip.public_ip,
    ]

    port {
      protocol    = "TCP"
      port        = <some-port>
      target_port = 8222
    }
  }
}

resource "aws_eip" "nats-0-ip" {
  vpc  = true
  tags = { Name = "bla bla bla" }
}
```
```shell
kubectl get service
nats-expose-monitor-public   ClusterIP   <private-ip>   <public-ip>   <some-port>/TCP
```
If I release some public Kubernetes nodes (public subnets) inside my cluster, and assign them ips, my cluster gets hybrid private/public and I can reach the "
hey @wallyqs, now that I think of it, when using a Kubernetes NodePort service, the port is exposed on all nodes and requests to NATS are already randomly balanced across the pods of the statefulset. So having a load balancer in front of it doesn't really change anything, right?
Edit: I refer to this image only to prove my point that the node port is actually balanced between nodes, but the VMs should be inside the Kubernetes cluster; the drawing is weirdly done.
@JBHarvey we need docs documenting external-dns and this approach (working on that...), but basically when you create a cluster in AWS with eksctl, for example, it creates nodes that have a public IP available by default:
```shell
# Create a 3-node Kubernetes cluster
eksctl create cluster --name nats-k8s-cluster \
  --nodes 3 \
  --node-type=t3.large \
  --region=eu-west-1

# Get the credentials for your cluster
eksctl utils write-kubeconfig --name $YOUR_EKS_NAME --region eu-west-1
```
After that is done you get a set of 3 nodes with the example above:
```shell
kubectl get nodes -o wide
NAME                                           STATUS   ROLES    AGE    VERSION   INTERNAL-IP      EXTERNAL-IP     OS-IMAGE         KERNEL-VERSION                  CONTAINER-RUNTIME
ip-192-168-10-213.us-east-2.compute.internal   Ready    <none>   124d   v1.12.7   192.168.10.213   3.17.184.16     Amazon Linux 2   4.14.123-111.109.amzn2.x86_64   docker://18.6.1
ip-192-168-45-209.us-east-2.compute.internal   Ready    <none>   124d   v1.12.7   192.168.45.209   18.218.52.122   Amazon Linux 2   4.14.123-111.109.amzn2.x86_64   docker://18.6.1
ip-192-168-65-15.us-east-2.compute.internal    Ready    <none>   124d   v1.12.7   192.168.65.15    3.15.38.138     Amazon Linux 2   4.14.123-111.109.amzn2.x86_64   docker://18.6.1
```
Then you can deploy NATS and create a headless service named nats which will represent the NATS Server nodes:
```shell
kubectl get svc nats -o wide
NAME   TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                                                 AGE   SELECTOR
nats   ClusterIP   None         <none>        4222/TCP,6222/TCP,8222/TCP,7777/TCP,7422/TCP,7522/TCP   36d   app=nats
```
Once external-dns is deployed, you currently have to use a NodePort service along the following lines to keep the nodes mapped by the external DNS to the ones from the headless service:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nats-nodeport
  labels:
    app: nats
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nats.example.com
spec:
  type: NodePort
  selector:
    app: nats
  externalTrafficPolicy: Local
  ports:
  - name: client
    port: 4222
    nodePort: 30222   # Arbitrary port to represent the external dns service, external-dns issue...
    targetPort: 4222  # NOTE: the NATS pods also use host ports
```
The external-dns process would be responsible for registering the public IPs of the nodes to be served at nats.example.com.
@JnMik that's right, another way to go would be to use a NodePort and have K8S do the load balancing. I think this goes through iptables rules from K8S, so just keep that in mind, but it would work around the limitations of using a load balancer ingress for NATS, which basically prevents being able to use TLS connections and affects performance as well.
Besides having to use a high port from the nodeport range, another inconvenience of using a NodePort is that the client advertisements from NATS would be the internal IP addresses from the K8S network, so a client that connects externally via the nodeport would try to reconnect to a private IP. To work around that issue you could disable advertisements (--no-advertise flag) and then let the clients reconnect to the public IP using the high port from the nodeport range.
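For reference, the same thing can be set in the cluster block of the server config rather than on the command line (a sketch; the port shown is the default):

```conf
cluster {
  port: 6222
  # Don't send client advertisements containing the internal pod IPs
  no_advertise: true
}
```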
So we tried external-dns this afternoon. I gotta say first, it was on a hybrid eks cluster (some nodes in private subnet, some nodes in public subnet to host nats containers).
When using a headless service, external-dns was creating 3 records in our Route53: nats-1.xxx.com, nats-2.xxx.com, nats-3.xxx.com. However, the A records were pointing to the nodes' private IPs! So it wasn't working.
If I turn the service into a NodePort service, external-dns creates only 1 record, nats.xxx.com, and the 3 public IPs are in the A record.
Do you think NATS will have any issues if we use 1 record with 3 balanced IPs in it?
thanks @wallyqs for your time
I'm thinking of moving the NATS containers to a separate cluster with only nodes that have public IPs, instead of having a hybrid one.
I suppose the headless service would then generate 3 records pointing to public IPs (hopefully?)
Could be an idea.
> Do you think nats will have any issues if we use 1 record with 3 IPs in it balanced?
Thanks for sharing @JnMik, I don't see an issue with this setup as long as the ExternalIP metadata is present in the Kubernetes cluster (that is, if both INTERNAL-IP and EXTERNAL-IP are displayed when executing kubectl get nodes -o wide). If the external IP metadata is on the node, then the servers will be able to advertise the other alive public IPs that are part of the cluster and use them for reconnecting and failover right away, avoiding the extra DNS lookup. The NATS clients also get a list of the IPs when connecting and pick one randomly, so clients should be distributed evenly as well.
We do something similar with the connect.ngs.global service that Synadia offers; for example, the nodes available in the hostname uswest2.aws.ngs.global are right now for me:
```shell
dig uswest2.aws.ngs.global
...
;; ANSWER SECTION:
uswest2.aws.ngs.global. 60 IN A 54.202.186.240
uswest2.aws.ngs.global. 60 IN A 35.166.100.73
uswest2.aws.ngs.global. 60 IN A 44.228.141.181
```
And if I nc or telnet against the client port I get the rest of the cluster members:
```shell
telnet uswest2.aws.ngs.global 4222
INFO {...,"cluster":"aws-uswest2","connect_urls":["35.166.100.73:4222","44.228.141.181:4222","54.202.186.240:4222"]}
```
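The INFO line is just JSON after the `INFO ` token, so extracting the advertised peers a client would fail over to is straightforward (a sketch; the payload below is abbreviated from the telnet output above):

```python
import json

# Example INFO protocol line as returned by the server (fields abbreviated).
line = ('INFO {"cluster":"aws-uswest2","connect_urls":'
        '["35.166.100.73:4222","44.228.141.181:4222","54.202.186.240:4222"]}')

def connect_urls(info_line):
    """Extract the advertised peer addresses from a NATS INFO line."""
    _, _, payload = info_line.partition("INFO ")
    return json.loads(payload).get("connect_urls", [])

print(connect_urls(line))
# ['35.166.100.73:4222', '44.228.141.181:4222', '54.202.186.240:4222']
```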
In order to enable these advertisements, we use the following initializer container, which has some extra Kubernetes policy to be able to look up the public IP of the Kubelet where it is running: https://github.com/nats-io/k8s/blob/master/nats-server/nats-server-with-auth-and-tls.yml#L132-L153 And we have the server load that file into the config via an emptyDir volume: https://github.com/nats-io/k8s/blob/master/nats-server/nats-server-with-auth-and-tls.yml#L54
Hey @wallyqs
I managed to get the "advertise connect_urls" working, with the initContainer and all the stuff regarding advertise/advertiseconf. Great help! Have a nice day.
We built a tool called casper-3 at Gather Inc. to handle DNS registration for applications running in hostMode on specific node pools. It supports registration for pods (mostly used with statefulsets) and nodes (mostly used with deployments). The tool is tailored around our needs - we're a rather small team - but it is open source and fairly straightforward if you're familiar with Golang. It supports CloudFlare and DigitalOcean as DNS providers, but adding Route53 (or whatever you have) shouldn't be too complicated.
I find it quite strange that the recommended way to run NATS in Kubernetes requires public Kubernetes nodes, which goes against the recommendation of cloud providers due to the security implications of having public instances. We are using a layer 4 NLB with AWS EKS to expose our NATS cluster, and I haven't yet found a good way to prevent the TLS handshake errors caused by the TCP health checks.
@dan-connektica NATS does not require security perimeter models, or load balancers to work properly and securely. Remember NATS can run anywhere, not just in a cloud provider, specifically out at the edge.
That being said, setting up health checks should be fairly straightforward and not really specific to a NATS system. The health check needs to be TLS aware, and if you are forcing client side certs would need those as well.
@dan-connektica you should be able to customize the probe as follows to avoid those errors, for example:
```yaml
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-name: nats-nlb
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "8222"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/"
spec:
  type: LoadBalancer
```
@wallyqs Thanks, adding the monitoring port as the healthcheck port did the trick!
Defects
I'm running NATS server 2.1.2. I am using TLS between my NATS brokers (port 6222) as well as TLS from clients to brokers (port 4222). Additionally, I have added an http: localhost:8222 config stanza to get metrics. My NATS servers are outputting TLS handshake errors.
I'm pretty sure I have identified the culprit as a load balancer pinging my NATS instances (on port 4222). However, it would be very helpful if the error logs could say which port the TLS handshake failed on, to make errors like this a lot easier to debug. See this as a feature request.