Closed sevaho closed 4 years ago
I think the issue might be that the client is trying to reconnect to internal IPs behind the load balancer and then failing to reconnect... You can confirm this by making a telnet / nc connection against the load balancer IP:
nc <my-external-ip> 4222
Then, as part of the INFO protocol response, you would see some connect_urls containing the other IPs of the cluster that clients will use to reconnect, which may include internal IPs that do not belong to the load balancer. For LB setups it is recommended to set no_advertise: true, but I just noticed that it is not exposed in the Helm charts: https://github.com/nats-io/k8s/issues/81
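The same check can be scripted instead of done by hand with nc. The following is a minimal sketch, assuming the server sends its INFO line in plain text as the first line on a new connection and that connect_urls entries have the form host:port; the helper names (parse_info_line, private_connect_urls, fetch_info) are made up for illustration and are not part of any NATS client.

```python
import ipaddress
import json
import socket


def parse_info_line(line: bytes) -> dict:
    """Parse a NATS `INFO {...}` protocol line into a dict."""
    op, _, payload = line.strip().partition(b" ")
    if op.upper() != b"INFO":
        raise ValueError(f"expected INFO, got {op!r}")
    return json.loads(payload)


def private_connect_urls(info: dict) -> list:
    """Return the advertised connect_urls whose host is a private IP."""
    flagged = []
    for url in info.get("connect_urls", []):
        # Assumed format: "host:port" (IPv6 hosts wrapped in brackets).
        host = url.rsplit(":", 1)[0].strip("[]")
        try:
            if ipaddress.ip_address(host).is_private:
                flagged.append(url)
        except ValueError:
            pass  # a hostname rather than an IP literal; not checked here
    return flagged


def fetch_info(host: str, port: int = 4222, timeout: float = 5.0) -> dict:
    """Open a plain TCP connection and parse the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as stream:
            return parse_info_line(stream.readline())
```

Calling private_connect_urls(fetch_info("<my-external-ip>")) against the load balancer should return an empty list once advertising is disabled; any entries it returns are addresses that clients may try, and fail, to reconnect to.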
Hi @sevaho, I could reproduce it now. I will make a release of the Helm charts with the no_advertise fix here: https://github.com/nats-io/k8s/pull/128
Although what is actually going on seems to be that the verbose=True flag is incompatible with TLS in the nats.py client, so I'm suggesting the following changes:
Remove verbose, since this setting is mostly used for debugging connectivity with simple clients like telnet or netcat. All clients have it disabled by default since it only introduces overhead into the protocol (each command gets a +OK reply, which is not useful for the client library).
Also, a ping_interval of 5 seconds is a bit short, I think; the default is for a client to send 1 ping every 2 minutes. This depends on the use case of course, but I recommend maybe leaving the default:
  await self.nc.connect(
      HOST_NATS,
      allow_reconnect=True,
      tls=context,
      user_credentials=NATS_CREDENTIALS_FILE,
      error_cb=self.error_cb,
      closed_cb=self.closed_cb,
      reconnected_cb=self.reconnected_cb,
-     verbose=True,
-     ping_interval=5,
  )
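For reference, here is a minimal sketch of what the client side could look like after that change, with verbose and ping_interval left at the library defaults. The build_connect_options helper, the server URL, the credentials path and the callback bodies are all placeholders made up for this example; only the keyword names passed to connect come from the snippet above, plus servers.

```python
import ssl


def build_connect_options(server, tls_context=None, creds_path=None, **callbacks):
    """Assemble keyword arguments for the nats.py connect() call.

    verbose and ping_interval are deliberately not set here, so the
    client library falls back to its own defaults for both.
    """
    options = {
        "servers": [server],
        "allow_reconnect": True,
    }
    if tls_context is not None:
        options["tls"] = tls_context
    if creds_path is not None:
        options["user_credentials"] = creds_path
    # error_cb, closed_cb, reconnected_cb, ... are passed through unchanged.
    options.update(callbacks)
    return options


async def main():
    # Imported here so the helper above can be used without nats.py installed.
    from nats.aio.client import Client as NATS

    async def error_cb(e):
        print("error:", e)

    async def closed_cb():
        print("connection closed")

    async def reconnected_cb():
        print("reconnected")

    nc = NATS()
    await nc.connect(**build_connect_options(
        "tls://nats.example.com:4222",       # placeholder LB address
        tls_context=ssl.create_default_context(),
        creds_path="/path/to/user.creds",    # placeholder credentials file
        error_cb=error_cb,
        closed_cb=closed_cb,
        reconnected_cb=reconnected_cb,
    ))
    await nc.close()
```

Running it would be a matter of asyncio.run(main()) with nats.py installed and real values substituted for the placeholders.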
  limits:
    maxConnections:
    maxSubscriptions:
-   maxControlLine: 512
-   maxPayload: "1000000000"
- advertise: true
+ advertise: false
I created an LB in DigitalOcean as below, with TLS also set up in the server, and connectivity with the Python client seems fine:
apiVersion: v1
kind: Service
metadata:
  name: nats-lb
spec:
  type: LoadBalancer
  selector:
    app: nats
  ports:
    - protocol: TCP
      port: 4222
      targetPort: 4222
      name: nats
    - protocol: TCP
      port: 7422
      targetPort: 7422
      name: leafnodes
    - protocol: TCP
      port: 7522
      targetPort: 7522
      name: gateways
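A quick way to sanity-check a Service like this from outside the cluster is to see whether each exposed port accepts a TCP connection. The sketch below is only an illustration: check_ports and NATS_LB_PORTS are names invented here, and a successful connect only shows that something accepted the connection on that port, not that a NATS server behind it is healthy.

```python
import socket

# Port names and numbers taken from the Service definition above.
NATS_LB_PORTS = {"nats": 4222, "leafnodes": 7422, "gateways": 7522}


def check_ports(host, ports, timeout=3.0):
    """Return {name: bool} saying whether each TCP port accepted a connection."""
    results = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:
            # Refused, timed out, unreachable, or name resolution failed.
            results[name] = False
    return results
```

For example, check_ports("<my-external-ip>", NATS_LB_PORTS) returns a dict with one boolean per named port.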
noAdvertise is set to avoid leaking internal IPs, since an LB is being used:
cluster:
  enabled: true
  noAdvertise: true
Hi @sevaho, disabling Verbose in the client and deploying the Helm chart v0.5.6 should help out with this issue: https://github.com/nats-io/k8s/releases/tag/v0.5.6 cheers :)
@wallyqs thank you very much, I've upgraded the server and updated the clients. I'll wait until end of week to close this issue if you don't mind just to be sure it works as expected.
greetings
sevaho
thanks @sevaho sounds good :)
Hi @wallyqs I don't have the problem anymore so it looks like it worked :).
Hi
I am trying to understand why my nats setup is not working like it should.
Setup
NATS server
I am using the helm chart with image: nats:2.1.7-alpine3.11
values.yaml:
NATS is hosted on DigitalOcean and can be accessed via a LoadBalancer.
Nats client
requirements.txt
Problem
The problem I am having is that NATS clients are not reconnecting properly and are outputting these errors.
Is there something I am doing wrong here?
Greetings
Sebastiaan