nats-io / nats.py

Python3 client for NATS
https://nats-io.github.io/nats.py/
Apache License 2.0
888 stars 191 forks source link

nats.aio.errors.NatsError: nats: empty response from server when expecting INFO message #179

Closed sevaho closed 4 years ago

sevaho commented 4 years ago

Hi

I am trying to understand why my nats setup is not working like it should.

Setup

NATS server

I am using the helm chart with image: nats:2.1.7-alpine3.11

values.yaml:

nats:
  image: nats:2.1.7-alpine3.11
  pullPolicy: IfNotPresent

  externalAccess: false
  advertise: true

  serviceAccount: "nats-server"

  connectRetries: 30

  pingInterval:

  limits:
    maxConnections:
    maxSubscriptions:
    maxControlLine: 512
    maxPayload: "1000000000"

    writeDeadline: "2s"
    maxPending:
    maxPings:
    lameDuckDuration:

  logging:
    debug: true
    trace: true
    logtime:
    connectErrorReports:
    reconnectErrorReports:

  tls:
    url: <DUMMYDATA>
    issuerName: letsencrypt-prod
    secret:
      name: nats-client-tls
    cert: "tls.crt"
    key: "tls.key"

nameOverride: ""
imagePullSecrets: []

securityContext: null
affinity: {}
podAnnotations: {}

nodeSelector:
  doks.digitalocean.com/node-pool: staging

cluster:
  enabled: true
  replicas: 3

leafnodes:
  enabled: false

gateway:
  enabled: false
  name: 'default'

bootconfig:
  image: connecteverything/nats-boot-config:0.5.2
  pullPolicy: IfNotPresent

reloader:
  enabled: true
  image: connecteverything/nats-server-config-reloader:0.6.0
  pullPolicy: IfNotPresent

exporter:
  enabled: true
  image: synadia/prometheus-nats-exporter:0.5.0
  pullPolicy: IfNotPresent

auth:
  enabled: true

  operatorjwt:
    configMap:
      name: operator-jwt-nats
      key: operator.jwt
      data: "DUMMYDATA"

  systemAccount: DUMMYDATA

  resolver:
    type: URL
    type: URL
    url: "http://nats-account-server:9090/jwt/v1/accounts/"

NATS is hosted on DigitalOcean and can be accessed via a LoadBalancer.

Nats client

requirements.txt

asyncio-nats-client[nkeys]==0.11.2
fastapi==0.61.0
#! /usr/bin/python

import asyncio
import logging
import ssl
from typing import Any

from fastapi import FastAPI
from nats.aio.client import Client as NATS
from pydantic import BaseModel

HOST_NATS = ""
NATS_CREDENTIALS_FILE = ""

class NatsClient(BaseModel):
    nc: Any = None

    @property
    def connected(self) -> bool:
        return self.nc.is_connected

    async def error_cb(self, e):
        logging.error("NATS Error:", e)

    async def closed_cb(self):
        logging.error("NATS CLOSED")

    async def reconnected_cb(self):
        logging.error("NATS RECONNECTED")

    async def setup(self) -> None:
        self.nc: NATS = NATS()

        context = ssl.create_default_context()
        await self.nc.connect(
            HOST_NATS,
            allow_reconnect=True,
            tls=context,
            user_credentials=NATS_CREDENTIALS_FILE,
            error_cb=self.error_cb,
            closed_cb=self.closed_cb,
            reconnected_cb=self.reconnected_cb,
            verbose=True,
            ping_interval=5,
        )

    async def teardown(self):

        await self.nc.close()

def create_app() -> "FastAPI":
    app = FastAPI()

    app.state.nats_client = NatsClient()

    @app.on_event("startup")
    async def setup() -> None:
        app.state.loop = asyncio.get_event_loop()
        await app.state.nats_client.setup()

    @app.on_event("shutdown")
    async def teardown() -> None:
        await app.state.nats_client.teardown()

app = create_app()

Problem

The problem I am having is that nats clients are not reconnecting properly and outputting these errors.

root 2020-09-26 13:31:43 ERROR (__init__-error_cb:23) >> NATS ERROR: 
root 2020-09-26 13:31:43 ERROR (__init__-error_cb:23) >> NATS ERROR: 
root 2020-09-26 13:31:43 ERROR (__init__-error_cb:23) >> NATS ERROR: nats: empty response from server when expecting INFO message
root 2020-09-26 13:31:43 ERROR (__init__-error_cb:23) >> NATS ERROR: nats: empty response from server when expecting INFO message
root 2020-09-26 13:31:43 ERROR (__init__-closed_cb:26) >> NATS CLOSED
root 2020-09-26 13:31:43 ERROR (__init__-closed_cb:26) >> NATS CLOSED

Is there something I do wrong here?

Greetings

Sebastiaan

wallyqs commented 4 years ago

I think that the issue might that the client is trying to reconnect to internal IPs behind the load balancer and then failing to reconnect... You can confirm this by making a telnet / nc connection against the load balancer IP:

nc <my-external-ip> 4222

Then as part of the INFO protocol response you would see some connect_urls containing the other ips of the cluster that the clients will be using to reconnect which may contain internal ips not from the load balancer. For LB setups it is recommended to set no_advertise: true but just noticed that it is not exposed in the Helm charts: https://github.com/nats-io/k8s/issues/81

wallyqs commented 4 years ago

Hi @sevaho I could reproduce now. I will make a release of the Helm charts here with the no_advertise fix here: https://github.com/nats-io/k8s/pull/128 Although what seems to be going actually is that the verbose=True flag seems to be incompatible with tls in the nats.py client, so I'm suggesting the following changes:

Python connect options

        await self.nc.connect(
            HOST_NATS,
            allow_reconnect=True,
            tls=context,
            user_credentials=NATS_CREDENTIALS_FILE,
            error_cb=self.error_cb,
            closed_cb=self.closed_cb,
            reconnected_cb=self.reconnected_cb,
-            verbose=True,
-            ping_interval=5,
        )

Helm Chart

  limits:
    maxConnections:
    maxSubscriptions:
-  maxControlLine: 512
-  maxPayload: "1000000000"
-  advertise: true
+ advertise: false

I created an LB in DigitalOcean as below that also has TLS setup in the server and connectivity with the Python client seems fine:

apiVersion: v1
kind: Service
metadata:
  name: nats-lb
spec:
  type: LoadBalancer
  selector:
    app: nats
  ports:
    - protocol: TCP
      port: 4222
      targetPort: 4222
      name: nats
    - protocol: TCP
      port: 7422
      targetPort: 7422
      name: leafnodes
    - protocol: TCP
      port: 7522
      targetPort: 7522
      name: gateways
cluster:
  enabled: true
  noAdvertise: true
wallyqs commented 4 years ago

Hi @sevaho, disabling Verbose in the client and deploying the Helm chart v0.5.6 should help out with this issue: https://github.com/nats-io/k8s/releases/tag/v0.5.6 cheers :)

sevaho commented 4 years ago

@wallyqs thank you very much, I've upgraded the server and updated the clients. I'll wait until end of week to close this issue if you don't mind just to be sure it works as expected.

greetings

sevaho

wallyqs commented 4 years ago

thanks @sevaho sounds good :)

sevaho commented 4 years ago

Hi @wallyqs I don't have the problem anymore so it looks like it worked :).