nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
15.87k stars 1.4k forks source link

Route Error 'Unknown Protocol Operation' #4688

Closed boskiv closed 11 months ago

boskiv commented 1 year ago

Observed behavior

root@nats-02:~# journalctl -u nats -f
Oct 22 11:35:56 nats-02 nats-server[1227]: [1227] 2023/10/22 11:35:56.384916 [INF] 134.122.59.155:34810 - rid:17499 - Route connection created
Oct 22 11:36:12 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:12.624161 [ERR] 167.99.221.244:46548 - rid:17495 - Route Error 'Unknown Protocol Operation'
Oct 22 11:36:12 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:12.624190 [INF] 167.99.221.244:46548 - rid:17495 - Router connection closed: Parse Error
Oct 22 11:36:13 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:13.712144 [INF] 167.99.221.244:45838 - rid:17505 - Route connection created
Oct 22 11:36:14 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:14.909786 [ERR] 134.122.59.155:34810 - rid:17499 - Route Error 'Unknown Protocol Operation'
Oct 22 11:36:14 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:14.909832 [INF] 134.122.59.155:34810 - rid:17499 - Router connection closed: Parse Error
Oct 22 11:36:15 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:15.948951 [INF] 134.122.59.155:54398 - rid:17506 - Route connection created
Oct 22 11:36:28 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:28.716055 [ERR] 134.122.59.155:54398 - rid:17506 - Route Error 'Unknown Protocol Operation'
Oct 22 11:36:28 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:28.716083 [INF] 134.122.59.155:54398 - rid:17506 - Router connection closed: Parse Error
Oct 22 11:36:29 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:29.756519 [INF] 134.122.59.155:52766 - rid:17511 - Route connection created
Oct 22 11:36:42 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:42.494104 [ERR] 134.122.59.155:52766 - rid:17511 - Route Error 'Unknown Protocol Operation'
Oct 22 11:36:42 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:42.494645 [INF] 134.122.59.155:52766 - rid:17511 - Router connection closed: Parse Error
Oct 22 11:36:42 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:42.495021 [ERR] 167.99.221.244:45838 - rid:17505 - Route Error 'Unknown Protocol Operation'
Oct 22 11:36:42 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:42.495145 [INF] 167.99.221.244:45838 - rid:17505 - Router connection closed: Parse Error
Oct 22 11:36:43 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:43.582177 [INF] 167.99.221.244:50994 - rid:17516 - Route connection created
Oct 22 11:36:43 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:43.589974 [INF] 134.122.59.155:46270 - rid:17517 - Route connection created
Oct 22 11:36:47 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:47.103226 [ERR] 134.122.59.155:46270 - rid:17517 - Route Error 'Unknown Protocol Operation'
Oct 22 11:36:47 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:47.103251 [INF] 134.122.59.155:46270 - rid:17517 - Router connection closed: Parse Error
Oct 22 11:36:48 nats-02 nats-server[1227]: [1227] 2023/10/22 11:36:48.153274 [INF] 134.122.59.155:33910 - rid:17520 - Route connection created

Expected behavior

No errors

Server and client version

nats-server: v2.10.3 nats.ws: 1.18.0

Host environment

Digital Ocean Droplet

{
  "cluster": {
    "name": "nats",
    "port": 6222,
    "routes": [
      "nats://nats-node-01.fra1.prod.do.spreadfighter.cloud:6222",
      "nats://nats-node-02.fra1.prod.do.spreadfighter.cloud:6222",
      "nats://nats-node-03.fra1.prod.do.spreadfighter.cloud:6222"
    ]
  },
  "http_port": 8222,
  "jetstream": {
    "max_file_store": 10Gi,
    "max_memory_store": 0,
    "store_dir": "/data"
  },
  "lame_duck_duration": "30s",
  "lame_duck_grace_period": "10s",
  "pid_file": "/var/run/nats/nats.pid",
  "port": 4222,
  "server_name": nats-02,
  "websocket": {
    "compression": true,
    "no_tls": true,
    "port": 8080
  }
}

Steps to reproduce

No response

boskiv commented 1 year ago
root@nats-03:~# telnet nats-node-01.fra1.prod.do.spreadfighter.cloud 6222
Trying 134.122.59.155...
Connected to nats-node-01.fra1.prod.do.spreadfighter.cloud.
Escape character is '^]'.
INFO {"server_id":"NDZC3WFZLVLLXKPKQ4MTULTJYDA5BH75ZUFBZIUPTB7ZYWJQM2JZVWYW","server_name":"nats-01","version":"2.10.3","proto":2,"go":"go1.21.3","host":"0.0.0.0","port":6222,"headers":true,"max_payload":1048576,"jetstream":true,"nonce":"4En-cNULna74L1Q","cluster":"nats","connect_urls":["134.122.59.155:4222","10.18.0.7:4222","10.110.0.4:4222"],"ws_connect_urls":["134.122.59.155:8080","10.18.0.7:8080","10.110.0.4:8080"],"compression":"accept","lnoc":true,"route_pool_size":3}
^CConnection closed by foreign host.
root@nats-03:~# telnet nats-node-02.fra1.prod.do.spreadfighter.cloud 6222
Trying 164.92.221.230...
Connected to nats-node-02.fra1.prod.do.spreadfighter.cloud.
Escape character is '^]'.
INFO {"server_id":"NDGVNDOKVSXJAUZT54HEUH563DE3E6P33OSWAKAHCVVPYTG67IZ2LDUV","server_name":"nats-02","version":"2.10.3","proto":2,"go":"go1.21.3","host":"0.0.0.0","port":6222,"headers":true,"max_payload":1048576,"jetstream":true,"nonce":"LxKg_3GAkpAjeks","cluster":"nats","connect_urls":["164.92.221.230:4222","10.18.0.6:4222","10.110.0.3:4222"],"ws_connect_urls":["164.92.221.230:8080","10.18.0.6:8080","10.110.0.3:8080"],"compression":"accept","lnoc":true,"route_pool_size":3}
^CConnection closed by foreign host.
root@nats-03:~# telnet nats-node-03.fra1.prod.do.spreadfighter.cloud 6222
Trying 167.99.221.244...
Connected to nats-node-03.fra1.prod.do.spreadfighter.cloud.
Escape character is '^]'.
INFO {"server_id":"NCN7MAT6E2FJ6UIOSSVLCRTFXBALL3SNWX6R65TCQCY7NBLLN7TTAH6X","server_name":"nats-03","version":"2.10.3","proto":2,"go":"go1.21.3","host":"0.0.0.0","port":6222,"headers":true,"max_payload":1048576,"jetstream":true,"nonce":"2K_dzCf2aAFztUw","cluster":"nats","connect_urls":["167.99.221.244:4222","10.18.0.5:4222","10.110.0.2:4222"],"ws_connect_urls":["167.99.221.244:8080","10.18.0.5:8080","10.110.0.2:8080"],"compression":"accept","lnoc":true,"route_pool_size":3}
^CConnection closed by foreign host.
root@nats-03:~#
derekcollison commented 1 year ago

We see this sometimes when a system either 1) has clients trying to connect to that port 2) a network element, like a load balancer is in between the server (not recommended) and is doing health checks.

boskiv commented 1 year ago

Now it's reproduced on Linoe inside K8S. in Linode it's running inside K8S, WS and 4222 port exposed with LoadBalancer to public

nats-2 nats [7] 2023/10/23 05:05:47.705775 [ERR] 10.2.0.6:6222 - rid:607 - Router parser ERROR, state=41, i=29919: proto='"4"...'
nats-2 nats [7] 2023/10/23 05:05:47.705910 [INF] 10.2.0.6:6222 - rid:607 - Router connection closed: Protocol Violation
nats-1 nats [7] 2023/10/23 05:05:47.706677 [ERR] 10.2.2.2:48134 - rid:580 - Route Error 'Unknown Protocol Operation'
nats-1 nats [7] 2023/10/23 05:05:47.706702 [INF] 10.2.2.2:48134 - rid:580 - Router connection closed: Parse Error
nats-2 nats [7] 2023/10/23 05:05:48.759412 [INF] 10.2.0.6:6222 - rid:645 - Route connection created
nats-1 nats [7] 2023/10/23 05:05:48.759696 [INF] 10.2.2.2:47596 - rid:614 - Route connection created
nats-1 nats [7] 2023/10/23 05:05:55.577738 [ERR] 10.2.1.2:58216 - rid:610 - Route Error 'Unknown Protocol Operation'
nats-1 nats [7] 2023/10/23 05:05:55.577760 [INF] 10.2.1.2:58216 - rid:610 - Router connection closed: Parse Error
nats-0 nats [7] 2023/10/23 05:05:55.577722 [ERR] 10.2.0.6:6222 - rid:579 - Router parser ERROR, state=41, i=11372: proto='"5"...'
nats-0 nats [7] 2023/10/23 05:05:55.578100 [INF] 10.2.0.6:6222 - rid:579 - Router connection closed: Protocol Violation
nats-1 nats [7] 2023/10/23 05:05:56.657569 [INF] 10.2.1.2:43192 - rid:615 - Route connection created
nats-0 nats [7] 2023/10/23 05:05:56.657607 [INF] 10.2.0.6:6222 - rid:583 - Route connection created
derekcollison commented 1 year ago

There is something that is talking to that route that is not another route. Could be a health check, or a network element in between.

boskiv commented 1 year ago

No healthchecks on 6222 in that setup

On Mon, Oct 23, 2023 at 18:45 Derek Collison @.***> wrote:

There is something that is talking to that route that is not another route. Could be a health check, or a network element in between.

— Reply to this email directly, view it on GitHub https://github.com/nats-io/nats-server/issues/4688#issuecomment-1775494776, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOZSR5FFCWWKVLK4GUZKGLYA2GK5AVCNFSM6AAAAAA6K24UUOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZVGQ4TINZXGY . You are receiving this because you authored the thread.Message ID: @.***>

ripienaar commented 1 year ago

Suggest you tcpdump that port to a file and check it when it happens what the content of those first packets look like. Any connection will cause this - for example I see many networks running network scanners, asset discovery tools, or just people doing some app who used to run on this port - anything ilke that will cause this.

derekcollison commented 1 year ago

I can promise you something besides a router is trying to connect and talk on that port.

boskiv commented 1 year ago

I start seen an issues only when clients connected from nats.ws. Before that everything works fine.

Screenshot 2023-10-24 at 13 10 49

My second guess is message size. Can it be an issue when we sent complex json messages around 1mb size to channels?

boskiv commented 1 year ago

as you can see core.scripts.screener is one of our channel name

Oct 24 10:15:16 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:16.618028 [ERR] 167.99.221.244:6222 - rid:34935 - Router parser ERROR, state=41, i=6275: proto='"6.RMSG $G core.scripts.screener."...'
Oct 24 10:15:16 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:16.618597 [INF] 167.99.221.244:6222 - rid:34935 - Router connection closed: Protocol Violation
Oct 24 10:15:17 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:17.719466 [INF] 167.99.221.244:6222 - rid:35072 - Route connection created
Oct 24 10:15:26 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:26.125409 [ERR] 134.122.59.155:6222 - rid:34906 - Route Error 'Unknown Protocol Operation'
Oct 24 10:15:26 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:26.125963 [INF] 134.122.59.155:6222 - rid:34906 - Router connection closed: Parse Error
Oct 24 10:15:27 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:27.170411 [INF] 134.122.59.155:6222 - rid:35075 - Route connection created
boskiv commented 1 year ago

Messages from channels should in protocol on port 6222?

Oct 24 10:01:21 nats-01 nats-server[14612]: [14612] 2023/10/24 10:01:21.447970 [ERR] 164.92.221.230:38528 - rid:10139 - Router parser ERROR, state=41, i=553: proto='"897711,\"week\":-15.740892},{\"symb"...'
Oct 24 10:01:21 nats-01 nats-server[14612]: [14612] 2023/10/24 10:01:21.448519 [INF] 164.92.221.230:38528 - rid:10139 - Router connection closed: Protocol Violation
Oct 24 10:01:22 nats-01 nats-server[14612]: [14612] 2023/10/24 10:01:22.478587 [INF] 164.92.221.230:48906 - rid:35222 - Route connection created
Oct 24 10:02:55 nats-01 nats-server[14612]: [14612] 2023/10/24 10:02:55.883139 [ERR] 164.92.221.230:48906 - rid:35222 - Router parser ERROR, state=41, i=1576: proto='":"...'
Oct 24 10:02:55 nats-01 nats-server[14612]: [14612] 2023/10/24 10:02:55.883164 [INF] 164.92.221.230:48906 - rid:35222 - Router connection closed: Protocol Violation
Oct 24 10:02:56 nats-01 nats-server[14612]: [14612] 2023/10/24 10:02:56.931691 [INF] 164.92.221.230:36278 - rid:35244 - Route connection created
Oct 24 10:07:02 nats-01 nats-server[14612]: [14612] 2023/10/24 10:07:02.730527 [ERR] 216.73.158.102:46448 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:14:19 nats-01 nats-server[14612]: [14612] 2023/10/24 10:14:19.282348 [ERR] 216.146.25.75:59122 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:15:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:15:26.121553 [ERR] 164.92.221.230:36278 - rid:35244 - Router parser ERROR, state=41, i=1575: proto='"ol\":\"STRAXUSDT\",\"day\":-6.810775,"...'
Oct 24 10:15:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:15:26.122156 [INF] 164.92.221.230:36278 - rid:35244 - Router connection closed: Protocol Violation
Oct 24 10:15:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:15:27.166913 [INF] 164.92.221.230:41134 - rid:35411 - Route connection created
Oct 24 10:17:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:17:27.357266 [ERR] 164.92.221.230:41134 - rid:35411 - Router parser ERROR, state=41, i=1575: proto='"M"...'
Oct 24 10:17:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:17:27.358040 [INF] 164.92.221.230:41134 - rid:35411 - Router connection closed: Protocol Violation
Oct 24 10:17:28 nats-01 nats-server[14612]: [14612] 2023/10/24 10:17:28.454609 [INF] 164.92.221.230:43272 - rid:35437 - Route connection created
Oct 24 10:20:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:26.614328 [ERR] 164.92.221.230:43272 - rid:35437 - Router parser ERROR, state=41, i=554: proto='"M"...'
Oct 24 10:20:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:26.614895 [INF] 164.92.221.230:43272 - rid:35437 - Router connection closed: Protocol Violation
Oct 24 10:20:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:27.623994 [INF] 164.92.221.230:60428 - rid:35474 - Route connection created
Oct 24 10:20:55 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:55.665748 [ERR] 216.146.25.75:45474 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:20:57 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:57.905992 [INF] 178.60.128.5:64078 - rid:35481 - Route connection created
Oct 24 10:20:58 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:58.064578 [ERR] 178.60.128.5:64078 - rid:35481 - Router parser ERROR, state=0, i=0: proto='"GET / HTTP/1.1\r\nHost: 134.122.59"...'
Oct 24 10:20:58 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:58.064644 [INF] 178.60.128.5:64078 - rid:35481 - Router connection closed: Protocol Violation
Oct 24 10:20:59 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:59.772999 [INF] 63.117.14.9:63548 - rid:35482 - Route connection created
Oct 24 10:20:59 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:59.916931 [ERR] 63.117.14.9:63548 - rid:35482 - Router parser ERROR, state=0, i=0: proto='"GET / HTTP/1.1\r\nHost: 134.122.59"...'
Oct 24 10:20:59 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:59.917231 [INF] 63.117.14.9:63548 - rid:35482 - Router connection closed: Protocol Violation
boskiv commented 1 year ago

set max_payload: 8MB does not help

Oct 24 10:30:45 nats-02 nats-server[31668]: [31668] 2023/10/24 10:30:45.923486 [ERR] 216.73.158.102:51044 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:32:32 nats-02 nats-server[31668]: [31668] 2023/10/24 10:32:32.603397 [ERR] 134.122.59.155:6222 - rid:14 - Router parser ERROR, state=41, i=278: proto='"-6.176174,\"week\":-1.351879},{\"sy"...'
Oct 24 10:32:32 nats-02 nats-server[31668]: [31668] 2023/10/24 10:32:32.604000 [INF] 134.122.59.155:6222 - rid:14 - Router connection closed: Protocol Violation
Oct 24 10:32:33 nats-02 nats-server[31668]: [31668] 2023/10/24 10:32:33.713346 [INF] 134.122.59.155:6222 - rid:109 - Route connection created
boskiv commented 1 year ago

Here is a tcpdump tcpdump port 6222 -w nats-8mb.dump -s 0 https://backup.nl-ams-1.linodeobjects.com/nats-8mb.dump

And error log in that time frame.

Oct 24 10:36:46 nats-02 nats-server[31668]: [31668] 2023/10/24 10:36:46.001731 [ERR] 216.73.158.102:55254 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:37:18 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:18.202695 [ERR] 134.122.59.155:6222 - rid:109 - Router parser ERROR, state=41, i=11958: proto='".fRMSG $G kline.binance.spot.kli"...'
Oct 24 10:37:18 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:18.203317 [INF] 134.122.59.155:6222 - rid:109 - Router connection closed: Protocol Violation
Oct 24 10:37:19 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:19.210469 [INF] 134.122.59.155:6222 - rid:174 - Route connection created
Oct 24 10:37:20 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:20.734921 [ERR] 134.122.59.155:6222 - rid:174 - Router parser ERROR, state=41, i=558: proto='";-y\v\xed\xba\xff\x1c1\fD\xf4k\t\rIN\xc3\xe2\x96\xcaVX\xac{\x18\x88bt\x12\nR"...'
Oct 24 10:37:20 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:20.735562 [INF] 134.122.59.155:6222 - rid:174 - Router connection closed: Protocol Violation
Oct 24 10:37:21 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:21.814065 [INF] 134.122.59.155:6222 - rid:175 - Route connection created
Oct 24 10:37:24 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:24.991502 [INF] 46.199.68.17/10.110.0.6:64700 - wid:135 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8758 total bytes.

Can someone look to it ?

boskiv commented 1 year ago

here is a /varz response

{
  "server_id": "NBHPBPTTYFJ5OPBE4AJFOZLL43O6JTJUCLRN2T2ODCC5CW6VN3U455FB",
  "server_name": "nats-02",
  "version": "2.10.3",
  "proto": 1,
  "git_commit": "1528434",
  "go": "go1.21.3",
  "host": "0.0.0.0",
  "port": 4222,
  "max_connections": 65536,
  "ping_interval": 120000000000,
  "ping_max": 2,
  "http_host": "0.0.0.0",
  "http_port": 8222,
  "http_base_path": "",
  "https_port": 0,
  "auth_timeout": 2,
  "max_control_line": 4096,
  "max_payload": 8388608,
  "max_pending": 67108864,
  "cluster": {
    "name": "nats",
    "addr": "0.0.0.0",
    "cluster_port": 6222,
    "auth_timeout": 2,
    "urls": [
      "nats-node-01.fra1.prod.do.spreadfighter.cloud:6222",
      "nats-node-02.fra1.prod.do.spreadfighter.cloud:6222",
      "nats-node-03.fra1.prod.do.spreadfighter.cloud:6222"
    ],
    "tls_timeout": 2,
    "pool_size": 3
  },
  "gateway": {},
  "leaf": {},
  "mqtt": {},
  "websocket": {
    "host": "0.0.0.0",
    "port": 8080,
    "no_tls": true,
    "compression": true
  },
  "jetstream": {
    "config": {
      "max_memory": 0,
      "max_storage": 10737418240,
      "store_dir": "/data/jetstream",
      "sync_interval": 120000000000
    },
    "stats": {
      "memory": 0,
      "storage": 0,
      "reserved_memory": 0,
      "reserved_storage": 0,
      "accounts": 1,
      "ha_assets": 1,
      "api": {
        "total": 0,
        "errors": 0
      }
    },
    "meta": {
      "name": "nats",
      "leader": "nats-03",
      "peer": "T2kH0TKP",
      "cluster_size": 3
    }
  },
  "tls_timeout": 2,
  "write_deadline": 10000000000,
  "start": "2023-10-24T10:26:55.279794824Z",
  "now": "2023-10-24T10:47:22.679668805Z",
  "uptime": "20m27s",
  "mem": 45236224,
  "cores": 2,
  "gomaxprocs": 2,
  "cpu": 28,
  "connections": 15,
  "total_connections": 293,
  "routes": 8,
  "remotes": 2,
  "leafnodes": 0,
  "in_msgs": 5778509,
  "out_msgs": 7759193,
  "in_bytes": 3520022242,
  "out_bytes": 5044840707,
  "slow_consumers": 2,
  "subscriptions": 495,
  "http_req_stats": {
    "/": 1,
    "/varz": 1
  },
  "config_load_time": "2023-10-24T10:26:55.279794824Z",
  "system_account": "$SYS",
  "ocsp_peer_cache": {},
  "slow_consumer_stats": {
    "clients": 2,
    "routes": 0,
    "gateways": 0,
    "leafs": 0
  }
}
boskiv commented 1 year ago

I've reconfigured nats server to listen on private addresses for cluster router 6222 port. Will update you if it helps.

derekcollison commented 1 year ago

You might want to consider a commercial engagement with Synadia for support and architecture review etc.

boskiv commented 1 year ago

I had a meeting call today with Ina

boskiv commented 1 year ago

Same errors in private network

Oct 24 18:01:26 nats-01 nats-server[40264]: [40264] 2023/10/24 18:01:26.150848 [INF] 10.110.0.3:48186 - rid:1689 - Route connection created
Oct 24 18:01:49 nats-01 nats-server[40264]: [40264] 2023/10/24 18:01:49.359591 [INF] 58.186.55.62/10.110.0.6:30122 - wid:1685 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8805 total bytes.
Oct 24 18:02:48 nats-01 nats-server[40264]: [40264] 2023/10/24 18:02:48.767347 [ERR] 10.110.0.3:48186 - rid:1689 - Router parser ERROR, state=41, i=20417: proto='"B"...'
Oct 24 18:02:48 nats-01 nats-server[40264]: [40264] 2023/10/24 18:02:48.767384 [INF] 10.110.0.3:48186 - rid:1689 - Router connection closed: Protocol Violation
Oct 24 18:02:49 nats-01 nats-server[40264]: [40264] 2023/10/24 18:02:49.817611 [INF] 10.110.0.3:44574 - rid:1710 - Route connection created
Oct 24 18:04:12 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:12.374059 [INF] 93.109.110.36/10.110.0.6:59678 - wid:1723 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 2 chunks of 1923 total bytes.
Oct 24 18:04:13 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:13.868339 [ERR] 10.110.0.3:44574 - rid:1710 - Router parser ERROR, state=0, i=80: proto='"\"c\":\"0.01994\",\"h\":\"0.01997\",\"l\":"...'
Oct 24 18:04:13 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:13.869046 [INF] 10.110.0.3:44574 - rid:1710 - Router connection closed: Protocol Violation
Oct 24 18:04:14 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:14.872986 [INF] 10.110.0.3:48728 - rid:1732 - Route connection created
Oct 24 18:04:59 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:59.678359 [ERR] 10.110.0.3:48728 - rid:1732 - Router parser ERROR, state=41, i=1572: proto='"WNRMSG $G core.scripts.screener."...'
Oct 24 18:04:59 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:59.678400 [INF] 10.110.0.3:48728 - rid:1732 - Router connection closed: Protocol Violation
Oct 24 18:05:00 nats-01 nats-server[40264]: [40264] 2023/10/24 18:05:00.683277 [INF] 10.110.0.3:60644 - rid:1743 - Route connection created
Oct 24 18:08:38 nats-01 nats-server[40264]: [40264] 2023/10/24 18:08:38.733692 [INF] 93.109.110.36/10.110.0.6:17558 - wid:1775 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 16 chunks of 31173 total bytes.
Oct 24 18:11:42 nats-01 nats-server[40264]: [40264] 2023/10/24 18:11:42.691693 [INF] 46.199.68.17/10.110.0.5:34560 - wid:1825 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8811 total bytes.
Oct 24 18:11:58 nats-01 nats-server[40264]: [40264] 2023/10/24 18:11:58.954030 [INF] 46.242.10.73/10.110.0.6:63514 - wid:1827 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8811 total bytes.
Oct 24 18:13:00 nats-01 nats-server[40264]: [40264] 2023/10/24 18:13:00.705305 [ERR] 10.110.0.6:46880 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 18:15:19 nats-01 nats-server[40264]: [40264] 2023/10/24 18:15:19.586191 [ERR] 10.110.0.3:60644 - rid:1743 - Router parser ERROR, state=41, i=8461: proto='"3"...'
Oct 24 18:15:19 nats-01 nats-server[40264]: [40264] 2023/10/24 18:15:19.586229 [INF] 10.110.0.3:60644 - rid:1743 - Router connection closed: Protocol Violation
Oct 24 18:15:20 nats-01 nats-server[40264]: [40264] 2023/10/24 18:15:20.660605 [INF] 10.110.0.3:53852 - rid:1881 - Route connection created
Oct 24 18:20:02 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:02.452557 [INF] 58.186.55.62/10.110.0.6:53850 - wid:1914 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8800 total bytes.
Oct 24 18:20:22 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:22.953482 [ERR] 10.110.0.3:53852 - rid:1881 - Router parser ERROR, state=59, i=909: proto='"oteAssetVolume\":\"170225.0790\"}}\r"...'
Oct 24 18:20:22 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:22.954136 [INF] 10.110.0.3:53852 - rid:1881 - Router connection closed: Protocol Violation
Oct 24 18:20:24 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:24.012338 [INF] 10.110.0.3:57808 - rid:1957 - Route connection created
Oct 24 17:37:17 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:17.078436 [ERR] 10.110.0.5:39520 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:18 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:18.592494 [ERR] 10.110.0.6:64270 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:19 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:19.321894 [ERR] 10.110.0.6:64270 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:19 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:19.471684 [ERR] 10.110.0.5:39520 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:19 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:19.919627 [ERR] 10.110.0.5:39520 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:36 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:36.085373 [ERR] 10.110.0.3:6222 - rid:3718 - Router parser ERROR, state=41, i=1368: proto='":1RMSG $G core.scripts.screener."...'
Oct 24 17:37:36 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:36.085409 [INF] 10.110.0.3:6222 - rid:3718 - Router connection closed: Protocol Violation
Oct 24 17:37:37 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:37.152248 [INF] 10.110.0.3:6222 - rid:3751 - Route connection created
Oct 24 17:40:11 nats-03 nats-server[34981]: [34981] 2023/10/24 17:40:11.490407 [ERR] 10.110.0.3:6222 - rid:3751 - Router parser ERROR, state=41, i=555: proto='"26126,\"week\":-12.483036},{\"symbo"...'
Oct 24 17:40:11 nats-03 nats-server[34981]: [34981] 2023/10/24 17:40:11.491176 [INF] 10.110.0.3:6222 - rid:3751 - Router connection closed: Protocol Violation
Oct 24 17:40:12 nats-03 nats-server[34981]: [34981] 2023/10/24 17:40:12.526038 [INF] 10.110.0.3:6222 - rid:3786 - Route connection created
Oct 24 17:43:52 nats-03 nats-server[34981]: [34981] 2023/10/24 17:43:52.603652 [INF] 93.109.110.36/10.110.0.5:44706 - wid:3804 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8829 total bytes.
Oct 24 17:45:05 nats-03 nats-server[34981]: [34981] 2023/10/24 17:45:05.916791 [INF] 94.110.210.124/10.110.0.6:33266 - wid:3822 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8753 total bytes.
Oct 24 17:56:35 nats-03 nats-server[34981]: [34981] 2023/10/24 17:56:35.961099 [INF] 46.242.10.73/10.110.0.6:26610 - wid:3992 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 8 chunks of 15785 total bytes.
Oct 24 18:05:44 nats-03 nats-server[34981]: [34981] 2023/10/24 18:05:44.993346 [ERR] 10.110.0.3:6222 - rid:3786 - Router parser ERROR, state=41, i=40: proto='"1"...'
Oct 24 18:05:44 nats-03 nats-server[34981]: [34981] 2023/10/24 18:05:44.994069 [INF] 10.110.0.3:6222 - rid:3786 - Router connection closed: Protocol Violation
Oct 24 18:05:46 nats-03 nats-server[34981]: [34981] 2023/10/24 18:05:46.017004 [INF] 10.110.0.3:6222 - rid:4118 - Route connection created
Oct 24 18:06:22 nats-03 nats-server[34981]: [34981] 2023/10/24 18:06:22.421473 [INF] 93.109.110.36/10.110.0.6:17684 - wid:4119 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8838 total bytes.
Oct 24 18:08:48 nats-03 nats-server[34981]: [34981] 2023/10/24 18:08:48.973499 [INF] 92.38.128.133/10.110.0.5:36596 - wid:3959 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 2 chunks of 202 total bytes.
Oct 24 18:15:17 nats-03 nats-server[34981]: [34981] 2023/10/24 18:15:17.112835 [ERR] 10.110.0.3:6222 - rid:4118 - Router parser ERROR, state=41, i=8254: proto='"M"...'
Oct 24 18:15:17 nats-03 nats-server[34981]: [34981] 2023/10/24 18:15:17.112864 [INF] 10.110.0.3:6222 - rid:4118 - Router connection closed: Protocol Violation
Oct 24 18:15:18 nats-03 nats-server[34981]: [34981] 2023/10/24 18:15:18.167003 [INF] 10.110.0.3:6222 - rid:4242 - Route connection created
Oct 24 18:17:19 nats-03 nats-server[34981]: [34981] 2023/10/24 18:17:19.847831 [INF] 93.109.110.36/10.110.0.6:62780 - wid:4232 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 10 chunks of 17993 total bytes.
Oct 24 18:04:59 nats-02 nats-server[33202]: [33202] 2023/10/24 18:04:59.675111 [INF] 10.110.0.4:6222 - rid:5529 - Router connection closed: Write Error
Oct 24 18:05:00 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:00.679172 [INF] 10.110.0.4:6222 - rid:5541 - Route connection created
Oct 24 18:05:44 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:44.993040 [ERR] 10.110.0.2:57468 - rid:5200 - Route Error 'Unknown Protocol Operation'
Oct 24 18:05:44 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:44.993063 [INF] 10.110.0.2:57468 - rid:5200 - Router connection closed: Parse Error
Oct 24 18:05:46 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:46.016709 [INF] 10.110.0.2:39840 - rid:5550 - Route connection created
Oct 24 18:10:42 nats-02 nats-server[33202]: [33202] 2023/10/24 18:10:42.279930 [INF] 95.161.239.245/10.110.0.6:58920 - wid:5603 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 26 chunks of 49791 total bytes.
Oct 24 18:12:30 nats-02 nats-server[33202]: [33202] 2023/10/24 18:12:30.170388 [INF] 93.109.110.36/10.110.0.5:54094 - wid:5631 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 16 chunks of 31219 total bytes.
Oct 24 18:13:03 nats-02 nats-server[33202]: [33202] 2023/10/24 18:13:03.414925 [INF] 85.114.207.18/10.110.0.5:22886 - wid:5643 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8867 total bytes.
Oct 24 18:15:17 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:17.113394 [ERR] 10.110.0.2:39840 - rid:5550 - Route Error 'Unknown Protocol Operation'
Oct 24 18:15:17 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:17.113425 [INF] 10.110.0.2:39840 - rid:5550 - Router connection closed: Parse Error
Oct 24 18:15:18 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:18.167164 [INF] 10.110.0.2:48742 - rid:5683 - Route connection created
Oct 24 18:15:19 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:19.583962 [ERR] 10.110.0.4:6222 - rid:5541 - Route Error 'Unknown Protocol Operation'
Oct 24 18:15:19 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:19.584736 [INF] 10.110.0.4:6222 - rid:5541 - Router connection closed: Parse Error
Oct 24 18:15:20 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:20.657779 [INF] 10.110.0.4:6222 - rid:5688 - Route connection created
Oct 24 18:20:22 nats-02 nats-server[33202]: [33202] 2023/10/24 18:20:22.951852 [ERR] 10.110.0.4:6222 - rid:5688 - Route Error 'Unknown Protocol Operation'
Oct 24 18:20:22 nats-02 nats-server[33202]: [33202] 2023/10/24 18:20:22.951883 [INF] 10.110.0.4:6222 - rid:5688 - Router connection closed: Parse Error
Oct 24 18:20:24 nats-02 nats-server[33202]: [33202] 2023/10/24 18:20:24.009713 [INF] 10.110.0.4:6222 - rid:5759 - Route connection created
boskiv commented 1 year ago

Recent config

{
  "cluster": {
    "name": "nats",
    "host": "10.110.0.4",
    "port": 6222,
    "no_advertise": true,
    "routes": [
      "nats://10.110.0.2:6222",
      "nats://10.110.0.3:6222",
      "nats://10.110.0.4:6222"
    ]
  },
  "http_port": 8222,
  "jetstream": {
    "max_file_store": 10Gi,
    "max_memory_store": 0,
    "store_dir": "/data"
  },
  "lame_duck_duration": "30s",
  "lame_duck_grace_period": "10s",
  "pid_file": "/var/run/nats/nats.pid",
  "port": 4222,
  "max_payload": 8MB,
  "server_name": nats-01,
  "websocket": {
    "compression": true,
    "no_tls": true,
    "port": 8080
  }
}
derekcollison commented 1 year ago

Could you give us some info on the clients that are on the other end of the websocket?

Also could you share all configs for the servers?

How are you deploying the system?

boskiv commented 1 year ago

Client is just Angular application

private createConnection(natsWsUrl: string) {
    if (!environment.production) {
      console.log('Connecting to nats');
    }
    return from(
      connect({
        servers: [natsWsUrl],
        user: 'user',
        pass: 'password',
      })
    ).pipe(
      tap((connection) => (this.natsConnection = connection)),
      catchError((error) => {
        console.error('Error connecting to nats', error);
        return throwError(error);
      })
    );
  }

What else server config you need, I sent all I have in /etc/nats/nats.conf file + /varz I just install deb package and make systemctl file to autoload to ubuntu server.

derekcollison commented 1 year ago

Is there a network element in between the angular app and the NATS servers?

Could you try disabling compression and see if that changes things?

boskiv commented 1 year ago

Yes sure, there is a LoadBalancer between clients and NATS server. There is SSl termination on it also. I will disable compression and report

boskiv commented 1 year ago

Turning off compression did not help

Oct 26 07:10:25 nats-01 nats-server[63029]: [63029] 2023/10/26 07:10:25.311165 [INF] 118.71.120.97/10.110.0.5:13108 - wid:40 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 2 chunks of 232 total bytes.
Oct 26 07:13:49 nats-01 nats-server[63029]: [63029] 2023/10/26 07:13:49.485768 [ERR] 10.110.0.2:52332 - rid:51 - Router parser ERROR, state=41, i=5437: proto='"d"...'
Oct 26 07:13:49 nats-01 nats-server[63029]: [63029] 2023/10/26 07:13:49.485793 [INF] 10.110.0.2:52332 - rid:51 - Router connection closed: Protocol Violation
Oct 26 07:13:50 nats-01 nats-server[63029]: [63029] 2023/10/26 07:13:50.517504 [INF] 10.110.0.2:36050 - rid:197 - Route connection created

Screenshot 2023-10-26 at 10 14 42

boskiv commented 1 year ago

as you can see nothing comes from external interface now on port 6222

root@nats-01:~# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether c6:0b:77:cd:e0:87 brd ff:ff:ff:ff:ff:ff
    altname enp0s3
    altname ens3
    inet 134.122.59.155/20 brd 134.122.63.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.18.0.7/16 brd 10.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::c40b:77ff:fecd:e087/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether e6:21:8e:67:1c:0c brd ff:ff:ff:ff:ff:ff
    altname enp0s4
    altname ens4
    inet 10.110.0.4/20 brd 10.110.15.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::e421:8eff:fe67:1c0c/64 scope link
       valid_lft forever preferred_lft forever
root@nats-01:~# tcpdump port 6222
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
root@nats-01:~# tcpdump port 6222
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
boskiv commented 1 year ago

this is current varz

root@nats-01:~# curl localhost:8222/varz
{
  "server_id": "NCYKRXMGXVTV3GIJR26EMKCMZTGHLDEFVCVYVDRHLUHSMQFYOHQ6AFIA",
  "server_name": "nats-01",
  "version": "2.10.3",
  "proto": 1,
  "git_commit": "1528434",
  "go": "go1.21.3",
  "host": "0.0.0.0",
  "port": 4222,
  "max_connections": 65536,
  "ping_interval": 120000000000,
  "ping_max": 2,
  "http_host": "0.0.0.0",
  "http_port": 8222,
  "http_base_path": "",
  "https_port": 0,
  "auth_timeout": 2,
  "max_control_line": 4096,
  "max_payload": 8388608,
  "max_pending": 67108864,
  "cluster": {
    "name": "nats",
    "addr": "10.110.0.4",
    "cluster_port": 6222,
    "auth_timeout": 2,
    "urls": [
      "10.110.0.2:6222",
      "10.110.0.3:6222",
      "10.110.0.4:6222"
    ],
    "tls_timeout": 2,
    "pool_size": 3
  },
  "gateway": {},
  "leaf": {},
  "mqtt": {},
  "websocket": {
    "host": "0.0.0.0",
    "port": 8080,
    "no_tls": true
  },
  "jetstream": {
    "config": {
      "max_memory": 0,
      "max_storage": 10737418240,
      "store_dir": "/data/jetstream",
      "sync_interval": 120000000000
    },
    "stats": {
      "memory": 0,
      "storage": 0,
      "reserved_memory": 0,
      "reserved_storage": 0,
      "accounts": 1,
      "ha_assets": 1,
      "api": {
        "total": 0,
        "errors": 0
      }
    },
    "meta": {
      "name": "nats",
      "leader": "nats-01",
      "peer": "pZeTdFV7",
      "replicas": [
        {
          "name": "nats-02",
          "current": true,
          "active": 384155024,
          "peer": "L6QapMtb"
        },
        {
          "name": "nats-03",
          "current": true,
          "active": 384156372,
          "peer": "T2kH0TKP"
        }
      ],
      "cluster_size": 3
    }
  },
  "tls_timeout": 2,
  "write_deadline": 10000000000,
  "start": "2023-10-26T07:05:24.142814455Z",
  "now": "2023-10-26T08:38:30.211371351Z",
  "uptime": "1h33m6s",
  "mem": 73912320,
  "cores": 2,
  "gomaxprocs": 2,
  "cpu": 16,
  "connections": 25,
  "total_connections": 1523,
  "routes": 8,
  "remotes": 2,
  "leafnodes": 0,
  "in_msgs": 31681247,
  "out_msgs": 45377401,
  "in_bytes": 17348943263,
  "out_bytes": 33767517313,
  "slow_consumers": 37,
  "subscriptions": 662,
  "http_req_stats": {
    "/varz": 1
  },
  "config_load_time": "2023-10-26T07:05:24.142814455Z",
  "system_account": "$SYS",
  "ocsp_peer_cache": {},
  "slow_consumer_stats": {
    "clients": 37,
    "routes": 0,
    "gateways": 0,
    "leafs": 0
  }
}
derekcollison commented 1 year ago

Would it be possible to have the apps connect directly to the nats servers as an experiment?

boskiv commented 1 year ago

I should add an ssl cert to websocket server config. I will try.

On Fri, Oct 27, 2023 at 08:03 Derek Collison @.***> wrote:

Would it be possible to have the apps connect directly to the nats servers as an experiment?

— Reply to this email directly, view it on GitHub https://github.com/nats-io/nats-server/issues/4688#issuecomment-1782298136, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOZSRZVBOMB5I5KZG6QKITYBM6DNAVCNFSM6AAAAAA6K24UUOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBSGI4TQMJTGY . You are receiving this because you authored the thread.Message ID: @.***>

boskiv commented 1 year ago

Do you have an idea how I can protect nats websocket server from non websocket traffic? I see sometimes it accept scanner's traffic that cause protocol error too.

boskiv commented 1 year ago

Does not help. Same errors

Oct 27 07:33:44 nats-01 nats-server[78067]: [78067] 2023/10/27 07:33:44.995445 [ERR] 10.110.0.2:39792 - rid:52 - Router parser ERROR, state=41, i=1282: proto='"t"...'
Oct 27 07:33:44 nats-01 nats-server[78067]: [78067] 2023/10/27 07:33:44.996221 [INF] 10.110.0.2:39792 - rid:52 - Router connection closed: Protocol Violation
Oct 27 07:33:46 nats-01 nats-server[78067]: [78067] 2023/10/27 07:33:46.037019 [INF] 10.110.0.2:33828 - rid:1236 - Route connection created
Oct 27 07:34:58 nats-01 nats-server[78067]: [78067] 2023/10/27 07:34:58.142949 [ERR] 10.110.0.2:33828 - rid:1236 - Router parser ERROR, state=41, i=12809: proto='"ANRMSG $G sf.core.scripts.screen"...'
Oct 27 07:34:58 nats-01 nats-server[78067]: [78067] 2023/10/27 07:34:58.142979 [INF] 10.110.0.2:33828 - rid:1236 - Router connection closed: Protocol Violation
Oct 27 07:34:59 nats-01 nats-server[78067]: [78067] 2023/10/27 07:34:59.160365 [INF] 10.110.0.2:37740 - rid:1254 - Route connection created
Oct 27 07:35:00 nats-01 nats-server[78067]: [78067] 2023/10/27 07:35:00.352229 [ERR] 10.110.0.2:37740 - rid:1254 - Router parser ERROR, state=41, i=62126: proto='"00RMSG $G binance.futures.kline."...'
Oct 27 07:35:00 nats-01 nats-server[78067]: [78067] 2023/10/27 07:35:00.352252 [INF] 10.110.0.2:37740 - rid:1254 - Router connection closed: Protocol Violation
Oct 27 07:35:01 nats-01 nats-server[78067]: [78067] 2023/10/27 07:35:01.373008 [INF] 10.110.0.2:57666 - rid:1255 - Route connection created
boskiv commented 1 year ago
websocket: {
    tls {
      cert_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/fullchain.pem"
      key_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/privkey.pem"
    }
    compression: true,
    port: 8443
    handshake_timeout: "10s"
    authorization {
      username: "xxxxxx"
      password: "xxxxxxx"
    }
  }

client config

connect({
        servers: [
          "nats-node-01.fra1.prod.do.spreadfighter.cloud:8443",
          "nats-node-02.fra1.prod.do.spreadfighter.cloud:8443",
          "nats-node-03.fra1.prod.do.spreadfighter.cloud:8443"
        ],
        user: 'xxxxx',
        pass: 'xxxxxxxx',
      })
derekcollison commented 1 year ago

Thanks. OK, usually when we see this it is almost always something in between the client and the server, or between servers.

Let's confirm the following.

  1. Server version
  2. Client version for the Angular app (JS/TS NATS client version)
  3. Node.js version being used.

Loopin in @aricart from our side in case there is something on the NATS TS/Js side or the websocket impl.

aricart commented 1 year ago

@boskiv can you verify that the connect URL that you have given in the servers option to the nats.ws client? On your server I see you have tis turned off, so this should be a ws://somehost:8080. I read above that you have an LB before it, so it should be the wss://lb:port

aricart commented 1 year ago

Also just a clarification, the nats.ws library is not for node, as their websocket library is non-standard. nats.ws expects a w3c websocket (ie browsers, etc).

aricart commented 1 year ago

Here's a little program you can try:

import {connect} from "https://raw.githubusercontent.com/nats-io/nats.ws/main/src/mod.ts"

const nc = await connect({servers: "wss://demo.nats.io:8443"})
console.log(`connected`)

nc.subscribe("foo.xxx", {callback: (err, msg) => {
    console.log(`msg: ${msg.subject}: ${msg.string()}`);
}});

nc.publish("foo.xxx", "hello");

await nc.flush();

To run it, you'll need Deno - https://deno.com, which is a runtime that supports W3S web sockets so it works exactly as a browser would. You can change the connect URL to your server to test that you are reaching the correct one.

davedotdev commented 1 year ago

One final validation; load-balancing should only be for the websocket port, not the cluster port. Port 6222 as you had it configured should be for use by other nats-servers in a cluster.

boskiv commented 1 year ago

Yes guys. I'm using nats.ws in my client angular application. I'm not usig loadbalancer for routing port as you can see above in my recent config. I'm not using any nodejs server/service

boskiv commented 1 year ago

my recent config contains TLS also and i removed Load Balancer from connection.

websocket: {
    tls {
      cert_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/fullchain.pem"
      key_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/privkey.pem"
    }
    compression: true,
    port: 8443
    handshake_timeout: "10s"
    authorization {
      username: "xxxxxx"
      password: "xxxxxxx"
    }
  }

All that things does not help

derekcollison commented 1 year ago

What version of Node are you using to run your app?

boskiv commented 1 year ago

We are not using nodejs in our app. We are using angular-js as frontend connected directly to nats websocket

aricart commented 1 year ago

Here's a test you can do, when experiencing issues, try to connect to the wss:// url where your nats-sever is supposedly running. - You should see an INFO protocol be emitted by the server. If you see anything else, you are not connected to a nats-server.

npm install -g wscat

wscat -c wss://connect.ngs.global:443
Connected (press CTRL+C to quit)
< INFO {"server_id":"NCKLE44SQUOTGGOU2ZXJSK5XOBP4PQKA3BYHOH77B5UW2NZ2VKYNVVDU","server_name":"aws-useast2-natscj1-3","version":"2.10.4-RC.4","proto":1,"git_commit":"5d561d2f","go":"go1.21.3","host":"::","port":443,"headers":true,"auth_required":true,"tls_required":true,"tls_available":true,"max_payload":8388608,"jetstream":true,"client_id":314078,"client_ip":"24.7.245.48","nonce":"IDWYqHTtVkDHQv4","cluster":"ngsprod-aws-useast2","domain":"ngs","connect_urls":["3.22.251.18:443","18.116.91.206:443","18.220.225.46:443"],"xkey":"XAITQXEGCQN7JIMXCS5FEQL6RFGVWT5XPWSDQOD53ZRU3OINSHNHLLFJ"} 
aricart commented 1 year ago

Also verify all the other connect_urls sent to the client. It is possible that one of those could be incorrect.

derekcollison commented 12 months ago

Not sure if this is applicable, but might be worth trying the nightly build in a few hours.

https://github.com/nats-io/nats-server/pull/4755

kozlovic commented 11 months ago

I think that PR #4811 will fix the issue with route protocol error. Closing now, but you can re-open if when running from main branch (or the nightly), or when v2.10.6 is released, you still experience the issue.