Closed boskiv closed 11 months ago
root@nats-03:~# telnet nats-node-01.fra1.prod.do.spreadfighter.cloud 6222
Trying 134.122.59.155...
Connected to nats-node-01.fra1.prod.do.spreadfighter.cloud.
Escape character is '^]'.
INFO {"server_id":"NDZC3WFZLVLLXKPKQ4MTULTJYDA5BH75ZUFBZIUPTB7ZYWJQM2JZVWYW","server_name":"nats-01","version":"2.10.3","proto":2,"go":"go1.21.3","host":"0.0.0.0","port":6222,"headers":true,"max_payload":1048576,"jetstream":true,"nonce":"4En-cNULna74L1Q","cluster":"nats","connect_urls":["134.122.59.155:4222","10.18.0.7:4222","10.110.0.4:4222"],"ws_connect_urls":["134.122.59.155:8080","10.18.0.7:8080","10.110.0.4:8080"],"compression":"accept","lnoc":true,"route_pool_size":3}
^CConnection closed by foreign host.
root@nats-03:~# telnet nats-node-02.fra1.prod.do.spreadfighter.cloud 6222
Trying 164.92.221.230...
Connected to nats-node-02.fra1.prod.do.spreadfighter.cloud.
Escape character is '^]'.
INFO {"server_id":"NDGVNDOKVSXJAUZT54HEUH563DE3E6P33OSWAKAHCVVPYTG67IZ2LDUV","server_name":"nats-02","version":"2.10.3","proto":2,"go":"go1.21.3","host":"0.0.0.0","port":6222,"headers":true,"max_payload":1048576,"jetstream":true,"nonce":"LxKg_3GAkpAjeks","cluster":"nats","connect_urls":["164.92.221.230:4222","10.18.0.6:4222","10.110.0.3:4222"],"ws_connect_urls":["164.92.221.230:8080","10.18.0.6:8080","10.110.0.3:8080"],"compression":"accept","lnoc":true,"route_pool_size":3}
^CConnection closed by foreign host.
root@nats-03:~# telnet nats-node-03.fra1.prod.do.spreadfighter.cloud 6222
Trying 167.99.221.244...
Connected to nats-node-03.fra1.prod.do.spreadfighter.cloud.
Escape character is '^]'.
INFO {"server_id":"NCN7MAT6E2FJ6UIOSSVLCRTFXBALL3SNWX6R65TCQCY7NBLLN7TTAH6X","server_name":"nats-03","version":"2.10.3","proto":2,"go":"go1.21.3","host":"0.0.0.0","port":6222,"headers":true,"max_payload":1048576,"jetstream":true,"nonce":"2K_dzCf2aAFztUw","cluster":"nats","connect_urls":["167.99.221.244:4222","10.18.0.5:4222","10.110.0.2:4222"],"ws_connect_urls":["167.99.221.244:8080","10.18.0.5:8080","10.110.0.2:8080"],"compression":"accept","lnoc":true,"route_pool_size":3}
^CConnection closed by foreign host.
root@nats-03:~#
We see this sometimes when a system either 1) has clients trying to connect to that port 2) a network element, like a load balancer is in between the server (not recommended) and is doing health checks.
Now it's reproduced on Linoe inside K8S. in Linode it's running inside K8S, WS and 4222 port exposed with LoadBalancer to public
nats-2 nats [7] 2023/10/23 05:05:47.705775 [ERR] 10.2.0.6:6222 - rid:607 - Router parser ERROR, state=41, i=29919: proto='"4"...'
nats-2 nats [7] 2023/10/23 05:05:47.705910 [INF] 10.2.0.6:6222 - rid:607 - Router connection closed: Protocol Violation
nats-1 nats [7] 2023/10/23 05:05:47.706677 [ERR] 10.2.2.2:48134 - rid:580 - Route Error 'Unknown Protocol Operation'
nats-1 nats [7] 2023/10/23 05:05:47.706702 [INF] 10.2.2.2:48134 - rid:580 - Router connection closed: Parse Error
nats-2 nats [7] 2023/10/23 05:05:48.759412 [INF] 10.2.0.6:6222 - rid:645 - Route connection created
nats-1 nats [7] 2023/10/23 05:05:48.759696 [INF] 10.2.2.2:47596 - rid:614 - Route connection created
nats-1 nats [7] 2023/10/23 05:05:55.577738 [ERR] 10.2.1.2:58216 - rid:610 - Route Error 'Unknown Protocol Operation'
nats-1 nats [7] 2023/10/23 05:05:55.577760 [INF] 10.2.1.2:58216 - rid:610 - Router connection closed: Parse Error
nats-0 nats [7] 2023/10/23 05:05:55.577722 [ERR] 10.2.0.6:6222 - rid:579 - Router parser ERROR, state=41, i=11372: proto='"5"...'
nats-0 nats [7] 2023/10/23 05:05:55.578100 [INF] 10.2.0.6:6222 - rid:579 - Router connection closed: Protocol Violation
nats-1 nats [7] 2023/10/23 05:05:56.657569 [INF] 10.2.1.2:43192 - rid:615 - Route connection created
nats-0 nats [7] 2023/10/23 05:05:56.657607 [INF] 10.2.0.6:6222 - rid:583 - Route connection created
There is something that is talking to that route that is not another route. Could be a health check, or a network element in between.
No healthchecks on 6222 in that setup
On Mon, Oct 23, 2023 at 18:45 Derek Collison @.***> wrote:
There is something that is talking to that route that is not another route. Could be a health check, or a network element in between.
— Reply to this email directly, view it on GitHub https://github.com/nats-io/nats-server/issues/4688#issuecomment-1775494776, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOZSR5FFCWWKVLK4GUZKGLYA2GK5AVCNFSM6AAAAAA6K24UUOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZVGQ4TINZXGY . You are receiving this because you authored the thread.Message ID: @.***>
Suggest you tcpdump
that port to a file and check it when it happens what the content of those first packets look like. Any connection will cause this - for example I see many networks running network scanners, asset discovery tools, or just people doing some app who used to run on this port - anything ilke that will cause this.
I can promise you something besides a router is trying to connect and talk on that port.
I start seen an issues only when clients connected from nats.ws. Before that everything works fine.
My second guess is message size. Can it be an issue when we sent complex json messages around 1mb size to channels?
as you can see core.scripts.screener
is one of our channel name
Oct 24 10:15:16 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:16.618028 [ERR] 167.99.221.244:6222 - rid:34935 - Router parser ERROR, state=41, i=6275: proto='"6.RMSG $G core.scripts.screener."...'
Oct 24 10:15:16 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:16.618597 [INF] 167.99.221.244:6222 - rid:34935 - Router connection closed: Protocol Violation
Oct 24 10:15:17 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:17.719466 [INF] 167.99.221.244:6222 - rid:35072 - Route connection created
Oct 24 10:15:26 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:26.125409 [ERR] 134.122.59.155:6222 - rid:34906 - Route Error 'Unknown Protocol Operation'
Oct 24 10:15:26 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:26.125963 [INF] 134.122.59.155:6222 - rid:34906 - Router connection closed: Parse Error
Oct 24 10:15:27 nats-02 nats-server[10564]: [10564] 2023/10/24 10:15:27.170411 [INF] 134.122.59.155:6222 - rid:35075 - Route connection created
Messages from channels should in protocol on port 6222?
Oct 24 10:01:21 nats-01 nats-server[14612]: [14612] 2023/10/24 10:01:21.447970 [ERR] 164.92.221.230:38528 - rid:10139 - Router parser ERROR, state=41, i=553: proto='"897711,\"week\":-15.740892},{\"symb"...'
Oct 24 10:01:21 nats-01 nats-server[14612]: [14612] 2023/10/24 10:01:21.448519 [INF] 164.92.221.230:38528 - rid:10139 - Router connection closed: Protocol Violation
Oct 24 10:01:22 nats-01 nats-server[14612]: [14612] 2023/10/24 10:01:22.478587 [INF] 164.92.221.230:48906 - rid:35222 - Route connection created
Oct 24 10:02:55 nats-01 nats-server[14612]: [14612] 2023/10/24 10:02:55.883139 [ERR] 164.92.221.230:48906 - rid:35222 - Router parser ERROR, state=41, i=1576: proto='":"...'
Oct 24 10:02:55 nats-01 nats-server[14612]: [14612] 2023/10/24 10:02:55.883164 [INF] 164.92.221.230:48906 - rid:35222 - Router connection closed: Protocol Violation
Oct 24 10:02:56 nats-01 nats-server[14612]: [14612] 2023/10/24 10:02:56.931691 [INF] 164.92.221.230:36278 - rid:35244 - Route connection created
Oct 24 10:07:02 nats-01 nats-server[14612]: [14612] 2023/10/24 10:07:02.730527 [ERR] 216.73.158.102:46448 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:14:19 nats-01 nats-server[14612]: [14612] 2023/10/24 10:14:19.282348 [ERR] 216.146.25.75:59122 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:15:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:15:26.121553 [ERR] 164.92.221.230:36278 - rid:35244 - Router parser ERROR, state=41, i=1575: proto='"ol\":\"STRAXUSDT\",\"day\":-6.810775,"...'
Oct 24 10:15:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:15:26.122156 [INF] 164.92.221.230:36278 - rid:35244 - Router connection closed: Protocol Violation
Oct 24 10:15:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:15:27.166913 [INF] 164.92.221.230:41134 - rid:35411 - Route connection created
Oct 24 10:17:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:17:27.357266 [ERR] 164.92.221.230:41134 - rid:35411 - Router parser ERROR, state=41, i=1575: proto='"M"...'
Oct 24 10:17:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:17:27.358040 [INF] 164.92.221.230:41134 - rid:35411 - Router connection closed: Protocol Violation
Oct 24 10:17:28 nats-01 nats-server[14612]: [14612] 2023/10/24 10:17:28.454609 [INF] 164.92.221.230:43272 - rid:35437 - Route connection created
Oct 24 10:20:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:26.614328 [ERR] 164.92.221.230:43272 - rid:35437 - Router parser ERROR, state=41, i=554: proto='"M"...'
Oct 24 10:20:26 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:26.614895 [INF] 164.92.221.230:43272 - rid:35437 - Router connection closed: Protocol Violation
Oct 24 10:20:27 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:27.623994 [INF] 164.92.221.230:60428 - rid:35474 - Route connection created
Oct 24 10:20:55 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:55.665748 [ERR] 216.146.25.75:45474 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:20:57 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:57.905992 [INF] 178.60.128.5:64078 - rid:35481 - Route connection created
Oct 24 10:20:58 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:58.064578 [ERR] 178.60.128.5:64078 - rid:35481 - Router parser ERROR, state=0, i=0: proto='"GET / HTTP/1.1\r\nHost: 134.122.59"...'
Oct 24 10:20:58 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:58.064644 [INF] 178.60.128.5:64078 - rid:35481 - Router connection closed: Protocol Violation
Oct 24 10:20:59 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:59.772999 [INF] 63.117.14.9:63548 - rid:35482 - Route connection created
Oct 24 10:20:59 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:59.916931 [ERR] 63.117.14.9:63548 - rid:35482 - Router parser ERROR, state=0, i=0: proto='"GET / HTTP/1.1\r\nHost: 134.122.59"...'
Oct 24 10:20:59 nats-01 nats-server[14612]: [14612] 2023/10/24 10:20:59.917231 [INF] 63.117.14.9:63548 - rid:35482 - Router connection closed: Protocol Violation
set max_payload: 8MB does not help
Oct 24 10:30:45 nats-02 nats-server[31668]: [31668] 2023/10/24 10:30:45.923486 [ERR] 216.73.158.102:51044 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:32:32 nats-02 nats-server[31668]: [31668] 2023/10/24 10:32:32.603397 [ERR] 134.122.59.155:6222 - rid:14 - Router parser ERROR, state=41, i=278: proto='"-6.176174,\"week\":-1.351879},{\"sy"...'
Oct 24 10:32:32 nats-02 nats-server[31668]: [31668] 2023/10/24 10:32:32.604000 [INF] 134.122.59.155:6222 - rid:14 - Router connection closed: Protocol Violation
Oct 24 10:32:33 nats-02 nats-server[31668]: [31668] 2023/10/24 10:32:33.713346 [INF] 134.122.59.155:6222 - rid:109 - Route connection created
Here is a tcpdump tcpdump port 6222 -w nats-8mb.dump -s 0
https://backup.nl-ams-1.linodeobjects.com/nats-8mb.dump
And error log in that time frame.
Oct 24 10:36:46 nats-02 nats-server[31668]: [31668] 2023/10/24 10:36:46.001731 [ERR] 216.73.158.102:55254 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 10:37:18 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:18.202695 [ERR] 134.122.59.155:6222 - rid:109 - Router parser ERROR, state=41, i=11958: proto='".fRMSG $G kline.binance.spot.kli"...'
Oct 24 10:37:18 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:18.203317 [INF] 134.122.59.155:6222 - rid:109 - Router connection closed: Protocol Violation
Oct 24 10:37:19 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:19.210469 [INF] 134.122.59.155:6222 - rid:174 - Route connection created
Oct 24 10:37:20 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:20.734921 [ERR] 134.122.59.155:6222 - rid:174 - Router parser ERROR, state=41, i=558: proto='";-y\v\xed\xba\xff\x1c1\fD\xf4k\t\rIN\xc3\xe2\x96\xcaVX\xac{\x18\x88bt\x12\nR"...'
Oct 24 10:37:20 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:20.735562 [INF] 134.122.59.155:6222 - rid:174 - Router connection closed: Protocol Violation
Oct 24 10:37:21 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:21.814065 [INF] 134.122.59.155:6222 - rid:175 - Route connection created
Oct 24 10:37:24 nats-02 nats-server[31668]: [31668] 2023/10/24 10:37:24.991502 [INF] 46.199.68.17/10.110.0.6:64700 - wid:135 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8758 total bytes.
Can someone look to it ?
here is a /varz response
{
"server_id": "NBHPBPTTYFJ5OPBE4AJFOZLL43O6JTJUCLRN2T2ODCC5CW6VN3U455FB",
"server_name": "nats-02",
"version": "2.10.3",
"proto": 1,
"git_commit": "1528434",
"go": "go1.21.3",
"host": "0.0.0.0",
"port": 4222,
"max_connections": 65536,
"ping_interval": 120000000000,
"ping_max": 2,
"http_host": "0.0.0.0",
"http_port": 8222,
"http_base_path": "",
"https_port": 0,
"auth_timeout": 2,
"max_control_line": 4096,
"max_payload": 8388608,
"max_pending": 67108864,
"cluster": {
"name": "nats",
"addr": "0.0.0.0",
"cluster_port": 6222,
"auth_timeout": 2,
"urls": [
"nats-node-01.fra1.prod.do.spreadfighter.cloud:6222",
"nats-node-02.fra1.prod.do.spreadfighter.cloud:6222",
"nats-node-03.fra1.prod.do.spreadfighter.cloud:6222"
],
"tls_timeout": 2,
"pool_size": 3
},
"gateway": {},
"leaf": {},
"mqtt": {},
"websocket": {
"host": "0.0.0.0",
"port": 8080,
"no_tls": true,
"compression": true
},
"jetstream": {
"config": {
"max_memory": 0,
"max_storage": 10737418240,
"store_dir": "/data/jetstream",
"sync_interval": 120000000000
},
"stats": {
"memory": 0,
"storage": 0,
"reserved_memory": 0,
"reserved_storage": 0,
"accounts": 1,
"ha_assets": 1,
"api": {
"total": 0,
"errors": 0
}
},
"meta": {
"name": "nats",
"leader": "nats-03",
"peer": "T2kH0TKP",
"cluster_size": 3
}
},
"tls_timeout": 2,
"write_deadline": 10000000000,
"start": "2023-10-24T10:26:55.279794824Z",
"now": "2023-10-24T10:47:22.679668805Z",
"uptime": "20m27s",
"mem": 45236224,
"cores": 2,
"gomaxprocs": 2,
"cpu": 28,
"connections": 15,
"total_connections": 293,
"routes": 8,
"remotes": 2,
"leafnodes": 0,
"in_msgs": 5778509,
"out_msgs": 7759193,
"in_bytes": 3520022242,
"out_bytes": 5044840707,
"slow_consumers": 2,
"subscriptions": 495,
"http_req_stats": {
"/": 1,
"/varz": 1
},
"config_load_time": "2023-10-24T10:26:55.279794824Z",
"system_account": "$SYS",
"ocsp_peer_cache": {},
"slow_consumer_stats": {
"clients": 2,
"routes": 0,
"gateways": 0,
"leafs": 0
}
}
I've reconfigured nats server to listen on private addresses for cluster router 6222 port. Will update you if it helps.
You might want to consider a commercial engagement with Synadia for support and architecture review etc.
I had a meeting call today with Ina
Same errors in private network
Oct 24 18:01:26 nats-01 nats-server[40264]: [40264] 2023/10/24 18:01:26.150848 [INF] 10.110.0.3:48186 - rid:1689 - Route connection created
Oct 24 18:01:49 nats-01 nats-server[40264]: [40264] 2023/10/24 18:01:49.359591 [INF] 58.186.55.62/10.110.0.6:30122 - wid:1685 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8805 total bytes.
Oct 24 18:02:48 nats-01 nats-server[40264]: [40264] 2023/10/24 18:02:48.767347 [ERR] 10.110.0.3:48186 - rid:1689 - Router parser ERROR, state=41, i=20417: proto='"B"...'
Oct 24 18:02:48 nats-01 nats-server[40264]: [40264] 2023/10/24 18:02:48.767384 [INF] 10.110.0.3:48186 - rid:1689 - Router connection closed: Protocol Violation
Oct 24 18:02:49 nats-01 nats-server[40264]: [40264] 2023/10/24 18:02:49.817611 [INF] 10.110.0.3:44574 - rid:1710 - Route connection created
Oct 24 18:04:12 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:12.374059 [INF] 93.109.110.36/10.110.0.6:59678 - wid:1723 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 2 chunks of 1923 total bytes.
Oct 24 18:04:13 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:13.868339 [ERR] 10.110.0.3:44574 - rid:1710 - Router parser ERROR, state=0, i=80: proto='"\"c\":\"0.01994\",\"h\":\"0.01997\",\"l\":"...'
Oct 24 18:04:13 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:13.869046 [INF] 10.110.0.3:44574 - rid:1710 - Router connection closed: Protocol Violation
Oct 24 18:04:14 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:14.872986 [INF] 10.110.0.3:48728 - rid:1732 - Route connection created
Oct 24 18:04:59 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:59.678359 [ERR] 10.110.0.3:48728 - rid:1732 - Router parser ERROR, state=41, i=1572: proto='"WNRMSG $G core.scripts.screener."...'
Oct 24 18:04:59 nats-01 nats-server[40264]: [40264] 2023/10/24 18:04:59.678400 [INF] 10.110.0.3:48728 - rid:1732 - Router connection closed: Protocol Violation
Oct 24 18:05:00 nats-01 nats-server[40264]: [40264] 2023/10/24 18:05:00.683277 [INF] 10.110.0.3:60644 - rid:1743 - Route connection created
Oct 24 18:08:38 nats-01 nats-server[40264]: [40264] 2023/10/24 18:08:38.733692 [INF] 93.109.110.36/10.110.0.6:17558 - wid:1775 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 16 chunks of 31173 total bytes.
Oct 24 18:11:42 nats-01 nats-server[40264]: [40264] 2023/10/24 18:11:42.691693 [INF] 46.199.68.17/10.110.0.5:34560 - wid:1825 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8811 total bytes.
Oct 24 18:11:58 nats-01 nats-server[40264]: [40264] 2023/10/24 18:11:58.954030 [INF] 46.242.10.73/10.110.0.6:63514 - wid:1827 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8811 total bytes.
Oct 24 18:13:00 nats-01 nats-server[40264]: [40264] 2023/10/24 18:13:00.705305 [ERR] 10.110.0.6:46880 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 18:15:19 nats-01 nats-server[40264]: [40264] 2023/10/24 18:15:19.586191 [ERR] 10.110.0.3:60644 - rid:1743 - Router parser ERROR, state=41, i=8461: proto='"3"...'
Oct 24 18:15:19 nats-01 nats-server[40264]: [40264] 2023/10/24 18:15:19.586229 [INF] 10.110.0.3:60644 - rid:1743 - Router connection closed: Protocol Violation
Oct 24 18:15:20 nats-01 nats-server[40264]: [40264] 2023/10/24 18:15:20.660605 [INF] 10.110.0.3:53852 - rid:1881 - Route connection created
Oct 24 18:20:02 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:02.452557 [INF] 58.186.55.62/10.110.0.6:53850 - wid:1914 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8800 total bytes.
Oct 24 18:20:22 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:22.953482 [ERR] 10.110.0.3:53852 - rid:1881 - Router parser ERROR, state=59, i=909: proto='"oteAssetVolume\":\"170225.0790\"}}\r"...'
Oct 24 18:20:22 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:22.954136 [INF] 10.110.0.3:53852 - rid:1881 - Router connection closed: Protocol Violation
Oct 24 18:20:24 nats-01 nats-server[40264]: [40264] 2023/10/24 18:20:24.012338 [INF] 10.110.0.3:57808 - rid:1957 - Route connection created
Oct 24 17:37:17 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:17.078436 [ERR] 10.110.0.5:39520 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:18 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:18.592494 [ERR] 10.110.0.6:64270 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:19 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:19.321894 [ERR] 10.110.0.6:64270 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:19 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:19.471684 [ERR] 10.110.0.5:39520 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:19 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:19.919627 [ERR] 10.110.0.5:39520 - websocket handshake error: invalid value for header 'Upgrade'
Oct 24 17:37:36 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:36.085373 [ERR] 10.110.0.3:6222 - rid:3718 - Router parser ERROR, state=41, i=1368: proto='":1RMSG $G core.scripts.screener."...'
Oct 24 17:37:36 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:36.085409 [INF] 10.110.0.3:6222 - rid:3718 - Router connection closed: Protocol Violation
Oct 24 17:37:37 nats-03 nats-server[34981]: [34981] 2023/10/24 17:37:37.152248 [INF] 10.110.0.3:6222 - rid:3751 - Route connection created
Oct 24 17:40:11 nats-03 nats-server[34981]: [34981] 2023/10/24 17:40:11.490407 [ERR] 10.110.0.3:6222 - rid:3751 - Router parser ERROR, state=41, i=555: proto='"26126,\"week\":-12.483036},{\"symbo"...'
Oct 24 17:40:11 nats-03 nats-server[34981]: [34981] 2023/10/24 17:40:11.491176 [INF] 10.110.0.3:6222 - rid:3751 - Router connection closed: Protocol Violation
Oct 24 17:40:12 nats-03 nats-server[34981]: [34981] 2023/10/24 17:40:12.526038 [INF] 10.110.0.3:6222 - rid:3786 - Route connection created
Oct 24 17:43:52 nats-03 nats-server[34981]: [34981] 2023/10/24 17:43:52.603652 [INF] 93.109.110.36/10.110.0.5:44706 - wid:3804 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8829 total bytes.
Oct 24 17:45:05 nats-03 nats-server[34981]: [34981] 2023/10/24 17:45:05.916791 [INF] 94.110.210.124/10.110.0.6:33266 - wid:3822 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8753 total bytes.
Oct 24 17:56:35 nats-03 nats-server[34981]: [34981] 2023/10/24 17:56:35.961099 [INF] 46.242.10.73/10.110.0.6:26610 - wid:3992 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 8 chunks of 15785 total bytes.
Oct 24 18:05:44 nats-03 nats-server[34981]: [34981] 2023/10/24 18:05:44.993346 [ERR] 10.110.0.3:6222 - rid:3786 - Router parser ERROR, state=41, i=40: proto='"1"...'
Oct 24 18:05:44 nats-03 nats-server[34981]: [34981] 2023/10/24 18:05:44.994069 [INF] 10.110.0.3:6222 - rid:3786 - Router connection closed: Protocol Violation
Oct 24 18:05:46 nats-03 nats-server[34981]: [34981] 2023/10/24 18:05:46.017004 [INF] 10.110.0.3:6222 - rid:4118 - Route connection created
Oct 24 18:06:22 nats-03 nats-server[34981]: [34981] 2023/10/24 18:06:22.421473 [INF] 93.109.110.36/10.110.0.6:17684 - wid:4119 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8838 total bytes.
Oct 24 18:08:48 nats-03 nats-server[34981]: [34981] 2023/10/24 18:08:48.973499 [INF] 92.38.128.133/10.110.0.5:36596 - wid:3959 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 2 chunks of 202 total bytes.
Oct 24 18:15:17 nats-03 nats-server[34981]: [34981] 2023/10/24 18:15:17.112835 [ERR] 10.110.0.3:6222 - rid:4118 - Router parser ERROR, state=41, i=8254: proto='"M"...'
Oct 24 18:15:17 nats-03 nats-server[34981]: [34981] 2023/10/24 18:15:17.112864 [INF] 10.110.0.3:6222 - rid:4118 - Router connection closed: Protocol Violation
Oct 24 18:15:18 nats-03 nats-server[34981]: [34981] 2023/10/24 18:15:18.167003 [INF] 10.110.0.3:6222 - rid:4242 - Route connection created
Oct 24 18:17:19 nats-03 nats-server[34981]: [34981] 2023/10/24 18:17:19.847831 [INF] 93.109.110.36/10.110.0.6:62780 - wid:4232 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 10 chunks of 17993 total bytes.
Oct 24 18:04:59 nats-02 nats-server[33202]: [33202] 2023/10/24 18:04:59.675111 [INF] 10.110.0.4:6222 - rid:5529 - Router connection closed: Write Error
Oct 24 18:05:00 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:00.679172 [INF] 10.110.0.4:6222 - rid:5541 - Route connection created
Oct 24 18:05:44 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:44.993040 [ERR] 10.110.0.2:57468 - rid:5200 - Route Error 'Unknown Protocol Operation'
Oct 24 18:05:44 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:44.993063 [INF] 10.110.0.2:57468 - rid:5200 - Router connection closed: Parse Error
Oct 24 18:05:46 nats-02 nats-server[33202]: [33202] 2023/10/24 18:05:46.016709 [INF] 10.110.0.2:39840 - rid:5550 - Route connection created
Oct 24 18:10:42 nats-02 nats-server[33202]: [33202] 2023/10/24 18:10:42.279930 [INF] 95.161.239.245/10.110.0.6:58920 - wid:5603 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 26 chunks of 49791 total bytes.
Oct 24 18:12:30 nats-02 nats-server[33202]: [33202] 2023/10/24 18:12:30.170388 [INF] 93.109.110.36/10.110.0.5:54094 - wid:5631 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 16 chunks of 31219 total bytes.
Oct 24 18:13:03 nats-02 nats-server[33202]: [33202] 2023/10/24 18:13:03.414925 [INF] 85.114.207.18/10.110.0.5:22886 - wid:5643 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 6 chunks of 8867 total bytes.
Oct 24 18:15:17 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:17.113394 [ERR] 10.110.0.2:39840 - rid:5550 - Route Error 'Unknown Protocol Operation'
Oct 24 18:15:17 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:17.113425 [INF] 10.110.0.2:39840 - rid:5550 - Router connection closed: Parse Error
Oct 24 18:15:18 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:18.167164 [INF] 10.110.0.2:48742 - rid:5683 - Route connection created
Oct 24 18:15:19 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:19.583962 [ERR] 10.110.0.4:6222 - rid:5541 - Route Error 'Unknown Protocol Operation'
Oct 24 18:15:19 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:19.584736 [INF] 10.110.0.4:6222 - rid:5541 - Router connection closed: Parse Error
Oct 24 18:15:20 nats-02 nats-server[33202]: [33202] 2023/10/24 18:15:20.657779 [INF] 10.110.0.4:6222 - rid:5688 - Route connection created
Oct 24 18:20:22 nats-02 nats-server[33202]: [33202] 2023/10/24 18:20:22.951852 [ERR] 10.110.0.4:6222 - rid:5688 - Route Error 'Unknown Protocol Operation'
Oct 24 18:20:22 nats-02 nats-server[33202]: [33202] 2023/10/24 18:20:22.951883 [INF] 10.110.0.4:6222 - rid:5688 - Router connection closed: Parse Error
Oct 24 18:20:24 nats-02 nats-server[33202]: [33202] 2023/10/24 18:20:24.009713 [INF] 10.110.0.4:6222 - rid:5759 - Route connection created
Recent config
{
"cluster": {
"name": "nats",
"host": "10.110.0.4",
"port": 6222,
"no_advertise": true,
"routes": [
"nats://10.110.0.2:6222",
"nats://10.110.0.3:6222",
"nats://10.110.0.4:6222"
]
},
"http_port": 8222,
"jetstream": {
"max_file_store": 10Gi,
"max_memory_store": 0,
"store_dir": "/data"
},
"lame_duck_duration": "30s",
"lame_duck_grace_period": "10s",
"pid_file": "/var/run/nats/nats.pid",
"port": 4222,
"max_payload": 8MB,
"server_name": nats-01,
"websocket": {
"compression": true,
"no_tls": true,
"port": 8080
}
}
Could you give us some info on the clients that are on the other end of the websocket?
Also could you share all configs for the servers?
How are you deploying the system?
Client is just Angular application
private createConnection(natsWsUrl: string) {
if (!environment.production) {
console.log('Connecting to nats');
}
return from(
connect({
servers: [natsWsUrl],
user: 'user',
pass: 'password',
})
).pipe(
tap((connection) => (this.natsConnection = connection)),
catchError((error) => {
console.error('Error connecting to nats', error);
return throwError(error);
})
);
}
What else server config you need, I sent all I have in /etc/nats/nats.conf file + /varz I just install deb package and make systemctl file to autoload to ubuntu server.
Is there a network element in between the angular app and the NATS servers?
Could you try disabling compression and see if that changes things?
Yes sure, there is a LoadBalancer between clients and NATS server. There is SSl termination on it also. I will disable compression and report
Turning off compression did not help
Oct 26 07:10:25 nats-01 nats-server[63029]: [63029] 2023/10/26 07:10:25.311165 [INF] 118.71.120.97/10.110.0.5:13108 - wid:40 - Slow Consumer Detected: WriteDeadline of 10s exceeded with 2 chunks of 232 total bytes.
Oct 26 07:13:49 nats-01 nats-server[63029]: [63029] 2023/10/26 07:13:49.485768 [ERR] 10.110.0.2:52332 - rid:51 - Router parser ERROR, state=41, i=5437: proto='"d"...'
Oct 26 07:13:49 nats-01 nats-server[63029]: [63029] 2023/10/26 07:13:49.485793 [INF] 10.110.0.2:52332 - rid:51 - Router connection closed: Protocol Violation
Oct 26 07:13:50 nats-01 nats-server[63029]: [63029] 2023/10/26 07:13:50.517504 [INF] 10.110.0.2:36050 - rid:197 - Route connection created
as you can see nothing comes from external interface now on port 6222
root@nats-01:~# ip a l
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether c6:0b:77:cd:e0:87 brd ff:ff:ff:ff:ff:ff
altname enp0s3
altname ens3
inet 134.122.59.155/20 brd 134.122.63.255 scope global eth0
valid_lft forever preferred_lft forever
inet 10.18.0.7/16 brd 10.18.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::c40b:77ff:fecd:e087/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether e6:21:8e:67:1c:0c brd ff:ff:ff:ff:ff:ff
altname enp0s4
altname ens4
inet 10.110.0.4/20 brd 10.110.15.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::e421:8eff:fe67:1c0c/64 scope link
valid_lft forever preferred_lft forever
root@nats-01:~# tcpdump port 6222
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
root@nats-01:~# tcpdump port 6222
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel
this is current varz
root@nats-01:~# curl localhost:8222/varz
{
"server_id": "NCYKRXMGXVTV3GIJR26EMKCMZTGHLDEFVCVYVDRHLUHSMQFYOHQ6AFIA",
"server_name": "nats-01",
"version": "2.10.3",
"proto": 1,
"git_commit": "1528434",
"go": "go1.21.3",
"host": "0.0.0.0",
"port": 4222,
"max_connections": 65536,
"ping_interval": 120000000000,
"ping_max": 2,
"http_host": "0.0.0.0",
"http_port": 8222,
"http_base_path": "",
"https_port": 0,
"auth_timeout": 2,
"max_control_line": 4096,
"max_payload": 8388608,
"max_pending": 67108864,
"cluster": {
"name": "nats",
"addr": "10.110.0.4",
"cluster_port": 6222,
"auth_timeout": 2,
"urls": [
"10.110.0.2:6222",
"10.110.0.3:6222",
"10.110.0.4:6222"
],
"tls_timeout": 2,
"pool_size": 3
},
"gateway": {},
"leaf": {},
"mqtt": {},
"websocket": {
"host": "0.0.0.0",
"port": 8080,
"no_tls": true
},
"jetstream": {
"config": {
"max_memory": 0,
"max_storage": 10737418240,
"store_dir": "/data/jetstream",
"sync_interval": 120000000000
},
"stats": {
"memory": 0,
"storage": 0,
"reserved_memory": 0,
"reserved_storage": 0,
"accounts": 1,
"ha_assets": 1,
"api": {
"total": 0,
"errors": 0
}
},
"meta": {
"name": "nats",
"leader": "nats-01",
"peer": "pZeTdFV7",
"replicas": [
{
"name": "nats-02",
"current": true,
"active": 384155024,
"peer": "L6QapMtb"
},
{
"name": "nats-03",
"current": true,
"active": 384156372,
"peer": "T2kH0TKP"
}
],
"cluster_size": 3
}
},
"tls_timeout": 2,
"write_deadline": 10000000000,
"start": "2023-10-26T07:05:24.142814455Z",
"now": "2023-10-26T08:38:30.211371351Z",
"uptime": "1h33m6s",
"mem": 73912320,
"cores": 2,
"gomaxprocs": 2,
"cpu": 16,
"connections": 25,
"total_connections": 1523,
"routes": 8,
"remotes": 2,
"leafnodes": 0,
"in_msgs": 31681247,
"out_msgs": 45377401,
"in_bytes": 17348943263,
"out_bytes": 33767517313,
"slow_consumers": 37,
"subscriptions": 662,
"http_req_stats": {
"/varz": 1
},
"config_load_time": "2023-10-26T07:05:24.142814455Z",
"system_account": "$SYS",
"ocsp_peer_cache": {},
"slow_consumer_stats": {
"clients": 37,
"routes": 0,
"gateways": 0,
"leafs": 0
}
}
Would it be possible to have the apps connect directly to the nats servers as an experiment?
I should add an ssl cert to websocket server config. I will try.
On Fri, Oct 27, 2023 at 08:03 Derek Collison @.***> wrote:
Would it be possible to have the apps connect directly to the nats servers as an experiment?
— Reply to this email directly, view it on GitHub https://github.com/nats-io/nats-server/issues/4688#issuecomment-1782298136, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOZSRZVBOMB5I5KZG6QKITYBM6DNAVCNFSM6AAAAAA6K24UUOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOBSGI4TQMJTGY . You are receiving this because you authored the thread.Message ID: @.***>
Do you have an idea how I can protect nats websocket server from non websocket traffic? I see sometimes it accept scanner's traffic that cause protocol error too.
Does not help. Same errors
Oct 27 07:33:44 nats-01 nats-server[78067]: [78067] 2023/10/27 07:33:44.995445 [ERR] 10.110.0.2:39792 - rid:52 - Router parser ERROR, state=41, i=1282: proto='"t"...'
Oct 27 07:33:44 nats-01 nats-server[78067]: [78067] 2023/10/27 07:33:44.996221 [INF] 10.110.0.2:39792 - rid:52 - Router connection closed: Protocol Violation
Oct 27 07:33:46 nats-01 nats-server[78067]: [78067] 2023/10/27 07:33:46.037019 [INF] 10.110.0.2:33828 - rid:1236 - Route connection created
Oct 27 07:34:58 nats-01 nats-server[78067]: [78067] 2023/10/27 07:34:58.142949 [ERR] 10.110.0.2:33828 - rid:1236 - Router parser ERROR, state=41, i=12809: proto='"ANRMSG $G sf.core.scripts.screen"...'
Oct 27 07:34:58 nats-01 nats-server[78067]: [78067] 2023/10/27 07:34:58.142979 [INF] 10.110.0.2:33828 - rid:1236 - Router connection closed: Protocol Violation
Oct 27 07:34:59 nats-01 nats-server[78067]: [78067] 2023/10/27 07:34:59.160365 [INF] 10.110.0.2:37740 - rid:1254 - Route connection created
Oct 27 07:35:00 nats-01 nats-server[78067]: [78067] 2023/10/27 07:35:00.352229 [ERR] 10.110.0.2:37740 - rid:1254 - Router parser ERROR, state=41, i=62126: proto='"00RMSG $G binance.futures.kline."...'
Oct 27 07:35:00 nats-01 nats-server[78067]: [78067] 2023/10/27 07:35:00.352252 [INF] 10.110.0.2:37740 - rid:1254 - Router connection closed: Protocol Violation
Oct 27 07:35:01 nats-01 nats-server[78067]: [78067] 2023/10/27 07:35:01.373008 [INF] 10.110.0.2:57666 - rid:1255 - Route connection created
websocket: {
tls {
cert_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/fullchain.pem"
key_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/privkey.pem"
}
compression: true,
port: 8443
handshake_timeout: "10s"
authorization {
username: "xxxxxx"
password: "xxxxxxx"
}
}
client config
connect({
servers: [
"nats-node-01.fra1.prod.do.spreadfighter.cloud:8443",
"nats-node-02.fra1.prod.do.spreadfighter.cloud:8443",
"nats-node-03.fra1.prod.do.spreadfighter.cloud:8443"
],
user: 'xxxxx',
pass: 'xxxxxxxx',
})
Thanks. OK, usually when we see this it is almost always something in between the client and the server, or between servers.
Let's confirm the following.
Loopin in @aricart from our side in case there is something on the NATS TS/Js side or the websocket impl.
@boskiv can you verify that the connect URL that you have given in the servers
option to the nats.ws client? On your server I see you have tis turned off, so this should be a ws://somehost:8080
. I read above that you have an LB before it, so it should be the wss://lb:port
Also just a clarification, the nats.ws library is not for node, as their websocket library is non-standard. nats.ws expects a w3c websocket (ie browsers, etc).
Here's a little program you can try:
import {connect} from "https://raw.githubusercontent.com/nats-io/nats.ws/main/src/mod.ts"
const nc = await connect({servers: "wss://demo.nats.io:8443"})
console.log(`connected`)
nc.subscribe("foo.xxx", {callback: (err, msg) => {
console.log(`msg: ${msg.subject}: ${msg.string()}`);
}});
nc.publish("foo.xxx", "hello");
await nc.flush();
To run it, you'll need Deno - https://deno.com, which is a runtime that supports W3S web sockets so it works exactly as a browser would. You can change the connect URL to your server to test that you are reaching the correct one.
One final validation; load-balancing should only be for the websocket port, not the cluster port. Port 6222 as you had it configured should be for use by other nats-servers in a cluster.
Yes guys. I'm using nats.ws in my client angular application. I'm not usig loadbalancer for routing port as you can see above in my recent config. I'm not using any nodejs server/service
my recent config contains TLS also and i removed Load Balancer from connection.
websocket: {
tls {
cert_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/fullchain.pem"
key_file: "/etc/letsencrypt/live/nats-node-01.fra1.prod.do.spreadfighter.cloud/privkey.pem"
}
compression: true,
port: 8443
handshake_timeout: "10s"
authorization {
username: "xxxxxx"
password: "xxxxxxx"
}
}
All that things does not help
What version of Node are you using to run your app?
We are not using nodejs in our app. We are using angular-js as frontend connected directly to nats websocket
Here's a test you can do, when experiencing issues, try to connect to the wss:// url where your nats-sever is supposedly running. - You should see an INFO
protocol be emitted by the server. If you see anything else, you are not connected to a nats-server.
npm install -g wscat
wscat -c wss://connect.ngs.global:443
Connected (press CTRL+C to quit)
< INFO {"server_id":"NCKLE44SQUOTGGOU2ZXJSK5XOBP4PQKA3BYHOH77B5UW2NZ2VKYNVVDU","server_name":"aws-useast2-natscj1-3","version":"2.10.4-RC.4","proto":1,"git_commit":"5d561d2f","go":"go1.21.3","host":"::","port":443,"headers":true,"auth_required":true,"tls_required":true,"tls_available":true,"max_payload":8388608,"jetstream":true,"client_id":314078,"client_ip":"24.7.245.48","nonce":"IDWYqHTtVkDHQv4","cluster":"ngsprod-aws-useast2","domain":"ngs","connect_urls":["3.22.251.18:443","18.116.91.206:443","18.220.225.46:443"],"xkey":"XAITQXEGCQN7JIMXCS5FEQL6RFGVWT5XPWSDQOD53ZRU3OINSHNHLLFJ"}
Also verify all the other connect_urls
sent to the client. It is possible that one of those could be incorrect.
Not sure if this is applicable, but might be worth trying the nightly build in a few hours.
I think that PR #4811 will fix the issue with route protocol error. Closing now, but you can re-open if when running from main
branch (or the nightly), or when v2.10.6 is released, you still experience the issue.
Observed behavior
Expected behavior
No errors
Server and client version
nats-server: v2.10.3 nats.ws: 1.18.0
Host environment
Digital Ocean Droplet
Steps to reproduce
No response