nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Incoming messages to the host without connections #5539

Status: Open. nenych opened this issue 2 weeks ago.

nenych commented 2 weeks ago

Observed behavior

We have a NATS cluster of 3 nodes, each in a different AZ, and clients connect to the node in their own AZ. One node has no client connections, yet at one point it started receiving a large volume of "orphan" messages from the other nodes:

[Screenshot (2024-06-14 17:13:40) of the node's metrics]

We tried setting "pool_size = -1". The cluster applied the new configuration, but the incoming traffic did not decrease.
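The effect of the change is visible in the route counts of the two dumps that follow: `num_routes` is 8 per server before and 2 after. The Go sketch below reproduces that arithmetic. It is an inference from these dumps, not taken from the server source: it assumes each peer got 3 pooled routes plus one dedicated route per pinned account (only `$SYS` appears pinned here), and that a negative pool size disables pooling so each peer gets a single route. The function name `expectedRoutes` is hypothetical.

```go
package main

import "fmt"

// expectedRoutes estimates how many route connections one server holds.
// Assumption inferred from the routez dumps in this issue (not the server code):
// each peer gets poolSize pooled routes plus one dedicated route per pinned
// account, and a negative poolSize (e.g. pool_size = -1) means one route per peer.
func expectedRoutes(clusterSize, poolSize, pinnedAccounts int) int {
	peers := clusterSize - 1
	if poolSize < 0 {
		return peers
	}
	return peers * (poolSize + pinnedAccounts)
}

func main() {
	// Before the change: 3 pooled routes + 1 $SYS route to each of 2 peers.
	fmt.Println("before:", expectedRoutes(3, 3, 1))
	// After setting pool_size = -1.
	fmt.Println("after:", expectedRoutes(3, -1, 1))
}
```

Both results (8 and 2) match the `num_routes` values reported in the dumps; this explains the route layout only and says nothing about why the orphan traffic persisted.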

Routes before the pool_size parameter change:

```json
{ "server": { "name": "nats-cluster-v4-0", "host": "0.0.0.0", "id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "cluster": "nats-cluster-v4", "ver": "2.10.14", "jetstream": false, "flags": 0, "seq": 27816, "time": "2024-06-14T09:00:15.936315993Z" }, "data": { "server_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "server_name": "nats-cluster-v4-0", "now": "2024-06-14T09:00:15.936260353Z", "num_routes": 8, "routes": [ { "rid": 6, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 6222, "start": "2024-06-04T17:45:20.530656125Z", "last_activity": "2024-06-04T17:45:20.531406265Z", "rtt": "218µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 9, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 6222, "start": "2024-06-04T17:45:20.578446619Z", "last_activity": "2024-06-04T17:45:20.578905629Z", "rtt": "207µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 12, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 6222, "start": "2024-06-04T17:45:20.660375Z", "last_activity": "2024-06-14T09:00:15.936291023Z", "rtt": "219µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 1045046185, "out_msgs": 65460384, "in_bytes": 1868687327845, "out_bytes": 51359890769, "subscriptions": 6252, "compression": "off" }, { "rid": 8,
"remote_id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:20.530737765Z", "last_activity": "2024-06-04T17:45:20.531492214Z", "rtt": "292µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 10, "remote_id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:20.604039586Z", "last_activity": "2024-06-04T17:45:20.604548357Z", "rtt": "308µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 11, "remote_id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:20.633688383Z", "last_activity": "2024-06-14T09:00:15.931065823Z", "rtt": "243µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 33948303, "out_msgs": 14153420, "in_bytes": 11515003051, "out_bytes": 10301579698, "subscriptions": 13, "compression": "off" }, { "rid": 5, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 6222, "start": "2024-06-04T17:45:20.530645194Z", "last_activity": "2024-06-14T09:00:15.936240633Z", "rtt": "248µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 27798, "out_msgs": 27786, "in_bytes": 42873121, "out_bytes": 42789916, "subscriptions": 48, "account": "$SYS", "compression": "off" }, { "rid": 7,
"remote_id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:20.530697954Z", "last_activity": "2024-06-14T09:00:15.936240633Z", "rtt": "311µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 27765, "out_msgs": 27775, "in_bytes": 42166135, "out_bytes": 42464040, "subscriptions": 48, "account": "$SYS", "compression": "off" } ] } } { "server": { "name": "nats-cluster-v4-1", "host": "0.0.0.0", "id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "cluster": "nats-cluster-v4", "ver": "2.10.14", "jetstream": false, "flags": 0, "seq": 27874, "time": "2024-06-14T09:00:15.936405351Z" }, "data": { "server_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "server_name": "nats-cluster-v4-1", "now": "2024-06-14T09:00:15.936370711Z", "num_routes": 8, "routes": [ { "rid": 6, "remote_id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:08.358281524Z", "last_activity": "2024-06-04T17:45:08.358675434Z", "rtt": "380µs", "uptime": "9d15h15m7s", "idle": "9d15h15m7s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 11, "remote_id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:08.373206593Z", "last_activity": "2024-06-04T17:45:08.373595833Z", "rtt": "295µs", "uptime": "9d15h15m7s", "idle": "9d15h15m7s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 13, "remote_id":
"NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:08.401488062Z", "last_activity": "2024-06-14T09:00:15.936183591Z", "rtt": "234µs", "uptime": "9d15h15m7s", "idle": "0s", "pending_size": 0, "in_msgs": 79617062, "out_msgs": 1027217544, "in_bytes": 68925666034, "out_bytes": 1863523035076, "subscriptions": 13, "compression": "off" }, { "rid": 22, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 52636, "start": "2024-06-04T17:45:20.530854683Z", "last_activity": "2024-06-04T17:45:20.531027283Z", "rtt": "223µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 23, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 52648, "start": "2024-06-04T17:45:20.578585482Z", "last_activity": "2024-06-04T17:45:20.578741222Z", "rtt": "270µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 24, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 52652, "start": "2024-06-04T17:45:20.660517999Z", "last_activity": "2024-06-14T09:00:15.936183591Z", "rtt": "219µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 65460384, "out_msgs": 1045046186, "in_bytes": 51359890769, "out_bytes": 1868687328185, "subscriptions": 1286, "compression": "off" }, { "rid": 5, "remote_id":
"NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.33", "port": 6222, "start": "2024-06-04T17:45:08.356582374Z", "last_activity": "2024-06-14T09:00:12.199988588Z", "rtt": "246µs", "uptime": "9d15h15m7s", "idle": "3s", "pending_size": 0, "in_msgs": 27777, "out_msgs": 27804, "in_bytes": 42177782, "out_bytes": 42809896, "subscriptions": 48, "account": "$SYS", "compression": "off" }, { "rid": 21, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 52630, "start": "2024-06-04T17:45:20.530829183Z", "last_activity": "2024-06-14T09:00:15.936356571Z", "rtt": "337µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 27786, "out_msgs": 27798, "in_bytes": 42789916, "out_bytes": 42873121, "subscriptions": 49, "account": "$SYS", "compression": "off" } ] } } { "server": { "name": "nats-cluster-v4-2", "host": "0.0.0.0", "id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "cluster": "nats-cluster-v4", "ver": "2.10.14", "jetstream": false, "flags": 0, "seq": 27816, "time": "2024-06-14T09:00:15.936489307Z" }, "data": { "server_id": "NC34AP6LAJSBGRNQOZHE5OYXNEAEDITZXBGYQP4USZ3GHMBJP2DGCKO2", "server_name": "nats-cluster-v4-2", "now": "2024-06-14T09:00:15.936449387Z", "num_routes": 8, "routes": [ { "rid": 26, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 59576, "start": "2024-06-04T17:45:08.358430289Z", "last_activity": "2024-06-04T17:45:08.358559849Z", "rtt": "255µs", "uptime": "9d15h15m7s", "idle": "9d15h15m7s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 27, "remote_id":
"NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 59578, "start": "2024-06-04T17:45:08.373324948Z", "last_activity": "2024-06-04T17:45:08.373466167Z", "rtt": "249µs", "uptime": "9d15h15m7s", "idle": "9d15h15m7s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 28, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 59588, "start": "2024-06-04T17:45:08.401617335Z", "last_activity": "2024-06-14T09:00:15.936316987Z", "rtt": "290µs", "uptime": "9d15h15m7s", "idle": "0s", "pending_size": 0, "in_msgs": 1027217544, "out_msgs": 79617062, "in_bytes": 1863523035076, "out_bytes": 68925666034, "subscriptions": 6252, "compression": "off" }, { "rid": 29, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 44962, "start": "2024-06-04T17:45:20.530866763Z", "last_activity": "2024-06-04T17:45:20.531103313Z", "rtt": "215µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 31, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 44970, "start": "2024-06-04T17:45:20.604194468Z", "last_activity": "2024-06-04T17:45:20.604339128Z", "rtt": "277µs", "uptime": "9d15h14m55s", "idle": "9d15h14m55s", "pending_size": 0, "in_msgs": 0, "out_msgs": 0, "in_bytes": 0, "out_bytes": 0, "subscriptions": 0, "compression": "off" }, { "rid": 32, "remote_id":
"NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 44984, "start": "2024-06-04T17:45:20.633770436Z", "last_activity": "2024-06-14T09:00:15.930918998Z", "rtt": "238µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 14153420, "out_msgs": 33948303, "in_bytes": 10301579698, "out_bytes": 11515003051, "subscriptions": 1286, "compression": "off" }, { "rid": 25, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 59560, "start": "2024-06-04T17:45:08.356758569Z", "last_activity": "2024-06-14T09:00:12.199711305Z", "rtt": "323µs", "uptime": "9d15h15m7s", "idle": "3s", "pending_size": 0, "in_msgs": 27804, "out_msgs": 27777, "in_bytes": 42809896, "out_bytes": 42177782, "subscriptions": 48, "account": "$SYS", "compression": "off" }, { "rid": 30, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 44952, "start": "2024-06-04T17:45:20.530884863Z", "last_activity": "2024-06-14T09:00:15.936429307Z", "rtt": "341µs", "uptime": "9d15h14m55s", "idle": "0s", "pending_size": 0, "in_msgs": 27775, "out_msgs": 27765, "in_bytes": 42464040, "out_bytes": 42166135, "subscriptions": 49, "account": "$SYS", "compression": "off" } ] } }
```
Routes after the change:

```json
{ "server": { "name": "nats-cluster-v4-1", "host": "0.0.0.0", "id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "cluster": "nats-cluster-v4", "ver": "2.10.14", "jetstream": false, "flags": 0, "seq": 30506, "time": "2024-06-14T15:48:01.505977662Z" }, "data": { "server_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "server_name": "nats-cluster-v4-1", "now": "2024-06-14T15:48:01.505934692Z", "num_routes": 2, "routes": [ { "rid": 6264, "remote_id": "NBRW5CG3HCOJSGMMXDGDGHPHTFUDLEYFUP3PXFWLNVFZ4PXCVLCQXNCI", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.149", "port": 52110, "start": "2024-06-14T12:46:40.646080193Z", "last_activity": "2024-06-14T15:48:01.505911722Z", "rtt": "332µs", "uptime": "3h1m20s", "idle": "0s", "pending_size": 0, "in_msgs": 53161, "out_msgs": 882, "in_bytes": 75416839, "out_bytes": 335060, "subscriptions": 56, "compression": "off" }, { "rid": 6122, "remote_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 52472, "start": "2024-06-14T12:34:42.848411331Z", "last_activity": "2024-06-14T15:48:01.505957832Z", "rtt": "291µs", "uptime": "3h13m18s", "idle": "0s", "pending_size": 0, "in_msgs": 682786, "out_msgs": 87690088, "in_bytes": 690431012, "out_bytes": 180559446164, "subscriptions": 1502, "compression": "off" } ] } } { "server": { "name": "nats-cluster-v4-0", "host": "0.0.0.0", "id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "cluster": "nats-cluster-v4", "ver": "2.10.14", "jetstream": false, "flags": 0, "seq": 28927, "time": "2024-06-14T15:48:01.506207613Z" }, "data": { "server_id": "NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "server_name": "nats-cluster-v4-0", "now": "2024-06-14T15:48:01.506158393Z", "num_routes": 2, "routes": [ { "rid": 6056, "remote_id":
"NBRW5CG3HCOJSGMMXDGDGHPHTFUDLEYFUP3PXFWLNVFZ4PXCVLCQXNCI", "remote_name": "nats-cluster-v4-2", "did_solicit": true, "is_configured": true, "ip": "10.202.172.149", "port": 58960, "start": "2024-06-14T12:46:40.646873006Z", "last_activity": "2024-06-14T15:48:01.485820224Z", "rtt": "402µs", "uptime": "3h1m20s", "idle": "0s", "pending_size": 0, "in_msgs": 582011, "out_msgs": 1127420, "in_bytes": 246896380, "out_bytes": 3392144265, "subscriptions": 56, "compression": "off" }, { "rid": 6038, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 6222, "start": "2024-06-14T12:34:42.848315979Z", "last_activity": "2024-06-14T15:48:01.506126813Z", "rtt": "266µs", "uptime": "3h13m18s", "idle": "0s", "pending_size": 0, "in_msgs": 87690088, "out_msgs": 682786, "in_bytes": 180559446164, "out_bytes": 690431012, "subscriptions": 6132, "compression": "off" } ] } } { "server": { "name": "nats-cluster-v4-2", "host": "0.0.0.0", "id": "NBRW5CG3HCOJSGMMXDGDGHPHTFUDLEYFUP3PXFWLNVFZ4PXCVLCQXNCI", "cluster": "nats-cluster-v4", "ver": "2.10.14", "jetstream": false, "flags": 0, "seq": 401, "time": "2024-06-14T15:48:01.506191192Z" }, "data": { "server_id": "NBRW5CG3HCOJSGMMXDGDGHPHTFUDLEYFUP3PXFWLNVFZ4PXCVLCQXNCI", "server_name": "nats-cluster-v4-2", "now": "2024-06-14T15:48:01.506164282Z", "num_routes": 2, "routes": [ { "rid": 5, "remote_id": "NCS7TNLPVKAI7YA777FKARHZAEVIUOQ5CKFYXSKP6RDO2ZGE2TSFGE25", "remote_name": "nats-cluster-v4-1", "did_solicit": true, "is_configured": true, "ip": "10.202.142.65", "port": 6222, "start": "2024-06-14T12:46:40.645912846Z", "last_activity": "2024-06-14T15:48:01.506142912Z", "rtt": "302µs", "uptime": "3h1m20s", "idle": "0s", "pending_size": 0, "in_msgs": 882, "out_msgs": 53161, "in_bytes": 335060, "out_bytes": 75416839, "subscriptions": 6132, "compression": "off" }, { "rid": 6, "remote_id":
"NBYFQSZJA3Q42WRHRMDH6SCBQ3YJKBZZXH62P4BDZW73PCMFXZCQVKKS", "remote_name": "nats-cluster-v4-0", "did_solicit": true, "is_configured": true, "ip": "10.202.173.11", "port": 6222, "start": "2024-06-14T12:46:40.646706926Z", "last_activity": "2024-06-14T15:48:01.485658073Z", "rtt": "301µs", "uptime": "3h1m20s", "idle": "0s", "pending_size": 0, "in_msgs": 1127420, "out_msgs": 582011, "in_bytes": 3392144265, "out_bytes": 246896380, "subscriptions": 1502, "compression": "off" } ] } }
```
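When reading dumps like these, totalling `in_bytes` per peer shows which server is pushing route traffic to which. The Go sketch below assumes the replies are concatenated JSON objects carrying the `server.name` and `data.routes[].remote_name` / `in_bytes` fields shown above; the helper `bytesInByPeer` is hypothetical and not part of any NATS tooling. Applied to the "before" dump, it would show that nearly all of nats-cluster-v4-0's inbound route traffic arrived from nats-cluster-v4-1 (rid 12, about 1.87 TB).

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"sort"
	"strings"
)

// routezReply keeps only the fields used here; the JSON names match the dumps above.
type routezReply struct {
	Server struct {
		Name string `json:"name"`
	} `json:"server"`
	Data struct {
		Routes []struct {
			RemoteName string `json:"remote_name"`
			InBytes    int64  `json:"in_bytes"`
		} `json:"routes"`
	} `json:"data"`
}

// bytesInByPeer decodes a stream of concatenated routez replies and sums
// in_bytes (bytes received by the local server) per "local <- remote" pair.
func bytesInByPeer(dump string) (map[string]int64, error) {
	totals := make(map[string]int64)
	dec := json.NewDecoder(strings.NewReader(dump))
	for {
		var r routezReply
		err := dec.Decode(&r)
		if errors.Is(err, io.EOF) {
			return totals, nil // clean end of the stream
		}
		if err != nil {
			return nil, err
		}
		for _, rt := range r.Data.Routes {
			totals[r.Server.Name+" <- "+rt.RemoteName] += rt.InBytes
		}
	}
}

func main() {
	// Two routes copied from the nats-cluster-v4-0 "before" dump (rid 12 and rid 11).
	sample := `{"server":{"name":"nats-cluster-v4-0"},"data":{"routes":[
		{"remote_name":"nats-cluster-v4-1","in_bytes":1868687327845},
		{"remote_name":"nats-cluster-v4-2","in_bytes":11515003051}]}}`
	totals, err := bytesInByPeer(sample)
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(totals))
	for k := range totals {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%s: %d bytes\n", k, totals[k])
	}
}
```

Because `in_bytes` is counted on the receiving side, the matching `out_bytes` on the peer's route (for example rid 24 on nats-cluster-v4-1) should be roughly the same figure, which is a quick sanity check on the direction of the traffic.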

Only restarting the node fixed it. The situation after the restart (0 connections, ~0 incoming bytes, ~0 outgoing bytes):

[Screenshot (2024-06-14 17:25:50) of the node's metrics after the restart]

Expected behavior

Once there are no consumers left to read the messages, the subscription interest should be removed from the route, so messages are no longer forwarded to the node.

Server and client version

Server version: 2.10.14
Client: -

Host environment

GKE host:

OS: Container-Optimized OS from Google
OS version: 109
Architecture: x86-64
Container runtime: containerd

Steps to reproduce

In our case, a few slow consumers appeared in the cluster, after which incoming messages started flowing to the node without clients.

The only related entries we could find in the logs were messages like these:

Logs:

```text
nats [7] 2024/06/10 11:05:42.004110 [INF] 10.202.171.13:52058 - cid:3207 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 4435 total bytes.
nats [7] 2024/06/10 11:11:03.923105 [INF] 10.202.171.13:55020 - cid:3211 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 535 total bytes.
nats [7] 2024/06/10 11:11:29.156916 [INF] 10.202.171.13:47712 - cid:3212 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 1 chunks of 176 total bytes.
nats [7] 2024/06/10 11:12:13.039384 [INF] 10.202.171.13:36508 - cid:3214 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 44 chunks of 601458 total bytes.
nats [7] 2024/06/10 11:13:49.470739 [INF] 10.202.171.13:36666 - cid:3216 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 14 chunks of 258264 total bytes.
nats [7] 2024/06/10 11:26:42.056093 [INF] 10.202.171.13:33510 - cid:3219 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 848 total bytes.
nats [7] 2024/06/10 11:31:57.516530 [INF] 10.202.171.13:55710 - cid:3221 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 1 chunks of 69 total bytes.
nats [7] 2024/06/10 11:33:56.144849 [INF] 10.202.171.13:58652 - cid:3223 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 4319 total bytes.
nats [7] 2024/06/10 11:37:48.260718 [INF] 10.202.171.13:59468 - cid:3227 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 30 chunks of 789327 total bytes.
nats [7] 2024/06/10 11:38:29.813781 [INF] 10.202.171.13:40308 - cid:3229 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 1659 total bytes.
nats [7] 2024/06/10 15:39:38.161377 [INF] 10.202.171.13:55896 - cid:3330 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 35 chunks of 929026 total bytes.
nats [7] 2024/06/10 15:41:54.142104 [INF] 10.202.171.13:48106 - cid:3335 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 8820 total bytes.
nats [7] 2024/06/10 17:08:54.921921 [INF] 10.202.171.13:37126 - cid:3409 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 61 chunks of 1927533 total bytes.
nats [7] 2024/06/10 17:16:09.376586 [INF] 10.202.171.13:39610 - cid:3415 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 5772 total bytes.
nats [7] 2024/06/10 17:18:45.193519 [INF] 10.202.171.13:56224 - cid:3416 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 2065 total bytes.
nats [7] 2024/06/10 17:28:55.946395 [INF] 10.202.171.13:53416 - cid:3417 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 13288 total bytes.
nats [7] 2024/06/10 17:38:25.867114 [INF] 10.202.171.13:48406 - cid:3422 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 16039 total bytes.
nats [7] 2024/06/10 17:38:41.448495 [INF] 10.202.171.13:48482 - cid:3423 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 18763 total bytes
```
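Every slow-consumer entry in this excerpt comes from the same client host, 10.202.171.13, under many different `cid`s. A minimal Go sketch for tallying such entries by host follows. It assumes the exact message format shown in the excerpt; the regular expression and the `summarize` helper are hypothetical, not part of any NATS tooling. It matches across the whole text, so it works whether the entries are one per line or run together.

```go
package main

import (
	"fmt"
	"net"
	"regexp"
	"strconv"
)

// slowConsumerRe matches "Slow Consumer Detected" entries in the format shown
// above and captures the client address and the "total bytes" figure.
var slowConsumerRe = regexp.MustCompile(
	`(\S+) - cid:\d+ - Slow Consumer Detected: WriteDeadline of \S+ exceeded with \d+ chunks of (\d+) total bytes`)

// hostStats aggregates slow-consumer entries for one client host.
type hostStats struct {
	Events        int
	ReportedBytes int
}

// summarize groups slow-consumer entries by client host (port stripped).
func summarize(logs string) map[string]hostStats {
	stats := make(map[string]hostStats)
	for _, m := range slowConsumerRe.FindAllStringSubmatch(logs, -1) {
		host, _, err := net.SplitHostPort(m[1])
		if err != nil {
			host = m[1] // not host:port; keep the raw token
		}
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue
		}
		s := stats[host]
		s.Events++
		s.ReportedBytes += n
		stats[host] = s
	}
	return stats
}

func main() {
	// The first two entries from the excerpt above.
	logs := `nats [7] 2024/06/10 11:05:42.004110 [INF] 10.202.171.13:52058 - cid:3207 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 4435 total bytes.
nats [7] 2024/06/10 11:11:03.923105 [INF] 10.202.171.13:55020 - cid:3211 - Slow Consumer Detected: WriteDeadline of 4s exceeded with 2 chunks of 535 total bytes.`
	for host, s := range summarize(logs) {
		fmt.Printf("%s: %d events, %d reported bytes\n", host, s.Events, s.ReportedBytes)
	}
}
```

For the two sample entries it prints `10.202.171.13: 2 events, 4970 reported bytes`. A single host reconnecting under new `cid`s is the pattern this makes easy to spot.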
jing-flowdesk commented 2 weeks ago

Hello, nice timing: it seems we are having a similar issue with a similar setup. We also have a 3-node NATS cluster with each node in a different AZ, and clients connect to the node in their own AZ.

Server and client version

Server version: 2.10.16
Client: -

JetStream is not enabled. We have set up a queue group for the subscribers.

Observed behavior

When a queue group has no subscriber for a specific subject, our metrics show about 10 times more messages (sent and received).

Expected behavior

No burst of messages when there is no subscriber.

[Graph: NATS message metrics]
derekcollison commented 2 weeks ago

Could you provide detailed instructions on how we could reproduce this? That would be helpful.

miloaec commented 1 week ago

Hi, I work with Jing. We restarted the NATS servers and, at the same time, moved the NATS pods to a new node pool, so they migrated to new VMs. We have not seen the issue since this move.

We tried to reproduce the same behavior in our dev environment but, alas, have been unable to.

When it occurred, we had a microservice sending messages to a subscriber that is part of a queue group. Whenever we stopped the subscriber, we saw the behavior shown in Jing's graph: the number of messages rose by about 10x.

When we subscribed again, the number of messages returned to normal. We tested this several times with exactly the same result, but as I said, since moving to the new nodes we have not been able to reproduce this behavior.

nenych commented 1 week ago

@derekcollison I can't reproduce it right now, but we have a node that currently shows this issue, and if there is a way to debug it there, we can do that.

derekcollison commented 1 week ago

So you have a node that is showing this behavior that has no client connections and no jetstream assets on that node, correct?

nenych commented 1 week ago

Right now it has clients, but it definitely also receives additional unroutable traffic (once we fix a separate issue, this node will have no clients). We do not use JetStream, so yes, this node has no JetStream assets.

derekcollison commented 1 week ago

Is it possible to check whether the issue still occurs with the latest pre-release candidate for v2.10.17, RC.6?

https://github.com/nats-io/nats-server/releases/tag/v2.10.17-RC.6

nenych commented 1 week ago

I have been trying to reproduce this situation without luck, so we only have it in production.

derekcollison commented 1 week ago

Is production showing the issue now?

nenych commented 1 week ago

Yes, it is production, and the issue is present.

derekcollison commented 1 week ago

Can we schedule a call to take a look?

Shoot me an email - derek@synadia.com