nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
16k stars 1.41k forks source link

NATS server restarts after a panic #6048

Closed speier closed 3 weeks ago

speier commented 3 weeks ago

Observed behavior

We have a small cluster of 3 nodes hub, with one leaf node connected, deployed to Kubernetes, and recently one of the cluster members has been restarting many times, throwing this panic:

[762] 2024/10/28 17:16:04.266396 [INF] Starting nats-server
[762] 2024/10/28 17:16:04.267248 [INF]   Version:  2.10.22
[762] 2024/10/28 17:16:04.267254 [INF]   Git:      [240e9a4]
[762] 2024/10/28 17:16:04.267258 [INF]   Cluster:  nats
[762] 2024/10/28 17:16:04.267261 [INF]   Name:     nats-1
[762] 2024/10/28 17:16:04.267267 [INF]   Node:     yrzKKRBu
[762] 2024/10/28 17:16:04.267270 [INF]   ID:       NCHVTW7JN3R6PND2R2CBOM76IFG3PTN7F4CUVIS2LMNMZQ5E3WQ2HL3R
[762] 2024/10/28 17:16:04.267312 [INF] Using configuration file: /etc/nats-config/nats.conf
[762] 2024/10/28 17:16:04.269297 [INF] Starting http monitor on 0.0.0.0:8222
[762] 2024/10/28 17:16:04.269772 [INF] Starting JetStream
[762] 2024/10/28 17:16:04.270595 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[762] 2024/10/28 17:16:04.270603 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[762] 2024/10/28 17:16:04.270606 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[762] 2024/10/28 17:16:04.270609 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[762] 2024/10/28 17:16:04.270612 [INF]
[762] 2024/10/28 17:16:04.270615 [INF]          https://docs.nats.io/jetstream
[762] 2024/10/28 17:16:04.270618 [INF]
[762] 2024/10/28 17:16:04.270653 [INF] ---------------- JETSTREAM ----------------
[762] 2024/10/28 17:16:04.270700 [INF]   Max Memory:      0 B
[762] 2024/10/28 17:16:04.270761 [INF]   Max Storage:     100.00 GB
[762] 2024/10/28 17:16:04.270793 [INF]   Store Directory: "/data/jetstream"
[762] 2024/10/28 17:16:04.270844 [INF]   Domain:          hub
[762] 2024/10/28 17:16:04.270879 [INF] -------------------------------------------
[762] 2024/10/28 17:16:04.280885 [INF] Starting JetStream cluster
[762] 2024/10/28 17:16:04.280891 [INF] Creating JetStream metadata controller
[762] 2024/10/28 17:16:04.281604 [INF] JetStream cluster recovering state
[762] 2024/10/28 17:16:04.283689 [INF] Listening for leafnode connections on 0.0.0.0:7422
[762] 2024/10/28 17:16:04.285627 [INF] Listening for client connections on 0.0.0.0:4222
[762] 2024/10/28 17:16:04.285883 [INF] Server is ready
[762] 2024/10/28 17:16:04.286279 [INF] Cluster name is nats
[762] 2024/10/28 17:16:04.286346 [INF] Listening for route connections on 0.0.0.0:6222
[762] 2024/10/28 17:16:04.297379 [INF] 10.0.64.206:6222 - rid:13 - Route connection created
[762] 2024/10/28 17:16:04.297521 [INF] 10.0.64.19:6222 - rid:16 - Route connection created
[762] 2024/10/28 17:16:04.297859 [INF] 10.0.64.19:6222 - rid:14 - Route connection created
[762] 2024/10/28 17:16:04.297927 [INF] 10.0.64.206:6222 - rid:15 - Route connection created
[762] 2024/10/28 17:16:04.348277 [INF] 10.0.64.19:6222 - rid:17 - Route connection created
[762] 2024/10/28 17:16:04.354068 [INF] 10.0.64.19:6222 - rid:18 - Route connection created
[762] 2024/10/28 17:16:04.383297 [INF] 10.0.64.206:6222 - rid:19 - Route connection created
[762] 2024/10/28 17:16:04.386095 [WRN] Waiting for routing to be established...
[762] 2024/10/28 17:16:04.386274 [WRN] JetStream has not established contact with a meta leader
[762] 2024/10/28 17:16:04.433731 [INF] 10.0.64.206:6222 - rid:20 - Route connection created
[762] 2024/10/28 17:16:05.253739 [INF] JetStream cluster new metadata leader: nats-2/nats
panic: runtime error: index out of range [0] with length 0

goroutine 36 [running]:
github.com/nats-io/nats-server/v2/server/stree.(*SubjectTree[...]).match(0xc00058c8f8?, {0xcde2c8?, 0xc0003502a0?}, {0xc00039a600?, 0x10?, 0x10?}, {0xc000121500, 0x0, 0x100}, 0xc000463bc8?)
    github.com/nats-io/nats-server/v2/server/stree/stree.go:286 +0x945
github.com/nats-io/nats-server/v2/server/stree.(*SubjectTree[...]).Match(0xce33a0, {0xc0000273c0, 0x1b, 0x1b}, 0xc000463c60)
    github.com/nats-io/nats-server/v2/server/stree/stree.go:124 +0x11b
github.com/nats-io/nats-server/v2/server.(*fileStore).loadLast(0xc0001cee08, {0xc0000273c0, 0x1b}, 0xc0002564d0)
    github.com/nats-io/nats-server/v2/server/filestore.go:6261 +0x30e
github.com/nats-io/nats-server/v2/server.(*fileStore).LoadLastMsg(0xc0003a00f9?, {0xc0000273c0?, 0xc?}, 0xc0002564d0)
    github.com/nats-io/nats-server/v2/server/filestore.go:6329 +0x53
github.com/nats-io/nats-server/v2/server.(*Server).jsMsgGetRequest(0xc0000f3b08, 0x0?, 0xc000203140?, 0xc000463e68?, {0xc000028540, 0x23}, {0xc0000e47e0, 0x11}, {0xc0003a0000, 0x127, ...})
    github.com/nats-io/nats-server/v2/server/jetstream_api.go:3153 +0xcd2
github.com/nats-io/nats-server/v2/server.(*Server).processJSAPIRoutedRequests(0xc0000f3b08)
    github.com/nats-io/nats-server/v2/server/jetstream_api.go:871 +0x36b
github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine.func1()
    github.com/nats-io/nats-server/v2/server/server.go:3794 +0x32
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine in goroutine 1
    github.com/nats-io/nats-server/v2/server/server.go:3792 +0x145

Expected behavior

Did not expected to restart with a panic.

Server and client version

nats-server v2.10.22

Host environment

Kubernetes, GKE AutoPilot

Client Version: v1.28.2 Server Version: v1.31.1-gke.1678000

Steps to reproduce

No response

neilalexander commented 3 weeks ago

Thanks for the report, have posted a fix PR.