@TomasVojacek Thank you for the report. I see this worrisome trace in log1.log:
[45] 2022/10/03 19:57:00.185262 [WRN] Waiting for routing to be established...
[45] 2022/10/03 19:57:00.420945 [INF] 172.30.19.253:43932 - rid:15 - Route connection created
[45] 2022/10/03 19:57:00.421298 [INF] 172.30.19.253:43932 - rid:15 - Router connection closed: Duplicate Route
[45] 2022/10/03 19:57:00.429257 [INF] 172.30.41.131:37134 - rid:16 - Route connection created
[45] 2022/10/03 19:57:00.429613 [INF] 172.30.41.131:37134 - rid:16 - Router connection closed: Duplicate Route
[45] 2022/10/03 19:59:00.751026 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[45] 2022/10/03 19:59:00.751542 [WRN] 172.30.41.131:6222 - rid:14 - Readloop processing time: 2m0.329090184s
[45] 2022/10/03 19:59:00.751787 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[45] 2022/10/03 19:59:00.752023 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
[45] 2022/10/03 19:59:00.752243 [WRN] Healthcheck failed: "JetStream has not established contact with a meta leader"
Notice the route saying that it was stuck for more than 2 minutes! That could explain why the messages accumulate, so I wonder if there is not some kind of blocking situation, maybe not a deadlock per se, but something wrong for sure. If you can reproduce this and notice a server starting to build up memory, could you, the same way you collected nats1.heap, run the profile to collect CPU stats for a few seconds, or even better, hit the "/stacksz" endpoint a few times at several-second intervals? That would help us a lot. Thanks!
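If it helps to automate that collection, here is a minimal Go sketch that polls an HTTP endpoint a few times at an interval and keeps each dump. The URL in `main` is an assumption; point it at wherever your deployment actually exposes /stacksz.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// snapshotEndpoint polls url n times, interval apart, and returns each
// response body. A sketch for grabbing several /stacksz dumps in a row;
// this is generic HTTP polling, not a NATS-provided tool.
func snapshotEndpoint(url string, n int, interval time.Duration) ([][]byte, error) {
	var dumps [][]byte
	for i := 0; i < n; i++ {
		resp, err := http.Get(url)
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		dumps = append(dumps, body)
		if i < n-1 {
			time.Sleep(interval)
		}
	}
	return dumps, nil
}

func main() {
	// localhost:8222 is an assumption; use your server's monitoring address.
	dumps, err := snapshotEndpoint("http://localhost:8222/stacksz", 3, 5*time.Second)
	if err != nil {
		fmt.Println("fetch failed:", err) // e.g. no server reachable locally
		return
	}
	for i, d := range dumps {
		fmt.Printf("dump %d: %d bytes\n", i, len(d))
	}
}
```

Run it while memory is climbing and redirect stdout to a file to attach to the issue.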
I failed to get the cluster into a working state.
The data on disk are causing some problem. Sending logs from this state. (Maybe the data got corrupted by the downgrade from 2.9.1 to 2.9.0.)
All nodes restart periodically. Node1 gets completely separated from the cluster.
nats sub ">" on node0 and node2 returns data as expected.
On node1, nats sub ">" does nothing, and nats str report fails with:
nats: error: setup failed: read tcp 127.0.0.1:56594->127.0.0.1:4223: i/o timeout
log from node1
[3586] 2022/10/04 10:35:26.653669 [INF] ---------------- JETSTREAM ----------------
[3586] 2022/10/04 10:35:26.653678 [INF] Max Memory: 1.00 GB
[3586] 2022/10/04 10:35:26.653682 [INF] Max Storage: 100.00 GB
[3586] 2022/10/04 10:35:26.653685 [INF] Store Directory: "/data/jetstream"
[3586] 2022/10/04 10:35:26.653687 [INF] Domain: smarttx_cloud
[3586] 2022/10/04 10:35:26.653689 [INF] -------------------------------------------
[3586] 2022/10/04 10:35:26.657266 [INF] Starting restore for stream '$G > STX_SERVER_DATA'
[3586] 2022/10/04 10:35:26.819008 [INF] Restored 331,972 messages for stream '$G > STX_SERVER_DATA'
[3586] 2022/10/04 10:35:26.819371 [INF] Recovering 2 consumers for stream - '$G > STX_SERVER_DATA'
[3586] 2022/10/04 10:35:26.825770 [INF] Starting JetStream cluster
[3586] 2022/10/04 10:35:26.825794 [INF] Creating JetStream metadata controller
[3586] 2022/10/04 10:35:26.827664 [INF] JetStream cluster recovering state
[3586] 2022/10/04 10:35:26.828924 [INF] Listening for leafnode connections on 0.0.0.0:7422
[3586] 2022/10/04 10:35:26.829459 [INF] Listening for client connections on 0.0.0.0:4222
[3586] 2022/10/04 10:35:26.829601 [INF] TLS required for client connections
[3586] 2022/10/04 10:35:26.829835 [INF] Server is ready
[3586] 2022/10/04 10:35:26.830500 [INF] Cluster name is nats
[3586] 2022/10/04 10:35:26.830549 [INF] Listening for route connections on 0.0.0.0:6222
[3586] 2022/10/04 10:35:26.835867 [INF] 172.30.41.131:6222 - rid:13 - Route connection created
[3586] 2022/10/04 10:35:26.836947 [INF] 172.30.19.253:6222 - rid:14 - Route connection created
[3586] 2022/10/04 10:35:26.930133 [WRN] Waiting for routing to be established...
[3586] 2022/10/04 10:35:27.481113 [INF] 172.30.19.253:46214 - rid:15 - Route connection created
[3586] 2022/10/04 10:35:27.481449 [INF] 172.30.19.253:46214 - rid:15 - Router connection closed: Duplicate Route
[3586] 2022/10/04 10:35:27.504002 [INF] 172.30.41.131:42562 - rid:16 - Route connection created
[3586] 2022/10/04 10:35:27.504294 [INF] 172.30.41.131:42562 - rid:16 - Router connection closed: Duplicate Route
Attachments: node0.restart_it_self.log, node1.1328.log (log with trace enabled), profile info: stuck_state_profile.tar.gz
Could you share the config files for the servers in the cluster?
I have deleted /data/jetstream, restarted the cluster, and recreated the streams (configs are in the first post):
nats --context=prod stream edit STX_SERVER_DATA --config=raw-data-prod.json
NATS started as expected, and for a while it looked OK; data were flowing. After a while, nats0 and nats2 started to grow in memory, until nats0 OOMed at 12 GB. For reasons unknown to me, some leafnodes did not deliver data (the stream reports a huge source lag). All 12 leafnodes are currently test-device simulators running on the same AWS machine with a shared config; only the subjects differ. The only exception is that two leafnodes have a JetStream max_bytes of 20 GiB, while the remaining 10 have only 5 GB. I will send logs, traces, and stacks in the next post.
Before the OOM crash, node0 stopped responding to nats str info.
consumer_report_nats0_04T174132.log
The JetStream status is from node1, which kept responding. stx_info_04T174141.log consumer_report_nats1_04T173329.log
The server_name being POD_NAME, does that change each time?
What does nats server ls report from the system account? How about nats server report jetstream? And finally nats traffic, all from a system account user?
Profile info: profile.tar.gz logs.tar.gz
As can be seen in nats.conf, I do not have a system account. It will take me a while to change the helm-chart-generated config; or is there a flag to create the SYS account?
There is always a system account, you need to assign a user to it.
If you are using server config, you can add this:
# For access to system account.
accounts { $SYS { users = [ { user: "admin", pass: "s3cr3t!" } ] } }
Can I combine accounts {} with authorization {}?
Looking at some of the reports, what I see from node1_3.stacksz is that this node has been busy recovering some state, presumably for a stream:
goroutine 37 [runnable]:
encoding/binary.littleEndian.Uint32(...)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/encoding/binary/binary.go:81
github.com/nats-io/nats-server/v2/server.(*msgBlock).indexCacheBuf(0xc000a984e0, {0xc0022fe000, 0x3fffef, 0x400000?})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3439 +0x2d2
github.com/nats-io/nats-server/v2/server.(*msgBlock).loadMsgsWithLock(0xc000a984e0)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3721 +0x425
github.com/nats-io/nats-server/v2/server.(*msgBlock).fetchMsg(0xc000a984e0, 0x12cb0c7, 0x48?)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3751 +0xe5
github.com/nats-io/nats-server/v2/server.(*fileStore).msgForSeq(0xc0000f6a00, 0xa5dc20?, 0xc001dd4001?)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3899 +0x138
github.com/nats-io/nats-server/v2/server.(*fileStore).LoadMsg(0xc0000f6a00?, 0xc001dc4c00?, 0x1af3?)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:4013 +0x19
github.com/nats-io/nats-server/v2/server.(*raft).loadEntry(0xc000230600, 0xc001dd2480?)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/raft.go:2329 +0x4b
github.com/nats-io/nats-server/v2/server.(*Server).startRaftNode(0xc000138000, {0xa9d31e, 0x2}, 0xc000346e08)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/raft.go:446 +0xeda
github.com/nats-io/nats-server/v2/server.(*jetStream).createRaftGroup(0xc00011e540, {0xa9d31e, 0x2}, 0xc0003e4960, 0x16)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1629 +0x845
github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateStream(0xc00011e540, 0xc00009ab40, 0xc000174fc0)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2933 +0x10f
github.com/nats-io/nats-server/v2/server.(*jetStream).processStreamAssignment(0xc00011e540, 0xc000174fc0)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:2696 +0x457
github.com/nats-io/nats-server/v2/server.(*jetStream).applyMetaSnapshot(0xc00011e540, {0xc000172046, 0x496, 0x49f})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1203 +0x75a
github.com/nats-io/nats-server/v2/server.(*jetStream).applyMetaEntries(0xc00011e540, {0xc000012150, 0x1, 0x0?}, 0xc000347a18)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:1439 +0xad5
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster(0xc00011e540)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/jetstream_cluster.go:971 +0xb68
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3077 +0x85
This is done under the "js" lock, which blocks all other go routines trying to get that lock, including the ones from the routes:
goroutine 32 [semacquire, 4 minutes]:
sync.runtime_SemacquireMutex(0xc000138038?, 0x68?, 0xc00034ac28?)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/sync/rwmutex.go:71
github.com/nats-io/nats-server/v2/server.(*Server).sendStatsz(0xc000138000, {0xc000418140, 0x4b})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:691 +0x39e
github.com/nats-io/nats-server/v2/server.(*Server).processNewServer(0xc000138000, 0xc00034b0f8)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1298 +0x287
github.com/nats-io/nats-server/v2/server.(*Server).updateRemoteServer(0xc000138000, 0xc00034b0f8)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1270 +0x1d6
github.com/nats-io/nats-server/v2/server.(*Server).remoteServerUpdate(0xc000138000, 0xc0001c3200?, 0xc000379980, 0x4b?, {0x0?, 0x800?}, {0x0?, 0x0?}, {0xc00017905c, 0x45c, ...})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:1257 +0x396
github.com/nats-io/nats-server/v2/server.(*client).deliverMsg(0xc000379980, 0x0, 0xc000210540, 0x4fa?, {0xc00017900a, 0x4b, 0xff6}, {0x0, 0x0, 0x0}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3193 +0xb03
github.com/nats-io/nats-server/v2/server.(*client).processMsgResults(0xc000379980, 0xc00009ad80, 0xc00058c5a0, {0xc00017905c, 0x45e, 0xfa4}, {0x0, 0x0, 0x1?}, {0xc00017900a, ...}, ...)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:4227 +0xb10
github.com/nats-io/nats-server/v2/server.(*client).processInboundRoutedMsg(0xc000379980, {0xc00017905c, 0x45e, 0xfa4})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:443 +0x159
github.com/nats-io/nats-server/v2/server.(*client).processInboundMsg(0xc000379980?, {0xc00017905c?, 0x45e?, 0xfa4?})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:3505 +0x36
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc000379980, {0xc000179000, 0x4ba, 0x1000})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/parser.go:497 +0x210a
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc000379980, {0x0, 0x0, 0x0})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/client.go:1238 +0xf36
github.com/nats-io/nats-server/v2/server.(*Server).createRoute.func1()
/home/travis/gopath/src/github.com/nats-io/nats-server/server/route.go:1372 +0x25
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
/home/travis/gopath/src/github.com/nats-io/nats-server/server/server.go:3077 +0x85
This shows that this has been blocked for 4+ minutes. All the /healthz handlers are stuck too (there are several at different times; this one shows being blocked for 4 minutes as well):
goroutine 90 [semacquire, 4 minutes]:
sync.runtime_SemacquireMutex(0xc000138000?, 0x40?, 0x0?)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/sync/rwmutex.go:71
github.com/nats-io/nats-server/v2/server.(*Server).healthz(0xb9dfd0?, 0xc0001b29c6)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/monitor.go:3022 +0x1c5
github.com/nats-io/nats-server/v2/server.(*Server).HandleHealthz(0xc000138000, {0xb9dfd0, 0xc00011e0e0}, 0x4de5e9?)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/monitor.go:2978 +0x114
net/http.HandlerFunc.ServeHTTP(0xc0001b2af0?, {0xb9dfd0?, 0xc00011e0e0?}, 0x0?)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/net/http/server.go:2109 +0x2f
net/http.(*ServeMux).ServeHTTP(0x0?, {0xb9dfd0, 0xc00011e0e0}, 0xc000f05300)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0xc000cce150?}, {0xb9dfd0, 0xc00011e0e0}, 0xc000f05300)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0000ec0a0, {0xb9e878, 0xc00019e900})
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/net/http/server.go:3102 +0x4db
So of course we need to have the route not blocked by that, but it also seems that this server is taking quite a bit of time to recover things from disk. Will continue to investigate.
@TomasVojacek I am tracking down the route being blocked and am going to see if I can get a PR today to try to address that. Assuming that I get a fix, will you be able to run a nightly docker image to verify that it helps in your environment? If not, there was a plan for a 2.9.3 release soon, so it would go into that one anyway.
@kozlovic thank you, looking forward to the nightly docker image.
@TomasVojacek @derekcollison There are MANY places in the server where a routed message will be processed by a function that may try to get the JetStream lock, so I am not sure we will really be able to prevent that from happening. I think the bigger question here is why it is taking more than 4 minutes (and maybe more) to stay in that same routine:
goroutine 37 [runnable]:
encoding/binary.littleEndian.Uint32(...)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/encoding/binary/binary.go:81
github.com/nats-io/nats-server/v2/server.(*msgBlock).indexCacheBuf(0xc000a984e0, {0xc0022fe000, 0x3fffef, 0x400000?})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3439 +0x2d2
github.com/nats-io/nats-server/v2/server.(*msgBlock).loadMsgsWithLock(0xc000a984e0)
/home/travis/gopath/src/github.com/nats-io/nats-server/server/filestore.go:3721 +0x425
...
under the JetStream lock. It looks like there may be some issue accessing the storage (meaning it takes way too long to do those I/O operations)?
NATS runs in AWS Kubernetes (EKS) on r5a.large instances (2 CPU cores, 16 GB); data are stored on persistent volumes backed by AWS EBS gp2 volumes of 100 GB. It is SSD with guaranteed 3000 IOPS. I'm not aware of problems with the file system, but NATS is the busiest part of the Kubernetes cluster. I can try running it on an AWS EC2 machine and check whether the results are the same. I started with nats 2.8.4, 2 leafnodes, and a cluster of 3 docker containers, and it OOMed in a similar way as in AWS EKS; I was limited by the 16 GB total memory of my notebook, so it OOMed more frequently. Any other suggestions I can try?
@TomasVojacek The processing of a stream assignment is recovering state from disk under the JetStream lock, which is then also needed while processing some protocol from the route:
goroutine 32 [semacquire, 4 minutes]:
sync.runtime_SemacquireMutex(0xc000138038?, 0x68?, 0xc00034ac28?)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
/home/travis/.gimme/versions/go1.19.1.linux.amd64/src/sync/rwmutex.go:71
github.com/nats-io/nats-server/v2/server.(*Server).sendStatsz(0xc000138000, {0xc000418140, 0x4b})
/home/travis/gopath/src/github.com/nats-io/nats-server/server/events.go:691 +0x39e
github.com/nats-io/nats-server/v2/server.(*Server).processNewServer(0xc000138000, 0xc00034b0f8)
...
This is blocked for 4 minutes in the example above, which means the route cannot deliver any new messages from the other servers, etc. I am still trying to see how this can be improved, so at this point no action on your part, except maybe trying a different environment. If you do so, please use v2.9.2 (the latest).
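One general mitigation for this pattern is to snapshot what you need under the lock and do the slow disk work with the lock released. This is a generic sketch under assumed names (`streamState`, `loadBlockFromDisk` are hypothetical), not the actual change that went into the server:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// streamState is a hypothetical stand-in for state guarded by a big lock.
type streamState struct {
	mu            sync.RWMutex
	pendingBlocks []string
}

// loadBlockFromDisk stands in for the slow filestore reads in the stacks.
func loadBlockFromDisk(name string) error {
	time.Sleep(10 * time.Millisecond)
	return nil
}

// recoverBlocks copies the work list under the lock, then performs the
// slow I/O with the lock released, so readers (routes, /healthz) are not
// starved for the duration of the recovery.
func recoverBlocks(s *streamState) error {
	s.mu.Lock()
	blocks := append([]string(nil), s.pendingBlocks...) // snapshot under lock
	s.mu.Unlock()

	for _, b := range blocks {
		if err := loadBlockFromDisk(b); err != nil { // slow I/O, lock not held
			return err
		}
	}
	return nil
}

func main() {
	s := &streamState{pendingBlocks: []string{"1.blk", "2.blk"}}
	if err := recoverBlocks(s); err != nil {
		panic(err)
	}
	fmt.Println("recovered", len(s.pendingBlocks), "blocks without holding the lock")
}
```

The trade-off is that state can change between the snapshot and the I/O, so the real fix has to reconcile any concurrent updates afterwards.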
Merged PR https://github.com/nats-io/nats-server/pull/3519, which should avoid blocking the other go routines (especially those coming from routes). I have kicked off the production of the nightly so that you can give it a try and see if it resolves your issue. Pull synadia/nats-server:nightly and make sure you get this:
[1] 2022/10/05 13:20:12.884436 [INF] Starting nats-server
[1] 2022/10/05 13:20:12.884639 [INF] Version: 2.9.3-beta.1
[1] 2022/10/05 13:20:12.884642 [INF] Git: [37ca9722]
I have tested 2.9.3-beta.1. nats now OOMs much faster than before.
We have checked that during the first event one of the AWS drives showed higher latencies. I was wrong: our drive is gp2 with 300 IOPS; I thought it was 3000. The end of the chart shows the situation when 2.9.3-beta.1 tries to recover.
@TomasVojacek Like I said, this would not solve the underlying problem, which is that the disk is too slow. Now, without the route being blocked, there is no "artificial" back pressure and data will accumulate even faster. I will look at the attached data, but unless you make your disk faster, you will always have problems.
What I do not understand is why, when the stream reports 209 MiB of data, we need 10 GB of RAM to cache it:
97.30% 97.30% 9.40GB 97.30% github.com/nats-io/nats-server/v2/server.(*msgBlock).msgFromBuf
STX_SERVER_DATA │ File │ │ 2 │ 72,619 │ 209 MiB │ 0 │ 640131 │ smart-tx-production-nats-0!, smart-tx-production-nats-1!, smart-tx-production-nats-2!
I will try to get faster disks.
What I see from this new set of stacks is that some of the nodes are waiting on a stream's lock to unsubscribe when processing the response of setting up a consumer of the source (from all your leafnode connections), but the stream lock is held while processing a leader change, which in turn requires the RAFT lock, which is itself held while recovering messages from its log.
From leafnode connections:
sync.(*RWMutex).Lock(0x0?)
/opt/hostedtoolcache/go/1.19.1/x64/src/sync/rwmutex.go:147 +0x36
github.com/nats-io/nats-server/v2/server.(*stream).unsubscribeUnlocked(0xc0001c4000, 0xc0003c6a80)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/stream.go:3213 +0x25
github.com/nats-io/nats-server/v2/server.(*stream).setSourceConsumer.func1(0xc000331228?, 0xc0011c1300, 0x9e47c0?, {0x0?, 0x0?}, {0xc000331268?, 0x41353f?}, {0xc042964063, 0x2d8, 0x3f9d})
This is the stream lock that is trying to be acquired, but held by this:
goroutine 550 [semacquire, 1 minutes]:
sync.runtime_SemacquireMutex(0xc0001c4000?, 0x0?, 0xc00002a4f8?)
/opt/hostedtoolcache/go/1.19.1/x64/src/runtime/sema.go:77 +0x25
sync.(*RWMutex).RLock(...)
/opt/hostedtoolcache/go/1.19.1/x64/src/sync/rwmutex.go:71
github.com/nats-io/nats-server/v2/server.(*raft).GroupLeader(0xc0001c4600?)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/raft.go:1213 +0x5e
github.com/nats-io/nats-server/v2/server.(*stream).setLeader(0xc0001c4000, 0x1)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/stream.go:649 +0x13f
github.com/nats-io/nats-server/v2/server.(*jetStream).processStreamLeaderChange(0xc000154540, 0xc0001c4000, 0x1)
while it itself tries to get the lock of the RAFT group, which is busy recovering:
goroutine 548 [runnable]:
syscall.Syscall(0x4a4467?, 0xc000e9d4d8?, 0x800000?, 0x7ffff800000?)
/opt/hostedtoolcache/go/1.19.1/x64/src/syscall/syscall_linux.go:68 +0x27
syscall.read(0xc0009267e0?, {0xc0321f4000?, 0x0?, 0x0?})
/opt/hostedtoolcache/go/1.19.1/x64/src/syscall/zsyscall_linux_amd64.go:696 +0x45
syscall.Read(...)
/opt/hostedtoolcache/go/1.19.1/x64/src/syscall/syscall_unix.go:183
internal/poll.ignoringEINTRIO(...)
/opt/hostedtoolcache/go/1.19.1/x64/src/internal/poll/fd_unix.go:794
internal/poll.(*FD).Read(0xc0009267e0?, {0xc0321f4000?, 0x3fecac?, 0x400000?})
/opt/hostedtoolcache/go/1.19.1/x64/src/internal/poll/fd_unix.go:163 +0x285
os.(*File).read(...)
/opt/hostedtoolcache/go/1.19.1/x64/src/os/file_posix.go:31
os.(*File).Read(0xc11a4ca128, {0xc0321f4000?, 0x7d343c?, 0xee1d40?})
/opt/hostedtoolcache/go/1.19.1/x64/src/os/file.go:118 +0x5e
io.ReadAtLeast({0xb9b6a0, 0xc11a4ca128}, {0xc0321f4000, 0x3fecac, 0x400000}, 0x3fecac)
/opt/hostedtoolcache/go/1.19.1/x64/src/io/io.go:332 +0x9a
io.ReadFull(...)
/opt/hostedtoolcache/go/1.19.1/x64/src/io/io.go:351
github.com/nats-io/nats-server/v2/server.(*msgBlock).loadBlock(0x0?, {0x0, 0xc15b55ae00?, 0x0})
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:3656 +0x15d
github.com/nats-io/nats-server/v2/server.(*msgBlock).loadMsgsWithLock(0xc000846ea0)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:3702 +0x2e5
github.com/nats-io/nats-server/v2/server.(*msgBlock).fetchMsg(0xc000846ea0, 0x602595, 0x48?)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:3751 +0xe5
github.com/nats-io/nats-server/v2/server.(*fileStore).msgForSeq(0xc000138780, 0xa5dc20?, 0x7f71e4251101?)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:3899 +0x138
github.com/nats-io/nats-server/v2/server.(*fileStore).LoadMsg(0x2030a3?, 0x88?, 0xc000e9d8e8?)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/filestore.go:4013 +0x19
github.com/nats-io/nats-server/v2/server.(*raft).loadEntry(0xc0001c4600, 0xc0016fc810?)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/raft.go:2329 +0x4b
github.com/nats-io/nats-server/v2/server.(*raft).applyCommit(0xc0001c4600, 0x602595)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/raft.go:2363 +0x125
github.com/nats-io/nats-server/v2/server.(*raft).trackResponse(0xc0001c4600, 0xc000266640)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/raft.go:2486 +0x1c5
github.com/nats-io/nats-server/v2/server.(*raft).processAppendEntryResponse(0xc0001c4600, 0xc000266640)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/raft.go:3028 +0x1da
github.com/nats-io/nats-server/v2/server.(*raft).runAsLeader(0xc0001c4600)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/raft.go:2026 +0xb4f
github.com/nats-io/nats-server/v2/server.(*raft).run(0xc0001c4600)
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/raft.go:1631 +0x2ba
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
/home/runner/work/nats-server/src/github.com/nats-io/nats-server/server/server.go:3077 +0x85
I think that your cluster is very unstable, and I wonder if you don't have some looping going on, or if the disk is simply WAY too slow.
The Kubernetes cluster, with the exception of NATS, does not show stability problems. What type of disk does NGS use on AWS nodes?
@TomasVojacek From our ops team:
300 I/O requests per second seems light for something hosting general workloads in K8S where a data-heavy jetstream volume is part of it.
I'd make sure they are using a Persistent Volume Claim for the JS store, provisioned from a pool where they control the characteristics, and adjust the performance of the PV as needed. Nothing less than 1000 IOPS if they're doing any sort of real volume. Just ls -l is going to be quite a few I/O operations.
On NGS, IOPs have been measured at more than 10,000...
Do we want to keep this issue open, or add a warning to the docs and close it? Does this mean that NATS JetStream is not supported on hardware with magnetic disks?
We could certainly add a note in the documentation. As for "Does this mean that NATS JetStream is not supported on hardware with magnetic disks?": no, it doesn't, but the disks need to be fast enough. I don't think it is the disk performance itself, but more likely some limits imposed by the environment you are running on.
Have you tried with better IOPs? Are you still experiencing the issue?
AWS gp2 (100 GB volume): disk latency avg 3 ms, max 5 ms.
AWS gp3 (100 GB volume), 3000 IOPS: similar to gp2 (disk latency 3 ms).
AWS gp3 (100 GB volume), 6000 IOPS: better (disk latency avg 1.5 ms); not sure it resolves the problem, but the price of the disk is too high.
I was not able to test a disk with higher IOPS on Friday, but it would be even more expensive. My conclusion is that I did not find a suitable disk in the AWS portfolio. I will try to use a memory JetStream as an alternative solution.
I think something else is going on, although IOPS does come into play, JetStream works with the kernel to be efficient here, so I don't think that is the direct issue.
@derekcollison Please have a look at the investigation of the last data provided by @TomasVojacek. Unless you think otherwise, I do believe that the disk is likely the root of the issues: https://github.com/nats-io/nats-server/issues/3517#issuecomment-1268662491
I would need to have access to the system directly to poke around a bit, but I'm not convinced it is just a slow disk.
@derekcollison if you are interested, I can give you access to the cluster through my notebook. I'm on the NATS Slack [tvojacek@amp.energy]
Like @kozlovic said, we want to understand if it was a true short write or other corruption. This is the first case we have seen that tripped the intentional panic inside the server. If the system has a replication factor greater than R1, we can recover by doing a reset.
Would it be possible to send us the storage directory (just the meta/js) securely?
@derekcollison I think your comment is more for the panic on decodeConsumerState(), which is in issue https://github.com/nats-io/nats-server/issues/3535
Yes apologies that is correct.
Will close the issue, but feel free to reopen if needed. If this still presents with a current server (2.9.17 or above), let us know.
Defect
Make sure that these boxes are checked before submitting your issue -- thank you!
nats-server -DV output:
[142] 2022/10/03 20:19:39.311095 [INF] Starting nats-server
[142] 2022/10/03 20:19:39.311121 [INF] Version: 2.9.2
[142] 2022/10/03 20:19:39.311124 [INF] Git: [6d81dde]
[142] 2022/10/03 20:19:39.311127 [DBG] Go build: go1.19.1
[142] 2022/10/03 20:19:39.311130 [INF] Name: NDEWXKWIMBLGMGDSTVGK4CA7RE2RX76SHFFQPRX3AMQK57KAUJ2COX5K
[142] 2022/10/03 20:19:39.311135 [INF] ID: NDEWXKWIMBLGMGDSTVGK4CA7RE2RX76SHFFQPRX3AMQK57KAUJ2COX5K
[142] 2022/10/03 20:19:39.311161 [DBG] Created system account: "$SYS"
Versions of nats-server and affected client libraries used: 2.8.4, 2.9.0, 2.9.2
OS/Container environment:
helm 0.18.0 + image 2.9.2-alpine, 3x AWS 16 GB instances, 12 GB RAM limit
Steps or code to reproduce the issue:
Expected result:
The cluster starts downloading messages from the leafnode streams at network speed until the stream max_bytes is hit (1 GiB); both consumers consume messages as fast as they can (one is much slower than the other); consumers drain the stream and the stream sources' lag goes to zero.
Actual result:
After a while, nats consumer report SERVER_DATA responds with:
nats: error: JetStream system temporarily unavailable (10008), try --help
logs.zip