Closed · aimichelle closed this 8 months ago
Could it have run out of memory?
> Could it have run out of memory?
Hmm, that's certainly a possibility. The memory pressure on these nodes is somewhat high. Let me try again after adding a node or two to the cluster.
Increased resource limits (note that this is sizing for a dev cluster with low traffic; prod will be sized up significantly more):

container:
  env:
    GOMEMLIMIT: 6750MiB
  merge:
    resources:
      requests:
        cpu: 1
        memory: 5Gi
      limits:
        cpu: 1.5
        memory: 7.5Gi
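As a sanity check on those numbers: GOMEMLIMIT is typically set somewhat below the container memory limit so the Go runtime garbage-collects before the kernel OOM-kills the pod. The arithmetic below is illustrative only; the headroom ratio shown is what the config above implies, not an official NATS recommendation:

```go
package main

import "fmt"

func main() {
	const gib = 1024 // MiB per GiB

	limitMiB := 7.5 * gib // container memory limit: 7.5Gi = 7680MiB
	goMemLimitMiB := 6750.0

	// Headroom left for non-heap memory (goroutine stacks, mmaps, etc.).
	headroom := limitMiB - goMemLimitMiB
	fmt.Printf("GOMEMLIMIT is %.0f%% of the container limit, leaving %.0fMiB headroom\n",
		100*goMemLimitMiB/limitMiB, headroom)
	// → GOMEMLIMIT is 88% of the container limit, leaving 930MiB headroom
}
```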
Still seeing plenty of issues even though there are no messages in any of the streams. No crashes anymore, but `nats stream report` times out:
~ # nats stream report
Obtaining Stream stats
nats: error: context deadline exceeded
The streams are all R3 and file based, and the cluster is R3. Any suggestions on sizing, or on whether this many consumers is not recommended?
How many consumers are you trying to create?
Are they all inheriting R3 from the stream? Meaning they are HA assets?
> How many consumers are you trying to create?
> Are they all inheriting R3 from the stream? Meaning they are HA assets?
It's currently 4096 consumers per stream with 4 streams. So about 16k consumers. I believe all of them do inherit R3.
That is a lot of HA assets for the system. Each one is a complete NRG (Raft) group underneath. Heartbeats alone would be ~16k msgs/sec, not to mention the memory footprint.
We consult with our customers on how best to use the system to achieve their goals, might be something to consider.
Observed behavior
We have an application which uses the nats.go/jetstream client to create 4 streams. We are processing these streams with 4096 partitions, so we created 4096 consumers for each stream, one per partition. These consumers are also created via the client. While the consumers were being created, we got a panic on the nats-server. The panic log is quite long, so the log we got from Kubernetes is truncated at the beginning. The log is attached here: https://drive.google.com/file/d/1kfrZwoe9HR2P1SJ-DumdgoyYmA9QTJy4/view?usp=sharing (it was too long for GitHub).
Please also let us know if this is a proper usage of jetstream.
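The per-partition consumer setup described above relies on mapping each message key to one of the 4096 partitions. A minimal sketch of such a mapping, assuming an FNV-1a hash for illustration (the real NATS deterministic subject-mapping `partition()` transform uses its own hash function, so the values below will not match it), with a hypothetical `events.<n>` subject scheme:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const partitions = 4096

// partitionOf maps a message key to one of 4096 partitions.
// NOTE: illustrative FNV-1a hash, not the hash NATS subject
// mapping uses internally.
func partitionOf(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % partitions
}

func main() {
	// One filter subject per partition, e.g. "events.42", so each of the
	// 4096 consumers only receives its own partition's messages.
	p := partitionOf("order-12345")
	fmt.Printf("key order-12345 -> filter subject events.%d\n", p)
}
```

Each consumer would then be created with a filter subject for exactly one partition, which is what multiplies out to the 4096 consumers per stream discussed in this thread.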
Expected behavior
No panic, and all streams and consumers are created and functioning.
Server and client version
server: 2.10.5
nats.go: 1.31.0
Host environment
This was running in a GKE cluster: 1.28.3-gke.1118000 with n2d-standard-8 nodes running Ubuntu with containerd.
Steps to reproduce