Open kohlisid opened 3 months ago
Perhaps related to #5673, but we are setting stream limits of maxBytes and maxMessages.
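For context, a minimal sketch (hypothetical stream name, subject, and limit values, using the nats.go JetStream API) of the kind of per-stream limits being referred to:

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Hypothetical stream: file-backed, R3, capped by both bytes and message count.
	_, err = js.AddStream(&nats.StreamConfig{
		Name:      "pipeline-out",            // hypothetical name
		Subjects:  []string{"pipeline.out"},  // hypothetical subject
		Storage:   nats.FileStorage,
		Replicas:  3,
		Retention: nats.LimitsPolicy,
		MaxBytes:  100 * 1024 * 1024, // maxBytes, e.g. 100 MB as mentioned later in the thread
		MaxMsgs:   100_000,           // maxMessages
	})
	if err != nil {
		log.Fatal(err)
	}
}
```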
How much memory are you giving each NATS node?
@neilalexander
[55] 2024/08/02 01:00:45.462256 [INF] ---------------- JETSTREAM ----------------
[55] 2024/08/02 01:00:45.462259 [INF] Max Memory: 768.00 MB
[55] 2024/08/02 01:00:45.462261 [INF] Max Storage: 1.00 TB
I asked specifically because the reported value doesn't always match cgroup restrictions. Do you OOM as quickly if you set the GOMEMLIMIT environment variable to some floor below the available memory, e.g. GOMEMLIMIT=512MiB?
If you can capture some memory profiles when the memory usage is up but before the OOM, that would be useful, either via nats server request profile allocs from the system account or via https://docs.nats.io/running-a-nats-service/nats_admin/profiling.
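For anyone following along, here is a small sketch of grabbing that allocations profile over HTTP, assuming the server was started with prof_port enabled in its config (the port number here is hypothetical); the resulting file can be opened with go tool pprof:

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Assumes `prof_port: 65432` (hypothetical) is set in the server config,
	// which exposes the standard net/http/pprof endpoints.
	resp, err := http.Get("http://localhost:65432/debug/pprof/allocs")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("allocs.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
	log.Println("wrote allocs.pprof; inspect with: go tool pprof allocs.pprof")
}
```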
The Max Memory shown does not limit how much memory NATS uses. It limits how much memory storage a stream may use and has no impact on NATS' overall memory usage for its normal running needs.
What are you giving the process from your OS perspective?
I'm running the cluster on K8s, with each JS server as a pod with a memory limit of:
memory: 1Gi
@ripienaar
Huge payload messages will consume more RAM. In my experience I wouldn't run NATS with less than 3GB of memory when using JetStream, and your big messages will make matters worse.
But on the flip side, how would we estimate how much memory to allocate in such a scenario? Is there some guideline? If there is a surge in message rate, we would need to ensure that it doesn't OOM. @ripienaar
It varies a lot by use case, message rates, etc., and changes from version to version; usually the needs go down. A workload that today uses about 1GB of memory for me used to use 6GB some time ago.
You should set up monitoring and use nats bench to simulate various scenarios. There's some guidance on the website about memory usage, but it's definitely wrong, so doing your own tests would be best.
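As a starting point for that kind of testing, here is a minimal publish-load sketch in Go (hypothetical subject, message size, and count) that can be run while watching server RSS:

```go
package main

import (
	"crypto/rand"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream(nats.PublishAsyncMaxPending(256))
	if err != nil {
		log.Fatal(err)
	}

	const (
		msgSize = 60 * 1024 // ~60 KB, roughly the average payload discussed below
		numMsgs = 10_000
	)
	payload := make([]byte, msgSize)
	if _, err := rand.Read(payload); err != nil {
		log.Fatal(err)
	}

	start := time.Now()
	for i := 0; i < numMsgs; i++ {
		// Async publish keeps the pipeline full; acks are awaited at the end.
		if _, err := js.PublishAsync("pipeline.out", payload); err != nil {
			log.Fatal(err)
		}
	}
	select {
	case <-js.PublishAsyncComplete():
	case <-time.After(2 * time.Minute):
		log.Fatal("timed out waiting for publish acks")
	}
	elapsed := time.Since(start)
	log.Printf("published %d msgs of %d bytes in %v (%.0f msgs/s)",
		numMsgs, msgSize, elapsed, float64(numMsgs)/elapsed.Seconds())
}
```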
@ripienaar I agree with trying to simulate the scenarios, but under high load this memory was spiking up a lot. I cannot run my nodes without memory-limit guardrails, hence I was trying to mimic the scenario here using limited resources to see when it would OOM. I assumed that even in a worst-case scenario where the whole stream is pulled into the cache (even though I'm using filestore storage), a maxBytes of 100MB for 2 streams with replication=3 would need about 600MB. But this ceiling is getting hit more often than not.
For high workloads, giving unbounded memory isn't possible. I want to understand where and why the server would be using the excess memory, and thus how to decide the upper bound in that case.
Here is the debug/pprof/allocs profile: mem_pprof.zip
@neilalexander
A good practice when limiting memory via cgroups and containers is to set the env GOMEMLIMIT to ~75% of the actual limit. Sometimes the Go runtime only sees the host value of memory and does not feel pressure to GC and clean things up, but the container and Linux kernel will OOM it.
That could help in terms of the Go GC, but in terms of NATS, what would be a good upper limit in the first place? How should we calculate that? Even when I consider the worst case where all streams can reside in memory, I saw the OOM as stated in the above issue. For filestore mode, we would not expect that to be the case. @derekcollison
For any program written in Go that is to be run in a container its best practice these days to set GOMEMLIMIT.
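As a side note, GOMEMLIMIT maps onto the Go runtime's soft memory limit; a tiny standalone program like the sketch below (not part of the NATS server itself) can confirm what limit the runtime picked up, e.g. 805306368 bytes for GOMEMLIMIT=768MiB (~75% of a 1Gi pod limit):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// A negative argument queries the current soft memory limit without
	// changing it. With GOMEMLIMIT=768MiB set in the environment, the Go
	// runtime applies it at startup and this prints 805306368.
	fmt.Println("effective GOMEMLIMIT (bytes):", debug.SetMemoryLimit(-1))
}
```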
We do plan on introducing an upper bound on how much buffer memory all filestore-backed streams in a server can use. This will probably come in 2.12.
I agree on GOMEMLIMIT and will definitely go ahead with changing that. But as a practice at this point, can we estimate the upper-bound memory for the container, based on the TPS, message size, etc.? Or are there some experimental settings that can help measure that? @derekcollison
Memory usage is quite dynamic: for core NATS it depends on connections and the number of subscriptions, and for JetStream on the number of streams and consumers, access patterns, etc.
The best way is to model the upper bounds of what you expect on a larger system and monitor RSS peaks.
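One way to track those RSS peaks, assuming the server's HTTP monitoring endpoint is enabled (e.g. http_port: 8222 in the server config; host and port here are hypothetical), is to poll the mem field of /varz:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

func main() {
	// Poll the monitoring endpoint every 10s and log the reported RSS.
	for {
		resp, err := http.Get("http://localhost:8222/varz")
		if err != nil {
			log.Fatal(err)
		}
		var varz struct {
			Mem int64 `json:"mem"` // resident set size in bytes
		}
		if err := json.NewDecoder(resp.Body).Decode(&varz); err != nil {
			log.Fatal(err)
		}
		resp.Body.Close()

		log.Printf("server RSS: %.1f MiB", float64(varz.Mem)/(1024*1024))
		time.Sleep(10 * time.Second)
	}
}
```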
In a recent exploration with the Limits retention policy, I was seeing memory spiking up considerably again.
│ Stream Report │
├──────────────────────────────────────────────────────────────────────────────┬─────────┬───────────┬───────────┬──────────┬─────────┬──────┬─────────┬──────────────────────────────────────────────────────────────────┤
│ Stream │ Storage │ Placement │ Consumers │ Messages │ Bytes │ Lost │ Deleted │ Replicas │
├──────────────────────────────────────────────────────────────────────────────┼─────────┼───────────┼───────────┼──────────┼─────────┼──────┼─────────┼──────────────────────────────────────────────────────────────────┤
│ KV_xxxx-simple-pipeline-out_SINK_OT │ File │ │ 0 │ 0 │ 0 B │ 0 │ 0 │ isbsvc-default-js-1*, isbsvc-default-js-3, isbsvc-default-js-4! │
│ KV_xxxx-simple-pipeline_SIDE_INPUTS │ File │ │ 0 │ 0 │ 0 B │ 0 │ 0 │ isbsvc-default-js-1, isbsvc-default-js-2*, isbsvc-default-js-3 │
│ KV_xxxx-simple-pipeline-out_SINK_PROCESSORS │ File │ │ 0 │ 8 │ 1.1 KiB │ 0 │ 2002 │ isbsvc-default-js-1*, isbsvc-default-js-2, isbsvc-default-js-3 │
│ KV_xxxx-simple-pipeline-in-out_OT │ File │ │ 10 │ 10 │ 1.3 KiB │ 0 │ 9 │ isbsvc-default-js-0, isbsvc-default-js-1, isbsvc-default-js-3* │
│ KV_xxxx-simple-pipeline-in-out_PROCESSORS │ File │ │ 7 │ 10 │ 1.3 KiB │ 0 │ 0 │ isbsvc-default-js-2, isbsvc-default-js-3*, isbsvc-default-js-4! │
│ KV_xxxx-simple-pipeline-in_SOURCE_OT │ File │ │ 24 │ 10 │ 1.3 KiB │ 0 │ 1 │ isbsvc-default-js-0!, isbsvc-default-js-1!, isbsvc-default-js-4! │
│ KV_xxxx-simple-pipeline-in_SOURCE_PROCESSORS │ File │ │ 10 │ 10 │ 1.3 KiB │ 0 │ 0 │ isbsvc-default-js-0!, isbsvc-default-js-2, isbsvc-default-js-3* │
│ xxxx-simple-pipeline-out-1 │ File │ │ 1 │ 100,000 │ 6.2 GiB │ 0 │ 0 │ isbsvc-default-js-0!, isbsvc-default-js-2!, isbsvc-default-js-4! │
│ xxxx-simple-pipeline-out-0 │ File │ │ 1 │ 100,000 │ 6.2 GiB │ 0 │ 0 │ isbsvc-default-js-0!, isbsvc-default-js-1*, isbsvc-default-js-4 │
╰──────────────────────────────────────────────────────────────────────────────┴─────────┴───────────┴───────────┴──────────┴─────────┴──────┴─────────┴──────────────────────────────────────────────────────────────────╯
In this case I have two streams of 6.2 GiB each running in filestore mode, but the memory usage on the servers was considerably high, more than what holding both streams completely in memory would take, even though we have filestore.
~ # njs server ls
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Server Overview │
├─────────────────────┬─────────┬──────┬─────────┬─────┬───────┬───────┬────────┬─────┬─────────┬───────┬───────┬──────┬────────┬─────┤
│ Name │ Cluster │ Host │ Version │ JS │ Conns │ Subs │ Routes │ GWs │ Mem │ CPU % │ Cores │ Slow │ Uptime │ RTT │
├─────────────────────┼─────────┼──────┼─────────┼─────┼───────┼───────┼────────┼─────┼─────────┼───────┼───────┼──────┼────────┼─────┤
│ isbsvc-default-js-2 │ default │ 0 │ 2.10.18 │ yes │ 13 │ 663 │ 16 │ 0 │ 447 MiB │ 3 │ 32 │ 7 │ 41m35s │ 1ms │
│ isbsvc-default-js-3 │ default │ 0 │ 2.10.18 │ yes │ 9 │ 663 │ 16 │ 0 │ 32 MiB │ 2 │ 32 │ 0 │ 41m33s │ 1ms │
│ isbsvc-default-js-0 │ default │ 0 │ 2.10.18 │ yes │ 0 │ 663 │ 16 │ 0 │ 16 GiB │ 18 │ 32 │ 0 │ 4m20s │ 1ms │
│ isbsvc-default-js-4 │ default │ 0 │ 2.10.18 │ yes │ 0 │ 663 │ 16 │ 0 │ 16 GiB │ 100 │ 32 │ 0 │ 2m49s │ 2ms │
│ isbsvc-default-js-1 │ default │ 0 │ 2.10.18 │ yes │ 12 │ 663 │ 16 │ 0 │ 434 MiB │ 5 │ 32 │ 9 │ 41m31s │ 2ms │
├─────────────────────┼─────────┼──────┼─────────┼─────┼───────┼───────┼────────┼─────┼─────────┼───────┼───────┼──────┼────────┼─────┤
│ │ 1 │ 5 │ │ 5 │ 34 │ 3,315 │ │ │ 33 GiB │ │ │ 16 │ │ │
╰─────────────────────┴─────────┴──────┴─────────┴─────┴───────┴───────┴────────┴─────┴─────────┴───────┴───────┴──────┴────────┴─────╯
Here are the memory profiles for the same: mem_prof.zip
@derekcollison @neilalexander
This looks like a build-up of Raft append entries. How are you publishing messages? Are you using core NATS publishes or JetStream publishes?
@neilalexander We use the JetStream publish:
// PublishMsg publishes a Msg to JetStream.
PublishMsg(m *Msg, opts ...PubOpt) (*PubAck, error)
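For reference, a minimal sketch of how that call is typically used for a synchronous JetStream publish (connection URL and subject are hypothetical):

```go
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	msg := &nats.Msg{
		Subject: "pipeline.out", // hypothetical subject bound to the stream
		Data:    []byte("payload bytes here"),
	}
	// PublishMsg blocks until the stream leader returns a PubAck.
	ack, err := js.PublishMsg(msg)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("stored in stream %q at seq %d", ack.Stream, ack.Sequence)
}
```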
I see a buildup during writeMsgRecord as well: mem_prof_2.zip @neilalexander @derekcollison
@derekcollison @neilalexander Any pointers on the above?
Max Payload: 64 MiB
What is the actual size of the messages you are publishing to the stream?
This max payload setting is abnormally high (our default is 1MB) and it looks like the allocations in the above profile are due to storing very large messages. Would recommend you bring the max payload back down.
You may be better served by the object store instead, which chunks larger blobs down into smaller messages, if you need to store large volumes of data.
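If the object store route is worth exploring, a minimal sketch with the nats.go API might look like the following (bucket, object name, and file path are hypothetical); the client chunks the blob into smaller stream messages under the hood:

```go
package main

import (
	"log"
	"os"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Create an object store bucket; large objects are split into chunks,
	// so individual stream messages stay small.
	obs, err := js.CreateObjectStore(&nats.ObjectStoreConfig{Bucket: "payloads"})
	if err != nil {
		log.Fatal(err)
	}

	blob, err := os.ReadFile("large-payload.bin") // hypothetical multi-MB blob
	if err != nil {
		log.Fatal(err)
	}
	if _, err := obs.PutBytes("payload-0001", blob); err != nil {
		log.Fatal(err)
	}

	// Retrieve it again; the client reassembles the chunks transparently.
	data, err := obs.GetBytes("payload-0001")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("read back %d bytes", len(data))
}
```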
@neilalexander Even though the max payload size is set at 64 MiB, the average payload size is ~60 KB.
│ xxxx-simple-pipeline-out-1 │ File │ │ 1 │ 100,000 │ 6.2 GiB │ 0 │ 0 │ isbsvc-default-js-0!, isbsvc-default-js-2!, isbsvc-default-js-4! │
│ xxxx-simple-pipeline-out-0 │ File │ │ 1 │ 100,000 │ 6.2 GiB │ 0 │ 0 │ isbsvc-default-js-0!, isbsvc-default-js-1*, isbsvc-default-js-4 │
You can see that each stream has 100,000 messages with a total size of 6.2 GiB, so the calculation works out as expected to roughly 60 KB per message.
I could lower the max payload size, but the payload size is fixed coming from a controlled data source.
Any other things that can be tried out?
cc @derekcollison
Observed behavior
Running 2 streams on a JetStream cluster with 3 nodes.
The server config is as follows:
Stream configs are as follows:
I am running a test where I publish messages with a payload size of 5 MB to the streams, but I saw that the servers got terminated due to being OOM killed.
Expected behavior
I am running a test with a payload size of 5 MB at 50 msgs/sec.
There are 2 streams with replication factor of 3
I see this OOM-killed error on 2 servers. I want to ask why I would be seeing this error when I'm using filestore as the storage mode. Will the streams be pulled into memory at runtime?
Is this high memory requirement expected? If yes, then what would be a good way to determine the max resource allocation for the JetStream servers?
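For reference, a rough back-of-the-envelope of the write rate these numbers imply (assuming the 50 msgs/sec applies to each of the two R=3 streams, so every payload is written once per replica):

$$
5\ \mathrm{MB} \times 50\ \mathrm{msg/s} \times 3\ \mathrm{replicas} \times 2\ \mathrm{streams} = 1.5\ \mathrm{GB/s}
$$

Each node has to hold its share of this traffic in memory (Raft append entries plus filestore write buffers) before it lands on disk, which may explain RSS peaks sitting well above the configured stream limits.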
Server and client version
Version: 2.10.18
Host environment
No response
Steps to reproduce
No response