aldiesel closed this issue 10 months ago
Did this happen when going from an earlier version to 2.10.2, or were you running 2.10.2 already before the restart which caused this problem?
Is it still a problem with 2.10.3?
This looks like the server may have been OOM-killed rather than having hit a server panic.
How big is the stream? Is it a KV?
What memory limit do you set in Docker? Do you set the soft memory limit environment variable (GOMEMLIMIT)?
A mix of streams and KV streams. The main stream in use had a 1 GB file limit and held 100 MB of data. We did notice that this leaf node was on the same account, and the other stream was pumping 20k msg/s into its stream/domain.
GOMEMLIMIT is set, yes. Memory was limited to 12 GB, though the MEMLIMIT may have been 3 GB. Good point.
We had to delete the whole file and restart the server fresh.
How much memory should we be dedicating at a minimum? Some rough rules would be great. Is it linearly proportional to stream size, or is there some minimum that must be allocated? If a 1 GB stream can crash from needing 3 GB of memory, what would we need for 1 TB, for example?
Thank you for the quick response.
Also, I just double-checked: I believe our memory limit in this case was 6.4 GB (80% of 8 GB).
I apologize: looking at the stack again, it is a panic, one that I recognize and that has been fixed. Try 2.10.3.
For memory usage, we are constantly looking to improve the memory footprint of JetStream-enabled servers. Currently we have a block architecture with a caching mechanism, such that blocks are dropped from memory when no longer active. This works well for streams where consumers usually traverse in FIFO order. For random access, as with KVs, you can have more blocks loaded at once, and hence we make those blocks smaller.
We are looking into ways to improve this such that the system could have a fixed allocation strategy, or a high bar for memory usage that we will try not to exceed.
What version were you using?
NATS 2.10.2
What environment was the server running in?
Kubernetes using NATS docker container
Is this defect reproducible?
Unsure how to reproduce.
Given the capability you are leveraging, describe your expectation?
No panic.
Given the expectation, what is the defect you are observing?