nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io

Memory not freed when traffic stops #4397

Closed pablitovicente closed 11 months ago

pablitovicente commented 11 months ago

Defect

I am testing NATS JetStream and the Key/Value Store, and I am seeing that memory consumption does not go down after all activity against the NATS server stops. Even waiting up to half an hour changes nothing.

Versions of nats-server and affected client libraries used:

OS/Container environment:

Steps or code to reproduce the issue:

Expected result:

Actual result:

derekcollison commented 11 months ago

The NATS server is written in Go, and hence needs to be told about memory limits beyond the container limits. This is handled in the new Helm chart AFAIK.

If that is not set, and the machine has, let's say, 64G while the container is limited to, say, 4G, the NATS server actually sees 64G and uses that to determine when to run the GC etc.

So setting the env variable (GOMEMLIMIT) could kick in the GC sooner, which will return memory to the kernel (depending on your kernel version).

https://weaviate.io/blog/gomemlimit-a-game-changer-for-high-memory-applications
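
For illustration, a minimal sketch of setting that limit when launching nats-server directly; the 3GiB value, the store directory, and the GODEBUG setting are only placeholders for this example (GODEBUG=gctrace=1 just makes GC activity visible in the logs):

# cap the Go heap target below the container limit so the GC runs
# (and returns memory) before the cgroup limit is hit
GOMEMLIMIT=3GiB GODEBUG=gctrace=1 nats-server -js -sd /data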

pablitovicente commented 11 months ago

Thanks @derekcollison, I will look into that.

A pprof capture in case it is useful.

go tool pprof "http://localhost:50001/debug/pprof/heap"
Fetching profile over HTTP from http://localhost:50001/debug/pprof/heap
Saved profile in /home/ronin/pprof/pprof.nats-server.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
File: nats-server
Type: inuse_space
Time: Aug 15, 2023 at 7:16pm (CEST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 449.52MB, 99.56% of 451.52MB total
Dropped 22 nodes (cum <= 2.26MB)
Showing top 10 nodes out of 30
      flat  flat%   sum%        cum   cum%
  265.90MB 58.89% 58.89%   265.90MB 58.89%  github.com/nats-io/nats-server/v2/server.subjFromBytes
  171.50MB 37.98% 96.87%   444.02MB 98.34%  github.com/nats-io/nats-server/v2/server.(*fileStore).populateGlobalPerSubjectInfo
    5.51MB  1.22% 98.09%     5.51MB  1.22%  runtime.allocm
    3.61MB   0.8% 98.89%   272.51MB 60.35%  github.com/nats-io/nats-server/v2/server.(*msgBlock).readPerSubjectInfo
    3.01MB  0.67% 99.56%     3.01MB  0.67%  os.ReadFile
         0     0% 99.56%   444.02MB 98.34%  github.com/nats-io/nats-server/v2/server.(*Account).EnableJetStream
         0     0% 99.56%   444.02MB 98.34%  github.com/nats-io/nats-server/v2/server.(*Account).addStream
         0     0% 99.56%   444.02MB 98.34%  github.com/nats-io/nats-server/v2/server.(*Account).addStreamWithAssignment
         0     0% 99.56%   444.02MB 98.34%  github.com/nats-io/nats-server/v2/server.(*Server).EnableJetStream
         0     0% 99.56%   444.02MB 98.34%  github.com/nats-io/nats-server/v2/server.(*Server).Start
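
As an aside, fetching this profile assumes the server exposes its profiling endpoint (prof_port in the config, or the -profile flag); a minimal illustrative command with the port matching the capture above:

nats-server -js -profile 50001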

This led me to suspect that subjFromBytes was related to subjects, and in fact the stream seems to have a subject per key? (See the sketch after the stream info below.)

nats stream info KV_status  
Information for Stream KV_status created 2023-08-15 12:59:40

             Subjects: $KV.status.>
             Replicas: 1
              Storage: File

Options:

            Retention: Limits
     Acknowledgements: true
       Discard Policy: New
     Duplicate Window: 2m0s
           Direct Get: true
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: true

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: 5
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Cluster Information:

                 Name: IIOT
               Leader: NATS-1

State:

             Messages: 5,090,414
                Bytes: 860 MiB
             FirstSeq: 1 @ 2023-08-15T10:59:40 UTC
              LastSeq: 5,090,414 @ 2023-08-15T11:05:42 UTC
     Active Consumers: 0
   Number of Subjects: 5,090,414
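
Each key put into a KV bucket ends up as a message on its own subject of the form $KV.<bucket>.<key>, which would explain why Number of Subjects tracks the key count here. A rough sketch with the nats CLI (bucket and key names are made up for this example):

nats kv add status --history=5
nats kv put status device.42 online
nats sub '$KV.status.>'    # the update arrives on subject $KV.status.device.42
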
derekcollison commented 11 months ago

So you have 5M+ keys. We keep per-subject state for those in memory and do not drop it in 2.9.x, but we are looking at getting better at dropping some of this for idle assets in 2.10.
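
(For a rough sense of scale: the heap profile above attributes about 444 MB of per-subject state to roughly 5.09 million subjects, which works out to on the order of 90 bytes of resident memory per key.)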

pablitovicente commented 11 months ago

Yeah, I am kind of stress testing, but I would expect to have a couple of million keys for my use case. I am considering using the K/V store for status tracking of objects that are state machines reporting their state transitions, and I need to keep some history for them.

The NATS K/V store seems like a very good choice for my use case, and the flexibility of using either a native NATS connection or WebSockets for transport matches it very well.

pablitovicente commented 11 months ago

So I followed the linked article and set GOMEMLIMIT in my docker-compose, and that seems to keep memory at bay. Performance seems to be quite similar, and it is reasonable to expect somewhat higher latency from the GC.
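
For reference, a minimal sketch of the kind of docker-compose entry this refers to; the service name, image tag, and the 4G/3GiB values are placeholders, and the point is simply to set GOMEMLIMIT somewhat below the container's memory limit:

services:
  nats:
    image: nats:2.9
    command: ["-js", "-sd", "/data"]
    environment:
      - GOMEMLIMIT=3GiB        # soft Go heap target, a bit below the container limit
    deploy:
      resources:
        limits:
          memory: 4G           # hard limit enforced by the container runtime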

I guess we can close this one then, as it is expected behavior.

Thank you very much for your help.

derekcollison commented 11 months ago

We should have some improvements in 2.10 as well.