nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Performance degradation #5394

Open nenych opened 1 month ago

nenych commented 1 month ago

Observed behavior

Server throughput degrades after a slow consumer connects. As shown below, the rate of incoming messages drops by about 30% when the first slow consumer connects, and by about 50% after the second one connects.

[Image: CleanShot 2024-05-07 at 16 58 44@2x]
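For readers who want to reproduce the percentages from their own graphs, the arithmetic can be sketched as below. This assumes both figures are measured against the baseline rate with no slow consumers connected, which the report does not state explicitly. The rates are hypothetical numbers chosen to illustrate the calculation, not values read from the screenshot.

```python
def degradation_percent(baseline_rate: float, observed_rate: float) -> float:
    """Return the relative drop from baseline_rate to observed_rate, in percent."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - observed_rate) * 100.0 / baseline_rate


# Hypothetical incoming-message rates (messages per second), for illustration only.
baseline = 1_000_000.0
with_one_slow_consumer = 700_000.0
with_two_slow_consumers = 500_000.0

print(degradation_percent(baseline, with_one_slow_consumer))
print(degradation_percent(baseline, with_two_slow_consumers))
```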

Expected behavior

The server should stop sending messages to slow consumers until their buffers are empty, without slowing down the server itself.
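To make the expectation concrete, here is a toy, in-process model of the behavior described above. It is only a sketch of the idea and not how nats-server is implemented. The `Consumer` class, the `publish` function, and the `max_pending` limit are all hypothetical names invented for this illustration.

```python
from collections import deque


class Consumer:
    """Hypothetical subscriber with a bounded pending buffer."""

    def __init__(self, name, max_pending):
        self.name = name
        self.max_pending = max_pending
        self.pending = deque()
        self.paused = False
        self.delivered = 0
        self.skipped = 0

    def offer(self, msg):
        """Buffer msg unless delivery is paused; never block the caller."""
        if self.paused:
            if self.pending:
                # Buffer not yet empty: skip this consumer instead of waiting.
                self.skipped += 1
                return
            self.paused = False  # buffer is empty again, resume delivery
        self.pending.append(msg)
        self.delivered += 1
        if len(self.pending) >= self.max_pending:
            self.paused = True  # buffer full: pause until it is empty

    def drain(self, count):
        """Simulate the consumer processing up to count buffered messages."""
        processed = min(count, len(self.pending))
        for _ in range(processed):
            self.pending.popleft()
        return processed


def publish(consumers, msg):
    """Fan msg out to every consumer; cost does not depend on consumer speed."""
    for consumer in consumers:
        consumer.offer(msg)


fast = Consumer("fast", max_pending=5)
slow = Consumer("slow", max_pending=5)

for i in range(100):
    publish([fast, slow], b"payload")
    fast.drain(1)          # keeps up with every message
    if i % 10 == 9:
        slow.drain(1)      # processes one message per ten published

print(fast.delivered, fast.skipped)
print(slow.delivered, slow.skipped)
```

In this model the publisher loop does the same amount of work per message regardless of how far behind the slow consumer is, which is the property the report asks for.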

Server and client version

Server: 2.9.20
Python library: nats-py 2.7.2

Host environment

Local: macOS 14.4.1, arm64, Docker 26.0.0. The behavior is the same with the amd64 emulator (--platform=linux/amd64 flag).

GKE: Container-Optimized OS, amd64, containerd

Steps to reproduce

I prepared the required configs and a docker-compose file that starts NATS, Prometheus, an exporter, and two consumers: https://github.com/nenych/nats-test.

Steps to run

  1. Clone the repository.
  2. Build the docker image:
    docker build -t test/nats:latest .
  3. Install the NATS CLI: https://docs.nats.io/using-nats/nats-tools/nats_cli
  4. Run docker-compose (this starts NATS, Prometheus, and two consumers):
    docker-compose -f ./docker-compose.yaml up -d
  5. Start NATS benchmark:
    nats bench updates --pub=4 --msgs 1000000000 --size=1000
  6. Wait a little and start the slow consumer:
    docker run --rm -it --network=nats-test_default test/nats:latest python3 slow-consumer.py

Explore metrics

  1. Open Prometheus: http://localhost:9091/graph
  2. Enter the query:
    sum by (job) (rate(nats_varz_in_msgs[30s]))
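The same query can also be run without the web UI, through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch using only the Python standard library. It assumes Prometheus is reachable on `localhost:9091` as in the steps above. The helper names are made up for this example, and the `job` label value in any result depends on the scrape config in the test repository.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9091"  # port taken from the steps above
QUERY = "sum by (job) (rate(nats_varz_in_msgs[30s]))"


def build_query_url(base_url, query):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": query})


def parse_instant_vector(payload):
    """Map each 'job' label to its current value from an instant-vector response."""
    if payload.get("status") != "success":
        raise RuntimeError("Prometheus query failed: %r" % (payload,))
    rates = {}
    for series in payload["data"]["result"]:
        job = series["metric"].get("job", "")
        _timestamp, value = series["value"]  # value arrives as a string
        rates[job] = float(value)
    return rates


def fetch_in_msgs_rate(base_url=PROMETHEUS_URL, query=QUERY):
    """Run the query against a live Prometheus and return {job: msgs_per_second}."""
    url = build_query_url(base_url, query)
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_instant_vector(json.load(response))
```

With the docker-compose stack running, calling `fetch_in_msgs_rate()` before and after starting the slow consumer gives two numbers that can be compared directly.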

ripienaar commented 1 month ago

Server 2.9.20 is now quite out of date. Let us know how the latest 2.10 works for you.

nenych commented 1 month ago

Below is the same test with NATS 2.10.14. With this version the results are even worse:

[Image: CleanShot 2024-05-07 at 18 36 04@2x]

kam1kaze commented 2 weeks ago

Any updates here? We have the same issue on our cluster. Thanks