nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
15.91k stars · 1.41k forks

Nats JetStream consumers performance #4799

Closed sskserk closed 11 months ago

sskserk commented 11 months ago

Observed behavior

Hi Nats Guys,

we are facing extremely low performance of JetStream consumers.

The following code fragment processes approximately 100 messages per second, then hangs for a couple of seconds.

_, err := jStreamContext.QueueSubscribe(
    "subject",
    "queue",
    func(msg *nats.Msg) {
        m, _ := msg.Metadata()
        log.Info(m.Sequence.Stream)
    },
    nats.DeliverLastPerSubject(),
)

The corresponding jStreamContext is created as follows (note that AddStream returns stream info, not a context; the context itself comes from nc.JetStream()):

jStreamContext, _ := nc.JetStream()
_, err = jStreamContext.AddStream(&nats.StreamConfig{
    Name:     "streamName",
    Subjects: []string{"subject"},
    NoAck:    true,
})

We have a growing backlog of JetStream messages that we can't process in time.

Expected behavior

Messages should at least be pushed by the server at a higher rate.

Server and client version

Nats 2.3.2

Host environment

Linux/Ubuntu

Steps to reproduce

No response

derekcollison commented 11 months ago

Is the server version really 2.3.2? The latest server release is 2.10.5.

sskserk commented 11 months ago

nats-server version

nats-server: v2.9.22

Client lib in go.mod: require ( github.com/nats-io/nats.go v1.31.0 )

derekcollison commented 11 months ago

OK, the latest on 2.9.x is 2.9.24.

Can we get more information?

Why no ack on the stream? Do you see memory growth in the server? How fast are messages entering the stream? What rate for msgs and bytes? How many different subjects?

Can you share output of nats stream info for the above stream? What does nats consumer report show for that stream as well?

sskserk commented 11 months ago

Setting "NoAck" to true was just an attempt to work around the issue; our theory was that ACKs take too much time. In practice, "NoAck" provides no observable boost in performance.

sskserk commented 11 months ago

Nats stream info


              Subjects: A2S_NOTIFICATIONS.*
              Replicas: 1
               Storage: File

Options:

             Retention: Limits
       Acknowledgments: false
        Discard Policy: Old
      Duplicate Window: 2m0s
     Allows Msg Delete: true
          Allows Purge: true
        Allows Rollups: false

Limits:

      Maximum Messages: unlimited
   Maximum Per Subject: unlimited
         Maximum Bytes: unlimited
           Maximum Age: 1h0m0s
  Maximum Message Size: unlimited
     Maximum Consumers: unlimited

Cluster Information:

                  Name: SOME_CLUSTER
                Leader: devops-multi8nodes-nats-px-02

State:

              Messages: 29,408
                 Bytes: 130 MiB
        First Sequence: 1 @ 2023-11-16 03:21:23 UTC
         Last Sequence: 29,408 @ 2023-11-16 03:46:05 UTC
      Active Consumers: 1
    Number of Subjects: 1

sskserk commented 11 months ago

Nats consumer info:

? Select a Stream A2S_NOTIFICATIONS
? Select a Consumer queue-metric
Information for Consumer A2S_NOTIFICATIONS > queue-metric created 2023-11-16T03:35:46+03:00

Configuration:

                    Name: queue-metric
        Delivery Subject: _INBOX.H1xRi0QIWDD7xYZZgmK0BU
          Filter Subject: A2S_NOTIFICATIONS.*
          Deliver Policy: Last Per Subject
     Deliver Queue Group: queue-metric
              Ack Policy: Explicit
                Ack Wait: 30.00s
           Replay Policy: Instant
         Max Ack Pending: 1,000
            Flow Control: false

Cluster Information:

                    Name: tantor
                  Leader: devops-multi8nodes-nats-px-02

State:

  Last Delivered Message: Consumer sequence: 12,824 Stream sequence: 30,267 Last delivery: -473.001ms ago
    Acknowledgment Floor: Consumer sequence: 12,824 Stream sequence: 30,267 Last Ack: -474.442ms ago
        Outstanding Acks: 0 out of maximum 1,000
    Redelivered Messages: 0
    Unprocessed Messages: 0
         Active Interest: Active using Queue Group queue-metric
derekcollison commented 11 months ago

Publishing into a stream with no-ack is not recommended unless there is a specific reason to do so. It allows the server to receive messages at a higher rate than can be stored in the stream; we track this internally and reserve the right to drop these messages if the server is under pressure.

For your consumer, does a normal non-queue group consumer with normal delivery work? Meaning eliminate the queue group and the deliver last by subject?

Also, possible to upgrade to 2.10.5?

sskserk commented 11 months ago

Upgraded:

# nats-server version
nats-server: v2.10.5

The following code works, but the message rate is still very low.

qSubscription, err := jStreamContext.Subscribe(
    subjectName,
    func(msg *nats.Msg) {
        m, _ := msg.Metadata()
        log.Info(m.Sequence.Stream)

        msg.Ack()
    },
    nats.DeliverLastPerSubject(),
)

sskserk commented 11 months ago

NATS configuration file:

server_name: devops-multi8nodes-nats-px-01
# Client port of 4222 on all interfaces
port: 4222
monitor_port: 8222
logfile: /opt/logs/nats/nats.log
logfile_size_limit: 1GB
jetstream {
  store_dir=/opt/data/nats
}

cluster {
  name: tantor
  listen: devops-multi8nodes-nats-px-01:6222
  connect_retries: 10
  authorization: {
    user: natsAdmin # you can use any name
    password: ****
    timeout: 0
  }
  routes = [
    nats://natsAdmin:****@devops-multi8nodes-nats-px-02:6222
    nats://natsAdmin:****@devops-multi8nodes-nats-px-03:6222
  ]
}
tls { # certificates location
  cert_file: "/opt/nats/certs/fullchain.pem"
  key_file: "/opt/nats/certs/privkey.pem"
  timeout: 10.0
}
sskserk commented 11 months ago

NATS bench

# nats bench --stream="A2S_NOTIFICATIONS" "queue-metric.*" --server devops-multi8nodes-nats-px-02 --sub 1 --pub 1
12:43:01 Starting Core NATS pub/sub benchmark [subject=queue-metric.*, multisubject=false, multisubjectmax=100000, msgs=100,000, msgsize=128 B, pubs=1, subs=1, pubsleep=0s, subsleep=0s]
12:43:01 Starting subscriber, expecting 100,000 messages
12:43:01 Starting publisher, publishing 100,000 messages
Finished      0s [============================================================================================================================================================================] 100%
Finished      0s [============================================================================================================================================================================] 100%

NATS Pub/Sub stats: 1,499,928 msgs/sec ~ 183.10 MB/sec
 Pub stats: 1,099,047 msgs/sec ~ 134.16 MB/sec
 Sub stats: 754,115 msgs/sec ~ 92.06 MB/sec

But the sample given here (https://github.com/nats-io/go-nats-examples/blob/main/api-examples/subscribe_async/main.go) barely receives 100 msgs/second.

sskserk commented 11 months ago

@derekcollison we just noticed that our messages have 40 KB payloads. Could that be the reason?

Which is the recommended approach: many small messages, or a few big ones?
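For a rough sense of scale, the bench output above moved 128 B messages at about 134 MB/s of publish throughput. If byte throughput were the only constraint (an assumption; this ignores per-message overhead and JetStream storage costs), the same pipe would carry far fewer 40 KB messages per second:

```go
package main

import "fmt"

func main() {
	// Back-of-the-envelope using the bench numbers above:
	// ~134.16 MB/s of publish throughput with 128 B messages.
	const byteRate = 134.16 * 1024 * 1024 // observed pub throughput, bytes/sec

	// At the same byte rate, message rate scales inversely with payload size.
	fmt.Printf("128 B payloads: ~%.0f msgs/sec\n", byteRate/128)
	fmt.Printf("40 KB payloads: ~%.0f msgs/sec\n", byteRate/(40*1024))
}
```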

derekcollison commented 11 months ago

Have you tried removing the log.Info line from your sample program to see if that has an effect?

sskserk commented 11 months ago

@derekcollison sorry for the false alarm,

we finally found the issue. It was a combination of factors: 1) our publishers send large messages, 2) the max ack pending parameter used by the queue groups was set to a very small value, 3) our message-processing logic is slow.
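For future readers, a sketch of the arithmetic behind factors 2 and 3 (the numbers are illustrative, not measured from this deployment): if the server keeps at most N unacknowledged messages in flight and handlers process that window concurrently with each message taking t seconds to ack, delivery settles at roughly N/t msgs/sec, so a small window combined with a slow handler caps throughput hard:

```go
package main

import "fmt"

func main() {
	// Illustrative model, not measured values: with at most maxAckPending
	// messages in flight and each taking perMsgSeconds to process and ack,
	// steady-state delivery is approximately their ratio.
	maxAckPending := 10.0 // hypothetical small ack window
	perMsgSeconds := 0.1  // hypothetical slow handler

	fmt.Printf("throughput cap ~ %.0f msgs/sec\n", maxAckPending/perMsgSeconds)
}
```

Raising the window (e.g. via the client's max-ack-pending consumer setting) or speeding up the handler moves the cap accordingly.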

After fixing these issues, we now observe a perfect processing pace. NATS JetStream is just amazing! Thank you for the good product!