nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

R3 file-based interest stream, (some) messages are not being removed #3869

Closed MauriceVanVeen closed 1 year ago

MauriceVanVeen commented 1 year ago

Defect


Versions of nats-server and affected client libraries used:

OS/Container environment:

Kubernetes (reproduced with kind)

Steps or code to reproduce the issue:

Summary:

With an R3 file-based interest stream, some messages are not removed after they have been consumed. It seems to be some kind of de-sync, where some replicas keep a message but others don't.

To reproduce:

Cluster setup:

kind create cluster
helm upgrade -i nats nats/nats --version 0.19.9 --set cluster.enabled=true --set nats.jetstream.enabled=true --set nats.jetstream.fileStorage.storageClassName=standard

Stream setup & bench (with nats-box/cli):

nats str add interest --subjects=interest --storage=file --replicas=3 --retention=interest --discard=old --max-msgs=-1 --max-msgs-per-subject=-1 --max-bytes=-1 --max-age=-1 --max-msg-size=-1 --dupe-window=2m --no-allow-rollup --no-deny-delete --no-deny-purge

nats bench interest --stream interest --js --pub 1 --sub 1 --push

nats str report:

Obtaining Stream stats

╭────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                               Stream Report                                                │
├──────────┬─────────┬───────────┬───────────┬──────────┬─────────┬──────┬─────────┬─────────────────────────┤
│ Stream   │ Storage │ Placement │ Consumers │ Messages │ Bytes   │ Lost │ Deleted │ Replicas                │
├──────────┼─────────┼───────────┼───────────┼──────────┼─────────┼──────┼─────────┼─────────────────────────┤
│ interest │ File    │           │ 0         │ 53       │ 8.6 KiB │ 0    │ 1       │ nats-0*, nats-1, nats-2 │
╰──────────┴─────────┴───────────┴───────────┴──────────┴─────────┴──────┴─────────┴─────────────────────────╯

nats str info interest:

Information for Stream interest created 2023-02-14 21:46:05

             Subjects: interest
             Replicas: 3
              Storage: File

Options:

            Retention: Interest
     Acknowledgements: true
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false

Limits:

     Maximum Messages: unlimited
  Maximum Per Subject: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Cluster Information:

                 Name: nats
               Leader: nats-0
              Replica: nats-1, current, seen 0.31s ago
              Replica: nats-2, current, seen 0.31s ago

State:

             Messages: 53
                Bytes: 8.6 KiB
             FirstSeq: 99,947 @ 2023-02-14T21:46:47 UTC
              LastSeq: 100,000 @ 2023-02-14T21:46:47 UTC
     Deleted Messages: 1
     Active Consumers: 0
   Number of Subjects: 1

nats con ls interest:

No Consumers defined
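The numbers in the State block above can be checked mechanically. The following is a minimal sketch, assuming the text layout of `nats str info` shown in this report; the function names `parse_stream_state` and `leftover_report` are hypothetical helpers, not part of any NATS tooling. It relies on the sequence range `LastSeq - FirstSeq + 1` being covered by stored plus deleted messages (here 100,000 - 99,947 + 1 = 54 = 53 + 1), and flags the case where messages remain even though no consumers exist.

```python
import re

# Excerpt of the State block from the `nats str info interest` output above.
SAMPLE = """
State:

             Messages: 53
                Bytes: 8.6 KiB
             FirstSeq: 99,947 @ 2023-02-14T21:46:47 UTC
              LastSeq: 100,000 @ 2023-02-14T21:46:47 UTC
     Deleted Messages: 1
     Active Consumers: 0
   Number of Subjects: 1
"""


def parse_stream_state(info_text: str) -> dict:
    """Extract integer counters from the text output (assumed layout as above)."""
    patterns = {
        "messages": r"^\s*Messages:\s*([\d,]+)",
        "first_seq": r"^\s*FirstSeq:\s*([\d,]+)",
        "last_seq": r"^\s*LastSeq:\s*([\d,]+)",
        "deleted": r"^\s*Deleted Messages:\s*([\d,]+)",
        "consumers": r"^\s*Active Consumers:\s*([\d,]+)",
    }
    state = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, info_text, flags=re.MULTILINE)
        if match:
            # Strip thousands separators such as "99,947" before converting.
            state[key] = int(match.group(1).replace(",", ""))
    return state


def leftover_report(state: dict) -> dict:
    """Compare the sequence span with the counters and flag leftover messages."""
    span = state["last_seq"] - state["first_seq"] + 1
    return {
        "span": span,
        "accounted": state["messages"] + state["deleted"],
        # With interest retention and no consumers, no messages should remain.
        "unexpected_leftovers": state["consumers"] == 0 and state["messages"] > 0,
    }


if __name__ == "__main__":
    print(leftover_report(parse_stream_state(SAMPLE)))
```

To apply this to a live stream, the saved output of `nats str info interest` could be passed to `parse_stream_state` in place of `SAMPLE`.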

Expected result:

The stream is empty: all messages have been consumed and no consumers are left.

Actual result:

The stream still contains some messages, and reports some as deleted.

There doesn't seem to be a consistent pattern: the number of messages that remain in the stream differs each time the reproduction is retried from scratch.

nats-0.zip nats-1.zip nats-2.zip

MauriceVanVeen commented 1 year ago

On our k8s cluster (so not running locally with kind) we also see similar behaviour, with fewer messages remaining overall but far more deleted messages.

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                    Stream Report                                                     │
├────────────────────┬─────────┬───────────┬───────────┬──────────┬─────────┬──────┬─────────┬─────────────────────────┤
│ Stream             │ Storage │ Placement │ Consumers │ Messages │ Bytes   │ Lost │ Deleted │ Replicas                │
├────────────────────┼─────────┼───────────┼───────────┼──────────┼─────────┼──────┼─────────┼─────────────────────────┤
│ interest-stream    │ File    │           │ 3         │ 4        │ 1.7 KiB │ 0    │ 3334851 │ nats-0, nats-1*, nats-2 │
╰────────────────────┴─────────┴───────────┴───────────┴──────────┴─────────┴──────┴─────────┴─────────────────────────╯

This is a file-based R3 interest stream with max-age=1h, at ~900 messages per second.

Those messages still get cleaned up by max-age this way, but it would be better if they were removed once they have been consumed (all consumers have already consumed every message).
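For reference, the behaviour expected here can be written down as a small in-memory model. This is a simplified sketch of interest retention as described in this thread (a message is dropped once every consumer has acknowledged it), not the server's implementation; the class `InterestStream` and its methods are hypothetical names used only for illustration, and replication is not modelled at all.

```python
class InterestStream:
    """Toy single-node model of an interest-retention stream (illustrative only)."""

    def __init__(self):
        self.consumers = set()
        self.messages = {}  # sequence -> set of consumers that have not acked yet
        self.last_seq = 0

    def add_consumer(self, name: str) -> None:
        self.consumers.add(name)

    def publish(self) -> int:
        """Assign the next sequence; keep the message only if someone is interested."""
        self.last_seq += 1
        if self.consumers:
            self.messages[self.last_seq] = set(self.consumers)
        return self.last_seq

    def ack(self, name: str, seq: int) -> None:
        """Record an ack; remove the message once no consumer is still waiting on it."""
        pending = self.messages.get(seq)
        if pending is None:
            return
        pending.discard(name)
        if not pending:
            del self.messages[seq]

    def remove_consumer(self, name: str) -> None:
        """Dropping a consumer also drops its interest in every stored message."""
        self.consumers.discard(name)
        for seq in list(self.messages):
            self.ack(name, seq)


if __name__ == "__main__":
    stream = InterestStream()
    for consumer in ("c1", "c2", "c3"):
        stream.add_consumer(consumer)
    published = [stream.publish() for _ in range(5)]
    for seq in published:
        for consumer in ("c1", "c2", "c3"):
            stream.ack(consumer, seq)
    print(len(stream.messages))
```

In this model the stream is empty as soon as all three consumers have acknowledged every message, which is the state the report above expects to see on every replica without waiting for max-age.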

ripienaar commented 1 year ago

What server version? Latest fixed a related bug.

MauriceVanVeen commented 1 year ago

> What server version? Latest fixed a related bug.

(latest) 2.9.14

MauriceVanVeen commented 1 year ago

This is fixed as of 2.9.16.