nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
15.49k stars 1.38k forks source link

expected sequence does not match store when using Nats-Msg-Id header #2242

Closed krisdaniels closed 3 years ago

krisdaniels commented 3 years ago

When using a msg id to prevent duplicates, an error is always returned when sending a non duplicate after having received a duplicate ack result :

docker run -ti --rm --network host natsio/nats-box

nats str add ORDERS --subjects "ORDERS.*" --ack --max-msgs=-1 --max-bytes=-1 --max-age=1y --storage file --retention limits --max-msg-size=-1 --discard old --dupe-window="0s" --replicas 2
nats con add ORDERS NEW --filter ORDERS.new --ack explicit --pull --deliver all --sample 100 --max-deliver 20 --replay instant --max-pending 20

nats req -H Nats-Msg-Id:1 ORDERS.new hello1 
nats req -H Nats-Msg-Id:1 ORDERS.new hello2 # duplicate
nats req -H Nats-Msg-Id:2 ORDERS.new hello3 # new msg always fails after duplicate
nats req -H Nats-Msg-Id:2 ORDERS.new hello3 # retry 1 fails
nats req -H Nats-Msg-Id:2 ORDERS.new hello3 # retry 2 succeeds

First message + duplicate :

# nats req -H Nats-Msg-Id:1 ORDERS.new hello1 

05:16:03 Sending request on "ORDERS.new"
05:16:03 Received on "_INBOX.r0yfSLDniVtFWbJXzrdkyZ.OGFSlLFI" rtt 1.003804ms
{"stream":"ORDERS","seq":1}

# nats req -H Nats-Msg-Id:1 ORDERS.new hello2 # duplicate

05:16:03 Sending request on "ORDERS.new"
05:16:03 Received on "_INBOX.Gu8EQT3FXwoZTn2pXW8oRb.MPEuuDBY" rtt 973.104µs
{"stream":"ORDERS","seq":1,"duplicate": true}

First attempt of new message after duplicate

# nats req -H Nats-Msg-Id:2 ORDERS.new hello3 # new msg always fails after duplicate

05:16:03 Sending request on "ORDERS.new"
05:16:03 Received on "_INBOX.9MXKkqfTzruzga2f9zztXo.nRG0SrMJ" rtt 1.058504ms
{"error":{"code":503,"description":"expected sequence does not match store"},"stream":"ORDERS","seq":0}

Second attempt of new message after duplicate

# nats req -H Nats-Msg-Id:2 ORDERS.new hello3 # retry 1 fails

05:16:03 Sending request on "ORDERS.new"
05:16:03 Received on "_INBOX.oNSsmTtzsFQ2kQKyypIpAp.HStDnu1s" rtt 1.014404ms
{"error":{"code":503,"description":"expected stream sequence does not match"},"stream":"ORDERS","seq":0}

Third attempt of new message after duplicate

# nats req -H Nats-Msg-Id:2 ORDERS.new hello3 # retry 2 succeeds

05:16:04 Sending request on "ORDERS.new"
05:16:04 Received on "_INBOX.9V2tVYbsCj5QAYZUdglPSz.vRWPMBoS" rtt 946.503µs
{"stream":"ORDERS","seq":2}

Configs :

server_name: n1-c1
port: 4221
monitor_port: 8221
jetstream {
   store_dir: /nats/data1
}
cluster {
  name: c1
  port: 6221
  authorization {
    user: ruser
    password: T0pS3cr3t
    timeout: 2
  }
  routes = [
    nats-route://ruser:T0pS3cr3t@localhost:6222
    nats-route://ruser:T0pS3cr3t@localhost:6223
  ]
}

#######################################################
server_name: n2-c1
port: 4222
monitor_port: 8222
jetstream {
   store_dir: /nats/data2
}
cluster {
  name: c1
  port: 6222
  authorization {
    user: ruser
    password: T0pS3cr3t
    timeout: 2
  }
  routes = [
    nats-route://ruser:T0pS3cr3t@localhost:6221
    nats-route://ruser:T0pS3cr3t@localhost:6223
  ]
}

#######################################################
server_name: n3-c1
port: 4223
monitor_port: 8223
jetstream {
   store_dir: /nats/data3
}
cluster {
  name: c1
  port: 6223
  authorization {
    user: ruser
    password: T0pS3cr3t
    timeout: 2
  }
  routes = [
    nats-route://ruser:T0pS3cr3t@localhost:6221
    nats-route://ruser:T0pS3cr3t@localhost:6222
  ]
}

Docker images used and start commands:


docker run --rm --network host --name n1c1 -ti -v /opt/nats/n1:/nats nats:2.2.5-scratch -js -c /nats/nats-server-js.conf
docker run --rm --network host --name n2c1 -ti -v /opt/nats/n2:/nats nats:2.2.5-scratch -js -c /nats/nats-server-js.conf
docker run --rm --network host --name n3c1 -ti -v /opt/nats/n3:/nats nats:2.2.5-scratch -js -c /nats/nats-server-js.conf

Logs :


# nats str add ORDERS --subjects "ORDERS.*" --ack --max-msgs=-1 --max-bytes=-1 --max-age=1y --storage file --retention limits --max-msg-size=-1 --discard old --dupe-window="0s" --replicas 2
Information for Stream ORDERS created 2021-05-24T05:16:02Z

Configuration:

             Subjects: ORDERS.*
     Acknowledgements: true
            Retention: File - Limits
             Replicas: 2
       Discard Policy: Old
     Duplicate Window: 2m0s
     Maximum Messages: unlimited
        Maximum Bytes: unlimited
          Maximum Age: 1y0d0h0m0s
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Cluster Information:

                 Name: c1
               Leader: n2-c1
              Replica: n1-c1, current, seen 0.00s ago

State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 0
              LastSeq: 0
     Active Consumers: 0

# nats con add ORDERS NEW --filter ORDERS.new --ack explicit --pull --deliver all --sample 100 --max-deliver 20 --replay instant --max-pending 20
Information for Consumer ORDERS > NEW created 2021-05-24T05:16:02Z

Configuration:

        Durable Name: NEW
           Pull Mode: true
      Filter Subject: ORDERS.new
         Deliver All: true
          Ack Policy: Explicit
            Ack Wait: 30s
       Replay Policy: Instant
  Maximum Deliveries: 20
       Sampling Rate: 100
     Max Ack Pending: 20

Cluster Information:

                Name: c1
              Leader: n2-c1
             Replica: n1-c1, current, seen 0.00s ago

State:

   Last Delivered Message: Consumer sequence: 0 Stream sequence: 0
     Acknowledgment floor: Consumer sequence: 0 Stream sequence: 0
         Outstanding Acks: 0 out of maximum 20
     Redelivered Messages: 0
     Unprocessed Messages: 0    

# n1c1
[1] 2021/05/24 05:15:45.501791 [INF] Starting nats-server
[1] 2021/05/24 05:15:45.501840 [INF]   Version:  2.2.5
[1] 2021/05/24 05:15:45.501850 [INF]   Git:      [b7e1f66]
[1] 2021/05/24 05:15:45.501852 [INF]   Name:     n1-c1
[1] 2021/05/24 05:15:45.501856 [INF]   Node:     oP5LzZ64
[1] 2021/05/24 05:15:45.501858 [INF]   ID:       NA2LUWZUM4K5LXSTMBVJBVV2BDHOQUM2RZVTOFV2KCXEECCO7NWJRMW7
[1] 2021/05/24 05:15:45.501866 [INF] Using configuration file: /nats/nats-server-js.conf
[1] 2021/05/24 05:15:45.503345 [INF] Starting JetStream
[1] 2021/05/24 05:15:45.503687 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[1] 2021/05/24 05:15:45.503699 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[1] 2021/05/24 05:15:45.503702 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[1] 2021/05/24 05:15:45.503704 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[1] 2021/05/24 05:15:45.503705 [INF] 
[1] 2021/05/24 05:15:45.503707 [INF]          https://docs.nats.io/jetstream
[1] 2021/05/24 05:15:45.503709 [INF] 
[1] 2021/05/24 05:15:45.503711 [INF] ---------------- JETSTREAM ----------------
[1] 2021/05/24 05:15:45.503716 [INF]   Max Memory:      47.11 GB
[1] 2021/05/24 05:15:45.503719 [INF]   Max Storage:     71.63 GB
[1] 2021/05/24 05:15:45.503721 [INF]   Store Directory: "/nats/data1/jetstream"
[1] 2021/05/24 05:15:45.503722 [INF] -------------------------------------------
[1] 2021/05/24 05:15:45.503954 [INF] Starting JetStream cluster
[1] 2021/05/24 05:15:45.503966 [INF] Creating JetStream metadata controller
[1] 2021/05/24 05:15:45.504617 [INF] JetStream cluster bootstrapping
[1] 2021/05/24 05:15:45.507293 [INF] Starting http monitor on 0.0.0.0:8221
[1] 2021/05/24 05:15:45.507364 [INF] Listening for client connections on 0.0.0.0:4221
[1] 2021/05/24 05:15:45.507844 [INF] Server is ready
[1] 2021/05/24 05:15:45.507878 [INF] Cluster name is c1
[1] 2021/05/24 05:15:45.507917 [INF] Listening for route connections on 0.0.0.0:6221
[1] 2021/05/24 05:15:45.508760 [ERR] Error trying to connect to route (attempt 1): dial tcp 127.0.0.1:6222: connect: connection refused
[1] 2021/05/24 05:15:45.508761 [ERR] Error trying to connect to route (attempt 1): dial tcp 127.0.0.1:6223: connect: connection refused
[1] 2021/05/24 05:15:45.834585 [INF] 127.0.0.1:55950 - rid:6 - Route connection created
[1] 2021/05/24 05:15:46.509468 [INF] 127.0.0.1:6222 - rid:7 - Route connection created
[1] 2021/05/24 05:15:46.509703 [INF] 127.0.0.1:6222 - rid:7 - Router connection closed: Duplicate Route
[1] 2021/05/24 05:15:46.763209 [INF] 127.0.0.1:55958 - rid:8 - Route connection created
[1] 2021/05/24 05:15:47.510420 [INF] 127.0.0.1:6223 - rid:9 - Route connection created
[1] 2021/05/24 05:15:47.510764 [INF] 127.0.0.1:6223 - rid:9 - Router connection closed: Duplicate Route
[1] 2021/05/24 05:16:03.262354 [ERR] JetStream failed to store a msg on stream '$G > ORDERS' -  expected sequence does not match store
[1] 2021/05/24 05:16:03.323042 [WRN] Got stream sequence mismatch for '$G > ORDERS'
[1] 2021/05/24 05:16:03.323801 [WRN] Resetting stream '$G > ORDERS'

# n2c1
[1] 2021/05/24 05:15:45.827275 [INF] Starting nats-server
[1] 2021/05/24 05:15:45.827315 [INF]   Version:  2.2.5
[1] 2021/05/24 05:15:45.827317 [INF]   Git:      [b7e1f66]
[1] 2021/05/24 05:15:45.827319 [INF]   Name:     n2-c1
[1] 2021/05/24 05:15:45.827340 [INF]   Node:     YGhnH3VX
[1] 2021/05/24 05:15:45.827343 [INF]   ID:       NBFT2DN4J3K7CHME3GZES5QSHJITYMPVGTUMMDQU7EIRUENXQZCXPWI4
[1] 2021/05/24 05:15:45.827356 [INF] Using configuration file: /nats/nats-server-js.conf
[1] 2021/05/24 05:15:45.828060 [INF] Starting JetStream
[1] 2021/05/24 05:15:45.828292 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[1] 2021/05/24 05:15:45.828306 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[1] 2021/05/24 05:15:45.828308 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[1] 2021/05/24 05:15:45.828309 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[1] 2021/05/24 05:15:45.828311 [INF] 
[1] 2021/05/24 05:15:45.828313 [INF]          https://docs.nats.io/jetstream
[1] 2021/05/24 05:15:45.828314 [INF] 
[1] 2021/05/24 05:15:45.828316 [INF] ---------------- JETSTREAM ----------------
[1] 2021/05/24 05:15:45.828321 [INF]   Max Memory:      47.11 GB
[1] 2021/05/24 05:15:45.828324 [INF]   Max Storage:     71.63 GB
[1] 2021/05/24 05:15:45.828326 [INF]   Store Directory: "/nats/data2/jetstream"
[1] 2021/05/24 05:15:45.828328 [INF] -------------------------------------------
[1] 2021/05/24 05:15:45.828611 [INF] Starting JetStream cluster
[1] 2021/05/24 05:15:45.828625 [INF] Creating JetStream metadata controller
[1] 2021/05/24 05:15:45.829221 [INF] JetStream cluster bootstrapping
[1] 2021/05/24 05:15:45.833060 [INF] Starting http monitor on 0.0.0.0:8222
[1] 2021/05/24 05:15:45.833112 [INF] Listening for client connections on 0.0.0.0:4222
[1] 2021/05/24 05:15:45.833505 [INF] Server is ready
[1] 2021/05/24 05:15:45.833520 [INF] Cluster name is c1
[1] 2021/05/24 05:15:45.833560 [INF] Listening for route connections on 0.0.0.0:6222
[1] 2021/05/24 05:15:45.834256 [ERR] Error trying to connect to route (attempt 1): dial tcp 127.0.0.1:6223: connect: connection refused
[1] 2021/05/24 05:15:45.834573 [INF] 127.0.0.1:6221 - rid:6 - Route connection created
[1] 2021/05/24 05:15:46.509514 [INF] 127.0.0.1:51868 - rid:7 - Route connection created
[1] 2021/05/24 05:15:46.509817 [INF] 127.0.0.1:51868 - rid:7 - Router connection closed: Duplicate Route
[1] 2021/05/24 05:15:46.763164 [INF] 127.0.0.1:51872 - rid:8 - Route connection created
[1] 2021/05/24 05:15:46.834928 [INF] 127.0.0.1:6223 - rid:9 - Route connection created
[1] 2021/05/24 05:15:46.835205 [INF] 127.0.0.1:6223 - rid:9 - Router connection closed: Duplicate Route
[1] 2021/05/24 05:15:49.118613 [INF] JetStream cluster new metadata leader
[1] 2021/05/24 05:16:02.829210 [INF] JetStream cluster new stream leader for '$G > ORDERS'
[1] 2021/05/24 05:16:03.068025 [INF] JetStream cluster new consumer leader for '$G > ORDERS > NEW'
[1] 2021/05/24 05:16:03.262225 [ERR] JetStream failed to store a msg on stream '$G > ORDERS' -  expected sequence does not match store
[1] 2021/05/24 05:16:03.322924 [WRN] Got stream sequence mismatch for '$G > ORDERS'
[1] 2021/05/24 05:16:03.324023 [WRN] Resetting stream '$G > ORDERS'
[1] 2021/05/24 05:16:03.448718 [INF] JetStream cluster new stream leader for '$G > ORDERS'

# n3c1
[1] 2021/05/24 05:15:46.757037 [INF] Starting nats-server
[1] 2021/05/24 05:15:46.757084 [INF]   Version:  2.2.5
[1] 2021/05/24 05:15:46.757087 [INF]   Git:      [b7e1f66]
[1] 2021/05/24 05:15:46.757090 [INF]   Name:     n3-c1
[1] 2021/05/24 05:15:46.757094 [INF]   Node:     hxTjz3J4
[1] 2021/05/24 05:15:46.757096 [INF]   ID:       NAAIP3PK6PMRE5C5ZFKYMO4JHPZZ23VKIWOI7CNWHIDJL23H7ZXML3FD
[1] 2021/05/24 05:15:46.757104 [INF] Using configuration file: /nats/nats-server-js.conf
[1] 2021/05/24 05:15:46.757729 [INF] Starting JetStream
[1] 2021/05/24 05:15:46.757995 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[1] 2021/05/24 05:15:46.758008 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[1] 2021/05/24 05:15:46.758010 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[1] 2021/05/24 05:15:46.758012 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[1] 2021/05/24 05:15:46.758014 [INF] 
[1] 2021/05/24 05:15:46.758016 [INF]          https://docs.nats.io/jetstream
[1] 2021/05/24 05:15:46.758017 [INF] 
[1] 2021/05/24 05:15:46.758019 [INF] ---------------- JETSTREAM ----------------
[1] 2021/05/24 05:15:46.758023 [INF]   Max Memory:      47.11 GB
[1] 2021/05/24 05:15:46.758025 [INF]   Max Storage:     71.63 GB
[1] 2021/05/24 05:15:46.758027 [INF]   Store Directory: "/nats/data3/jetstream"
[1] 2021/05/24 05:15:46.758029 [INF] -------------------------------------------
[1] 2021/05/24 05:15:46.758188 [INF] Starting JetStream cluster
[1] 2021/05/24 05:15:46.758215 [INF] Creating JetStream metadata controller
[1] 2021/05/24 05:15:46.758801 [INF] JetStream cluster bootstrapping
[1] 2021/05/24 05:15:46.761751 [INF] Starting http monitor on 0.0.0.0:8223
[1] 2021/05/24 05:15:46.761801 [INF] Listening for client connections on 0.0.0.0:4223
[1] 2021/05/24 05:15:46.762113 [INF] Server is ready
[1] 2021/05/24 05:15:46.762130 [INF] Cluster name is c1
[1] 2021/05/24 05:15:46.762151 [INF] Listening for route connections on 0.0.0.0:6223
[1] 2021/05/24 05:15:46.763260 [INF] 127.0.0.1:6222 - rid:6 - Route connection created
[1] 2021/05/24 05:15:46.763315 [INF] 127.0.0.1:6221 - rid:7 - Route connection created
[1] 2021/05/24 05:15:46.835032 [INF] 127.0.0.1:39006 - rid:8 - Route connection created
[1] 2021/05/24 05:15:46.835251 [INF] 127.0.0.1:39006 - rid:8 - Router connection closed: Duplicate Route
[1] 2021/05/24 05:15:47.510547 [INF] 127.0.0.1:39008 - rid:9 - Route connection created
[1] 2021/05/24 05:15:47.510781 [INF] 127.0.0.1:39008 - rid:9 - Router connection closed: Duplicate Route
ripienaar commented 3 years ago

I can confirm this is happening, and in a cluster even worse:

May 24 08:57:42 n2-sfo nats-server[1343]: [1] [INF] JetStream cluster new consumer leader for 'one > ORDERS_65 > C3'
May 24 08:57:44 n3-sfo nats-server[1362]: [1] [INF] JetStream cluster new consumer leader for 'one > ORDERS_14 > C1'
May 24 08:57:49 n3-nyc nats-server[1345]: [1] [ERR] JetStream failed to store a msg on stream 'one > ORDERS_1' -  expected sequence does not match store
May 24 08:57:49 n1-nyc nats-server[1346]: [1] [ERR] JetStream failed to store a msg on stream 'one > ORDERS_1' -  expected sequence does not match store
May 24 08:57:49 n2-nyc nats-server[1349]: [1] [ERR] JetStream failed to store a msg on stream 'one > ORDERS_1' -  expected sequence does not match store
May 24 08:57:49 n3-nyc nats-server[1345]: [1] [WRN] Got stream sequence mismatch for 'one > ORDERS_1'
May 24 08:57:49 n1-nyc nats-server[1346]: [1] [WRN] Got stream sequence mismatch for 'one > ORDERS_1'
May 24 08:57:49 n2-nyc nats-server[1349]: [1] [WRN] Got stream sequence mismatch for 'one > ORDERS_1'
May 24 08:57:49 n2-nyc nats-server[1349]: [1] [WRN] Resetting stream 'one > ORDERS_1'
May 24 08:57:49 n3-nyc nats-server[1345]: [1] [WRN] Resetting stream 'one > ORDERS_1'
May 24 08:57:49 n1-nyc nats-server[1346]: [1] [WRN] Resetting stream 'one > ORDERS_1'
May 24 08:57:51 n1-nyc nats-server[1346]: [1] [INF] JetStream cluster new stream leader for 'one > ORDERS_1'

This happens when I publish the first non dupe, definitely something bad giong on here.