nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

Insufficient resources when creating JetStream on NFS with max-bytes and replica=3 set #2800

Status: Closed (dpotapov closed this issue 2 years ago)

dpotapov commented 2 years ago

Defect

Versions of nats-server and affected client libraries used:

OS/Container environment:

Steps or code to reproduce the issue:

Start NATS cluster with jetstream storage dir pointing to NFS mount:

nats-server -D -p 4222 --cluster_name test-cluster --name "nats1" -cluster "nats://localhost:4248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/mnt/nfs/nats1"
nats-server -D -p 5222 --cluster_name test-cluster --name "nats2" -cluster "nats://localhost:5248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/mnt/nfs/nats2"
nats-server -D -p 6222 --cluster_name test-cluster --name "nats3" -cluster "nats://localhost:6248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/mnt/nfs/nats3"
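
The three commands above differ only in server name, client port, cluster port, and store directory. A minimal sketch that prints them from those parameters, so the same layout can be pointed at a different storage root. The function name `nats_cmd` is hypothetical, and the script only prints the commands rather than starting any server:

```shell
#!/bin/sh
# Shared route list, copied from the commands in this report.
ROUTES="nats://localhost:4248,nats://localhost:5248,nats://localhost:6248"

# nats_cmd NAME CLIENT_PORT CLUSTER_PORT STORE_ROOT
# Prints (does not run) the nats-server command line for one cluster member.
# nats_cmd is a hypothetical helper, not part of nats-server.
nats_cmd() {
  printf 'nats-server -D -p %s --cluster_name test-cluster --name "%s" -cluster "nats://localhost:%s" -routes "%s" -js -sd "%s/%s"\n' \
    "$2" "$1" "$3" "$ROUTES" "$4" "$1"
}

nats_cmd nats1 4222 4248 /mnt/nfs
nats_cmd nats2 5222 5248 /mnt/nfs
nats_cmd nats3 6222 6248 /mnt/nfs
```

Passing a local directory as the last argument gives a non-NFS variant of the same cluster for comparison.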

Add stream with max-bytes and replica=3 set:

# nats str add
? Stream Name DATA
? Subjects to consume data
? Storage backend file
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Message size limit 10241024
? Maximum message age limit -1
? Maximum individual message size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
? Replicas 3
nats: error: could not create Stream: insufficient resources (10023)

Expected result:

The stream gets created just like when running on local filesystem.

Actual result:

nats: error: could not create Stream: insufficient resources (10023)

derekcollison commented 2 years ago

Can you show us what one of the servers prints out when you start it? Since you are not specifying any limits, it will dynamically try to figure them out.
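
For comparison, the limits can be pinned explicitly instead of being derived at startup. A minimal sketch, assuming the `jetstream` block of the server configuration file with its `store_dir`, `max_memory_store`, and `max_file_store` keys. The file name and the 1G/10G sizes are illustrative assumptions, not values from this thread:

```shell
#!/bin/sh
# Write a config fragment that sets JetStream limits explicitly.
# The file name and the sizes below are assumptions for illustration.
conf="${TMPDIR:-/tmp}/nats1-js.conf"

cat > "$conf" <<'EOF'
jetstream {
  store_dir: "/mnt/nfs/nats1"
  max_memory_store: 1G
  max_file_store: 10G
}
EOF

# Show what was written. A server could then be started with
# `nats-server -c "$conf"` plus the port and cluster flags used in this report.
cat "$conf"
```

With explicit values in place, the "Max Memory" and "Max Storage" lines in the startup banner should reflect the configured numbers rather than a guess based on the host.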

dpotapov commented 2 years ago

[9689] 2022/01/20 16:15:27.220885 [INF] Starting nats-server
[9689] 2022/01/20 16:15:27.220939 [INF]   Version:  2.7.0
[9689] 2022/01/20 16:15:27.220942 [INF]   Git:      [not set]
[9689] 2022/01/20 16:15:27.220945 [INF]   Name:     nats1
[9689] 2022/01/20 16:15:27.220948 [INF]   Node:     RztkeQup
[9689] 2022/01/20 16:15:27.220951 [INF]   ID:       NABXCTETKMD63Z4KUQBQPTGAML7F5CZ2GYWJ3I5WGKLCM6YTN424MLSU
[9689] 2022/01/20 16:15:27.221156 [INF] Starting JetStream
[9689] 2022/01/20 16:15:27.223436 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[9689] 2022/01/20 16:15:27.223447 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[9689] 2022/01/20 16:15:27.223451 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[9689] 2022/01/20 16:15:27.223454 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[9689] 2022/01/20 16:15:27.223456 [INF]
[9689] 2022/01/20 16:15:27.223458 [INF]          https://docs.nats.io/jetstream
[9689] 2022/01/20 16:15:27.223459 [INF]
[9689] 2022/01/20 16:15:27.223462 [INF] ---------------- JETSTREAM ----------------
[9689] 2022/01/20 16:15:27.223470 [INF]   Max Memory:      5.73 GB
[9689] 2022/01/20 16:15:27.223473 [INF]   Max Storage:     70.26 GB
[9689] 2022/01/20 16:15:27.223476 [INF]   Store Directory: "/mnt/nfs/nats1/jetstream"
[9689] 2022/01/20 16:15:27.223478 [INF] -------------------------------------------

derekcollison commented 2 years ago

Size looks fine. Does the system show that it elects a meta leader?

dpotapov commented 2 years ago

Yes, these lines appear on different instances:

[24396] 2022/01/20 16:28:49.357123 [INF] Self is new JetStream cluster metadata leader

[24491] 2022/01/20 16:28:49.357975 [INF] JetStream cluster new metadata leader: nats1/test-cluster

[24545] 2022/01/20 16:28:49.358006 [INF] JetStream cluster new metadata leader: nats1/test-cluster
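
A quick way to confirm that every follower agrees on the same leader is to pull the leader name out of the log lines. A minimal sketch, assuming the log format shown above; the function name `meta_leader` is hypothetical:

```shell
#!/bin/sh
# meta_leader: read server log lines on stdin and print the server name from
# each "new metadata leader" announcement (the part before "/<cluster>").
# meta_leader is a hypothetical helper, not part of nats-server.
meta_leader() {
  sed -n 's|.*JetStream cluster new metadata leader: \([^/]*\)/.*|\1|p'
}

# The two follower lines from this comment; sort -u collapses agreeing
# answers into a single name.
printf '%s\n' \
  '[24491] 2022/01/20 16:28:49.357975 [INF] JetStream cluster new metadata leader: nats1/test-cluster' \
  '[24545] 2022/01/20 16:28:49.358006 [INF] JetStream cluster new metadata leader: nats1/test-cluster' \
  | meta_leader | sort -u
# prints: nats1
```

More than one distinct name in the output would suggest the servers disagree about the leader. The "Self is new JetStream cluster metadata leader" line on the leader itself carries no name, so this filter skips it.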

derekcollison commented 2 years ago

OK, something else must be off. I will take a look.

variadico commented 2 years ago

Just as an update, I'm still looking into this.

variadico commented 2 years ago

@dpotapov, what happens when you run these commands (NATS 2.7.0) on your local machine (without NFS)? (I changed the store dir to /tmp)

nats-server -D -p 4222 --cluster_name test-cluster --name "nats1" -cluster "nats://localhost:4248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/tmp/nats/nats1"
nats-server -D -p 5222 --cluster_name test-cluster --name "nats2" -cluster "nats://localhost:5248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/tmp/nats/nats2"
nats-server -D -p 6222 --cluster_name test-cluster --name "nats3" -cluster "nats://localhost:6248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/tmp/nats/nats3"

And then create the stream just as you did before.

$ nats str add
? Stream Name DATA
? Subjects to consume data
? Storage backend file
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Message size limit 10241024
? Maximum message age limit -1
? Maximum individual message size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
? Replicas 3

dpotapov commented 2 years ago

It works:

# nats str add --replicas=3 --max-bytes=10241024 DATA
? Subjects to consume data
? Storage backend file
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Maximum message age limit -1
? Maximum individual message size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
Stream DATA was created

Information for Stream DATA created 2022-01-27T17:12:14Z

Configuration:

             Subjects: data
     Acknowledgements: true
            Retention: File - Limits
             Replicas: 3
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false
     Maximum Messages: unlimited
        Maximum Bytes: 9.8 MiB
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Cluster Information:

                 Name: test-cluster
               Leader: nats2
              Replica: nats3, current, seen 0.00s ago
              Replica: nats1, current, seen 0.00s ago

State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 0
              LastSeq: 0
     Active Consumers: 0
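
The `Maximum Bytes: 9.8 MiB` line follows from the `--max-bytes=10241024` value: the CLI reports it in MiB (1 MiB = 1048576 bytes). A small sketch of that arithmetic; the helper name `to_mib` is hypothetical, and the second call is only a comparison value, not something used in this thread:

```shell
#!/bin/sh
# to_mib BYTES: print BYTES converted to MiB with one decimal place.
# to_mib is a hypothetical helper for illustration.
to_mib() {
  awk -v b="$1" 'BEGIN { printf "%.1f MiB\n", b / 1048576 }'
}

to_mib 10241024   # the --max-bytes value used in this thread; prints: 9.8 MiB
to_mib 10485760   # 10 * 1024 * 1024, for comparison; prints: 10.0 MiB
```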

variadico commented 2 years ago

For the commands you ran on your local machine, can you share the versions?

nats-server --version
nats --version

dpotapov commented 2 years ago

# nats-server --version
nats-server: v2.7.0
# nats --version
0.0.28

variadico commented 2 years ago

Hm. OK, well this is interesting. Here's what I'm seeing on my laptop. No AWS Fargate. No NFS. I can only create a stream with 3 replicas if I don't set max-bytes.

$ uname -rs
Linux 5.16.2-arch1-1

$ rm -rf /tmp/nats/

$ nats-server --version
nats-server: v2.7.0

$ nats-server -D -p 4222 --cluster_name test-cluster --name "nats1" -cluster "nats://localhost:4248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/tmp/nats/nats1"
[7268] 2022/01/27 11:04:50.745642 [INF] Starting nats-server
[7268] 2022/01/27 11:04:50.745752 [INF]   Version:  2.7.0
[7268] 2022/01/27 11:04:50.745761 [INF]   Git:      [not set]
[7268] 2022/01/27 11:04:50.745840 [DBG]   Go build: go1.17.6
[7268] 2022/01/27 11:04:50.745847 [INF]   Name:     nats1
[7268] 2022/01/27 11:04:50.745860 [INF]   Node:     RztkeQup
[7268] 2022/01/27 11:04:50.745865 [INF]   ID:       NA5P5QJL5GASQPRD6HIOINCFS7CIQ45YY76ZUH3BTUB7QSLNGRSLSRIN
[7268] 2022/01/27 11:04:50.745931 [DBG] Created system account: "$SYS"
[7268] 2022/01/27 11:04:50.746595 [INF] Starting JetStream
[7268] 2022/01/27 11:04:50.746688 [DBG] JetStream creating dynamic configuration - 5.59 GB memory, 2.79 GB disk
[7268] 2022/01/27 11:04:50.746792 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[7268] 2022/01/27 11:04:50.746799 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[7268] 2022/01/27 11:04:50.746805 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[7268] 2022/01/27 11:04:50.746808 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[7268] 2022/01/27 11:04:50.746814 [INF] 
[7268] 2022/01/27 11:04:50.746818 [INF]          https://docs.nats.io/jetstream
[7268] 2022/01/27 11:04:50.746824 [INF] 
[7268] 2022/01/27 11:04:50.746829 [INF] ---------------- JETSTREAM ----------------
[7268] 2022/01/27 11:04:50.746840 [INF]   Max Memory:      5.59 GB
[7268] 2022/01/27 11:04:50.746847 [INF]   Max Storage:     2.79 GB
[7268] 2022/01/27 11:04:50.746853 [INF]   Store Directory: "/tmp/nats/nats1/jetstream"
[7268] 2022/01/27 11:04:50.746859 [INF] -------------------------------------------
[7268] 2022/01/27 11:04:50.746983 [DBG]   Exports:
[7268] 2022/01/27 11:04:50.746992 [DBG]      $JS.API.>
[7268] 2022/01/27 11:04:50.747032 [DBG] Enabled JetStream for account "$G"
[7268] 2022/01/27 11:04:50.747043 [DBG]   Max Memory:      -1 B
[7268] 2022/01/27 11:04:50.747050 [DBG]   Max Storage:     -1 B
[7268] 2022/01/27 11:04:50.747122 [DBG] JetStream state for account "$G" recovered
[7268] 2022/01/27 11:04:50.747141 [INF] Starting JetStream cluster
[7268] 2022/01/27 11:04:50.747146 [DBG] JetStream cluster checking for stable cluster name and peers
[7268] 2022/01/27 11:04:50.747150 [INF] Creating JetStream metadata controller
[7268] 2022/01/27 11:04:50.747633 [INF] JetStream cluster bootstrapping
[7268] 2022/01/27 11:04:50.747651 [DBG] JetStream cluster initial peers: [RztkeQup]
[7268] 2022/01/27 11:04:50.747657 [DBG] Determining expected peer size for JetStream metacontroller
[7268] 2022/01/27 11:04:50.747665 [DBG] Adjusting expected peer set size to 3 with 1 known

$ nats-server -D -p 5222 --cluster_name test-cluster --name "nats2" -cluster "nats://localhost:5248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/tmp/nats/nats2"
[7278] 2022/01/27 11:04:51.373576 [INF] Starting nats-server
[7278] 2022/01/27 11:04:51.373678 [INF]   Version:  2.7.0
[7278] 2022/01/27 11:04:51.373686 [INF]   Git:      [not set]
[7278] 2022/01/27 11:04:51.373692 [DBG]   Go build: go1.17.6
[7278] 2022/01/27 11:04:51.373699 [INF]   Name:     nats2
[7278] 2022/01/27 11:04:51.373707 [INF]   Node:     SRLRpmYS
[7278] 2022/01/27 11:04:51.373713 [INF]   ID:       NA5WLIUMSMGSS7HL2FO6SSUPTKB7OXMMRJMLN2ZKV55PSJPWPMY2ORGT
[7278] 2022/01/27 11:04:51.373783 [DBG] Created system account: "$SYS"
[7278] 2022/01/27 11:04:51.374418 [INF] Starting JetStream
[7278] 2022/01/27 11:04:51.374512 [DBG] JetStream creating dynamic configuration - 5.59 GB memory, 2.79 GB disk
[7278] 2022/01/27 11:04:51.374675 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[7278] 2022/01/27 11:04:51.374689 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[7278] 2022/01/27 11:04:51.374697 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[7278] 2022/01/27 11:04:51.374704 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[7278] 2022/01/27 11:04:51.374710 [INF] 
[7278] 2022/01/27 11:04:51.374716 [INF]          https://docs.nats.io/jetstream
[7278] 2022/01/27 11:04:51.374724 [INF] 
[7278] 2022/01/27 11:04:51.374730 [INF] ---------------- JETSTREAM ----------------
[7278] 2022/01/27 11:04:51.374740 [INF]   Max Memory:      5.59 GB
[7278] 2022/01/27 11:04:51.374751 [INF]   Max Storage:     2.79 GB
[7278] 2022/01/27 11:04:51.374760 [INF]   Store Directory: "/tmp/nats/nats2/jetstream"
[7278] 2022/01/27 11:04:51.374767 [INF] -------------------------------------------
[7278] 2022/01/27 11:04:51.374921 [DBG]   Exports:
[7278] 2022/01/27 11:04:51.374933 [DBG]      $JS.API.>
[7278] 2022/01/27 11:04:51.374996 [DBG] Enabled JetStream for account "$G"
[7278] 2022/01/27 11:04:51.375011 [DBG]   Max Memory:      -1 B
[7278] 2022/01/27 11:04:51.375019 [DBG]   Max Storage:     -1 B
[7278] 2022/01/27 11:04:51.375142 [DBG] JetStream state for account "$G" recovered
[7278] 2022/01/27 11:04:51.375165 [INF] Starting JetStream cluster
[7278] 2022/01/27 11:04:51.375172 [DBG] JetStream cluster checking for stable cluster name and peers
[7278] 2022/01/27 11:04:51.375181 [INF] Creating JetStream metadata controller
[7278] 2022/01/27 11:04:51.375788 [INF] JetStream cluster bootstrapping
[7278] 2022/01/27 11:04:51.375808 [DBG] JetStream cluster initial peers: [SRLRpmYS]
[7278] 2022/01/27 11:04:51.375814 [DBG] Determining expected peer size for JetStream metacontroller
[7278] 2022/01/27 11:04:51.375820 [DBG] Adjusting expected peer set size to 3 with 1 known

$ nats-server -D -p 6222 --cluster_name test-cluster --name "nats3" -cluster "nats://localhost:6248" -routes "nats://localhost:4248,nats://localhost:5248,nats://localhost:6248" -js -sd "/tmp/nats/nats3"
[7289] 2022/01/27 11:04:52.393884 [INF] Starting nats-server
[7289] 2022/01/27 11:04:52.393969 [INF]   Version:  2.7.0
[7289] 2022/01/27 11:04:52.393976 [INF]   Git:      [not set]
[7289] 2022/01/27 11:04:52.393982 [DBG]   Go build: go1.17.6
[7289] 2022/01/27 11:04:52.393987 [INF]   Name:     nats3
[7289] 2022/01/27 11:04:52.393997 [INF]   Node:     fvTBnQC7
[7289] 2022/01/27 11:04:52.394002 [INF]   ID:       ND3TBETX7VAZPAVLXC5MAF7YKAFXPDLDVLMRLY7W4K4WUGNUUUQKOFL4
[7289] 2022/01/27 11:04:52.394061 [DBG] Created system account: "$SYS"
[7289] 2022/01/27 11:04:52.394502 [INF] Starting JetStream
[7289] 2022/01/27 11:04:52.394559 [DBG] JetStream creating dynamic configuration - 5.59 GB memory, 2.79 GB disk
[7289] 2022/01/27 11:04:52.394651 [INF]     _ ___ _____ ___ _____ ___ ___   _   __  __
[7289] 2022/01/27 11:04:52.394657 [INF]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[7289] 2022/01/27 11:04:52.394661 [INF] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[7289] 2022/01/27 11:04:52.394663 [INF]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[7289] 2022/01/27 11:04:52.394666 [INF] 
[7289] 2022/01/27 11:04:52.394668 [INF]          https://docs.nats.io/jetstream
[7289] 2022/01/27 11:04:52.394672 [INF] 
[7289] 2022/01/27 11:04:52.394677 [INF] ---------------- JETSTREAM ----------------
[7289] 2022/01/27 11:04:52.394685 [INF]   Max Memory:      5.59 GB
[7289] 2022/01/27 11:04:52.394691 [INF]   Max Storage:     2.79 GB
[7289] 2022/01/27 11:04:52.394697 [INF]   Store Directory: "/tmp/nats/nats3/jetstream"
[7289] 2022/01/27 11:04:52.394701 [INF] -------------------------------------------
[7289] 2022/01/27 11:04:52.394778 [DBG]   Exports:
[7289] 2022/01/27 11:04:52.394785 [DBG]      $JS.API.>
[7289] 2022/01/27 11:04:52.394834 [DBG] Enabled JetStream for account "$G"
[7289] 2022/01/27 11:04:52.394842 [DBG]   Max Memory:      -1 B
[7289] 2022/01/27 11:04:52.394848 [DBG]   Max Storage:     -1 B
[7289] 2022/01/27 11:04:52.394905 [DBG] JetStream state for account "$G" recovered
[7289] 2022/01/27 11:04:52.394918 [INF] Starting JetStream cluster
[7289] 2022/01/27 11:04:52.394921 [DBG] JetStream cluster checking for stable cluster name and peers
[7289] 2022/01/27 11:04:52.394926 [INF] Creating JetStream metadata controller
[7289] 2022/01/27 11:04:52.395376 [INF] JetStream cluster bootstrapping
[7289] 2022/01/27 11:04:52.395389 [DBG] JetStream cluster initial peers: [fvTBnQC7]
[7289] 2022/01/27 11:04:52.395393 [DBG] Determining expected peer size for JetStream metacontroller
[7289] 2022/01/27 11:04:52.395397 [DBG] Adjusting expected peer set size to 3 with 1 known

$ nats --version
0.0.28

$ nats str add --replicas=3 --max-bytes=10241024 DATA --subjects data
? Storage backend file
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Maximum message age limit -1
? Maximum individual message size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
nats: error: could not create Stream: insufficient resources (10023)

# Also fails with a lower max-bytes.
$ nats str add --replicas=3 --max-bytes=1024 DATA --subjects data
? Storage backend file
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Maximum message age limit -1
? Maximum individual message size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
nats: error: could not create Stream: insufficient resources (10023)

# However, not setting max-bytes does work.
$ nats str add --replicas=3 DATA --subjects data
? Storage backend file
? Retention Policy Limits
? Discard Policy Old
? Stream Messages Limit -1
? Per Subject Messages Limit -1
? Message size limit -1
? Maximum message age limit -1
? Maximum individual message size -1
? Duplicate tracking time window 2m0s
? Allow message Roll-ups No
? Allow message deletion Yes
? Allow purging subjects or the entire stream Yes
Stream DATA was created

Information for Stream DATA created 2022-01-27T11:14:59-08:00

Configuration:

             Subjects: data
     Acknowledgements: true
            Retention: File - Limits
             Replicas: 3
       Discard Policy: Old
     Duplicate Window: 2m0s
    Allows Msg Delete: true
         Allows Purge: true
       Allows Rollups: false
     Maximum Messages: unlimited
        Maximum Bytes: unlimited
          Maximum Age: unlimited
 Maximum Message Size: unlimited
    Maximum Consumers: unlimited

Cluster Information:

                 Name: test-cluster
               Leader: nats1
              Replica: nats2, current, seen 0.00s ago
              Replica: nats3, current, seen 0.00s ago

State:

             Messages: 0
                Bytes: 0 B
             FirstSeq: 0
              LastSeq: 0
     Active Consumers: 0

variadico commented 2 years ago

If possible, could you clone and build the branch I'm working on? It might fix this issue. Perhaps you can try it on your local machine as well as on AWS Fargate + NFS.

git clone https://github.com/nats-io/nats-server.git
cd nats-server
git checkout fix-nodeinfo
go build

dpotapov commented 2 years ago

Sure, I'll try. Have you pushed this branch?

# git checkout fix-nodeinfo
error: pathspec 'fix-nodeinfo' did not match any file(s) known to git.

variadico commented 2 years ago

Oh, oops. Sorry, actually the change has made it to the main branch.

git checkout main
git pull origin main

This is the commit we want: https://github.com/nats-io/nats-server/commit/0d158728d1399ce9abbba9493dfae186a64c1803

dpotapov commented 2 years ago

Yup, it seems the issue is fixed! Thanks!

variadico commented 2 years ago

That's good to hear! https://github.com/nats-io/nats-server/pull/2824 has already been merged into main. This change will probably be part of 2.7.2.
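
To check whether an installed server is new enough to carry the fix, the reported version can be compared against the expected release. A minimal sketch, assuming the fix lands in 2.7.2 (the comment above only says "probably") and that `sort -V` from GNU coreutils is available; the function name `version_at_least` is hypothetical:

```shell
#!/bin/sh
# version_at_least HAVE WANT: succeed if version HAVE >= WANT.
# A leading "v" is ignored. Relies on `sort -V` (GNU version sort).
# version_at_least is a hypothetical helper for illustration.
version_at_least() {
  have="${1#v}"
  want="${2#v}"
  # If WANT sorts first (or both are equal), HAVE is at least WANT.
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

# "v2.7.0" is the version string reported earlier in the thread;
# 2.7.2 is the assumed first release containing the fix.
if version_at_least "v2.7.0" "2.7.2"; then
  echo "fix expected to be present"
else
  echo "upgrade needed"
fi
# prints: upgrade needed
```

For a build from source, the release number is less informative than whether the checked-out tree contains the commit linked above.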