nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
15.87k stars 1.4k forks source link

race condition resulting in read of incomplete account results in stream create failing due to JS not being enabled the first time around #2874

Open matthiashanel opened 2 years ago

matthiashanel commented 2 years ago

This is a benign issue with client side retry as workaround.

When a client (belonging to a JS enabled account) connects to the meta leader of a 3 member cluster. and immediately creates a stream, the followign race condition between jsStreamCreateRequest and processStreamAssignment is possible.

Directly connecting to the meta data leader increases the odds of the race but is not necessary in a cluster of >=3.

Each follower learns about the account via the route code. Because this happens through the route, to avoid a deadlock, the accounts are set up as expired and incomplete. The deadlock avoidance is true for at least the nats-resolver. This situation is the mostlikely one, but anyting that will cause the following will do.

Because of this account marked as expired, the stream create message (received by all members of the cluster) invokes jsStreamCreateRequest and triggers the actual account setup of the account (updateAccountClaimsWithRefresh is being invoked). Due to re-locking of the account in this function (as well as its callers) the replicated stream assignment is processed concurrently. Sometimes, processStreamAssignment then finds a partially set up account where JetStream is not enabled and returns an error.

Below traces for one particular account show:

  1. goroutine 13532 register the account
  2. goroutine 13446 in jsStreamCreateRequest starting updating the account in updateAccountClaimsWithRefresh
  3. goroutine 13594 in processStreamAssignment finding an incomplete account and generating an error in lookupStream. (incomplete as claimJWT is empty)
  4. goroutine 13446 in jsStreamCreateRequest finishing updating the account in updateAccountClaimsWithRefresh

In my local unit test runs this happened rarely. (2 in 100) The test will fail as the stream can not be created, due to jetstream not being enabled in the account. I altered the test to retry immediately, and then it always passes.

Hence this is not a severe issue, but one to fix some time. I imagine a specific workaround could be, if not meta leader, avoid getRequestInfo and the resulting account lookup inside jsStreamCreateRequest. Probably has to be adapted to everywhere where this pattern applies.

Furthermore I noticed that in a few locations the error codes returned are incorrect. Usage of NewJSNotEnabledForAccountError() where NewJSNotEnabledError() would be the correct choice. (see code below)

traces from unit test where this was discovered:

=== RUN   TestJetStreamClusterHaLimitStream
=== RUN   TestJetStreamClusterHaLimitStream/jwt
0xc002ff4b40 trace: 0xc002ff4b40 registerAccountNoLock ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C

0xc002ff4b40 trace: goroutine 13532 [running]:
runtime/debug.Stack()
    /Users/matthiashanel/.gimme/versions/go1.17.darwin.amd64/src/runtime/debug/stack.go:24 +0x65
github.com/nats-io/nats-server/v2/server.(*Server).registerAccountNoLock(0xc002475c00, 0xc002ff4b40)
    /Users/matthiashanel/repos/nats-server/server/server.go:1417 +0x86a
github.com/nats-io/nats-server/v2/server.(*Server).LookupOrRegisterAccount(0xc002475c00, {0xc00167d900, 0x1f})
    /Users/matthiashanel/repos/nats-server/server/server.go:1159 +0x1b1
github.com/nats-io/nats-server/v2/server.(*client).processRemoteSub(0xc0018ec600, {0xc002f0d804, 0x58, 0x1041231}, 0x0)
    /Users/matthiashanel/repos/nats-server/server/route.go:1048 +0x567
github.com/nats-io/nats-server/v2/server.(*client).parse(0xc0018ec600, {0xc002f0d800, 0xa9, 0xc07b76055b6682d8})
    /Users/matthiashanel/repos/nats-server/server/parser.go:653 +0x2a95
github.com/nats-io/nats-server/v2/server.(*client).readLoop(0xc0018ec600, {0x0, 0x0, 0x0})
    /Users/matthiashanel/repos/nats-server/server/client.go:1227 +0xe1f
github.com/nats-io/nats-server/v2/server.(*Server).createRoute.func1()
    /Users/matthiashanel/repos/nats-server/server/route.go:1372 +0x25
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
    /Users/matthiashanel/repos/nats-server/server/server.go:3023 +0x87

0xc002ff4b40 trace: 0xc002ff4b40 updateAccountClaimsWithRefresh ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C

goroutine 13446 [running]:
runtime/debug.Stack()
    /Users/matthiashanel/.gimme/versions/go1.17.darwin.amd64/src/runtime/debug/stack.go:24 +0x65
github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh(0xc002475c00, 0xc002ff4b40, 0xc0033ae000, 0x1)
    /Users/matthiashanel/repos/nats-server/server/accounts.go:2924 +0x15c
github.com/nats-io/nats-server/v2/server.(*Server).UpdateAccountClaims(0xc002ff4ba8, 0xc0011d5500, 0x300)
    /Users/matthiashanel/repos/nats-server/server/accounts.go:2899 +0x1e
github.com/nats-io/nats-server/v2/server.(*Server).updateAccountWithClaimJWT(0xc001e9bce0, 0xc002ff4b40, {0xc0011d5500, 0xc001ef5900})
    /Users/matthiashanel/repos/nats-server/server/server.go:1506 +0x272
github.com/nats-io/nats-server/v2/server.(*Server).updateAccount(0xc00167dac0, 0xc002ff4b40)
    /Users/matthiashanel/repos/nats-server/server/server.go:1479 +0x12f
github.com/nats-io/nats-server/v2/server.(*Server).lookupAccount(0xc002475c00, {0xc00167dac0, 0x38})
    /Users/matthiashanel/repos/nats-server/server/server.go:1444 +0x119
github.com/nats-io/nats-server/v2/server.(*Server).LookupAccount(...)
    /Users/matthiashanel/repos/nats-server/server/server.go:1464
github.com/nats-io/nats-server/v2/server.(*Server).getRequestInfo(0x10c229e, 0xc002ba9300, {0xc0016ce000, 0xc002804c80, 0x10c6bc0})
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:802 +0x148
github.com/nats-io/nats-server/v2/server.(*Server).jsStreamCreateRequest(0xc002475c00, 0xc002e96480, 0xc002ba9300, 0xc00173eb40, {0xc001184800, 0x20}, {0xc002e099b0, 0x11}, {0xc0016ce000, 0x478, ...})
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:1126 +0xb4
github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch.func1()
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:685 +0x63
created by github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:684 +0x4fa

0xc002ff4b40 a.jetStreamConfigured() = false incomplete true expired false
a.claimJWT =
a.Issuer = OCAT33MTVU2VUOIMGNGUNXJ66AH2RLSDAF3MUBCYAY5QMIL65NQM6XQG
goroutine 13594 [running]:
runtime/debug.Stack()
    /Users/matthiashanel/.gimme/versions/go1.17.darwin.amd64/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
    /Users/matthiashanel/.gimme/versions/go1.17.darwin.amd64/src/runtime/debug/stack.go:16 +0x19
github.com/nats-io/nats-server/v2/server.NewJSNotEnabledForAccountError({0x0, 0xc00010a008, 0x1addf43})
    /Users/matthiashanel/repos/nats-server/server/jetstream_errors_generated.go:1253 +0x28
github.com/nats-io/nats-server/v2/server.(*Account).lookupStream(0xc002ff4b40, {0xc0025c2a40, 0xa})
    /Users/matthiashanel/repos/nats-server/server/jetstream.go:1327 +0x3d6
github.com/nats-io/nats-server/v2/server.(*jetStream).processClusterCreateStream(0xc000887c30, 0xc002ff4b40, 0xc002748630)
    /Users/matthiashanel/repos/nats-server/server/jetstream_cluster.go:2414 +0x216
github.com/nats-io/nats-server/v2/server.(*jetStream).processStreamAssignment(0xc000887c30, 0xc002748630)
    /Users/matthiashanel/repos/nats-server/server/jetstream_cluster.go:2192 +0x678
github.com/nats-io/nats-server/v2/server.(*jetStream).applyMetaEntries(0xc000887c30, {0xc000b5a148, 0x1, 0x0}, 0x0)
    /Users/matthiashanel/repos/nats-server/server/jetstream_cluster.go:1285 +0x3da
github.com/nats-io/nats-server/v2/server.(*jetStream).monitorCluster(0xc000887c30)
    /Users/matthiashanel/repos/nats-server/server/jetstream_cluster.go:836 +0x8fb
created by github.com/nats-io/nats-server/v2/server.(*Server).startGoRoutine
    /Users/matthiashanel/repos/nats-server/server/server.go:3023 +0x87
=== CONT  TestJetStreamClusterHaLimitStream
    jetstream_cluster_test.go:10453: -------------------- JetStream not enabled for account
goroutine 13446 [running]:
runtime/debug.Stack()
    /Users/matthiashanel/.gimme/versions/go1.17.darwin.amd64/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
    /Users/matthiashanel/.gimme/versions/go1.17.darwin.amd64/src/runtime/debug/stack.go:16 +0x19
github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh.func1()
    /Users/matthiashanel/repos/nats-server/server/accounts.go:2929 +0x173
github.com/nats-io/nats-server/v2/server.(*Server).updateAccountClaimsWithRefresh(0xc002475c00, 0xc002ff4b40, 0xc0033ae000, 0x1)
    /Users/matthiashanel/repos/nats-server/server/accounts.go:3388 +0x33af
github.com/nats-io/nats-server/v2/server.(*Server).UpdateAccountClaims(0xc002ff4ba8, 0xc0011d5500, 0x300)
    /Users/matthiashanel/repos/nats-server/server/accounts.go:2899 +0x1e
github.com/nats-io/nats-server/v2/server.(*Server).updateAccountWithClaimJWT(0xc001e9bce0, 0xc002ff4b40, {0xc0011d5500, 0xc001ef5900})
    /Users/matthiashanel/repos/nats-server/server/server.go:1506 +0x272
github.com/nats-io/nats-server/v2/server.(*Server).updateAccount(0xc00167dac0, 0xc002ff4b40)
    /Users/matthiashanel/repos/nats-server/server/server.go:1479 +0x12f
github.com/nats-io/nats-server/v2/server.(*Server).lookupAccount(0xc002475c00, {0xc00167dac0, 0x38})
    /Users/matthiashanel/repos/nats-server/server/server.go:1444 +0x119
github.com/nats-io/nats-server/v2/server.(*Server).LookupAccount(...)
    /Users/matthiashanel/repos/nats-server/server/server.go:1464
github.com/nats-io/nats-server/v2/server.(*Server).getRequestInfo(0x10c229e, 0xc002ba9300, {0xc0016ce000, 0xc002804c80, 0x10c6bc0})
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:802 +0x148
github.com/nats-io/nats-server/v2/server.(*Server).jsStreamCreateRequest(0xc002475c00, 0xc002e96480, 0xc002ba9300, 0xc00173eb40, {0xc001184800, 0x20}, {0xc002e099b0, 0x11}, {0xc0016ce000, 0x478, ...})
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:1126 +0xb4
github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch.func1()
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:685 +0x63
created by github.com/nats-io/nats-server/v2/server.(*jetStream).apiDispatch
    /Users/matthiashanel/repos/nats-server/server/jetstream_api.go:684 +0x4fa
    jetstream_cluster_test.go:10466: {"server":{"name":"S-1","host":"127.0.0.1","id":"NDE4P7IOCOGG4OBI7KIPCMN63XJFY54K5GYYSBCQOO5JBKCZNVB3YFOH","cluster":"JSC","ver":"2.7.3-beta","seq":36,"jetstream":true,"time":"2022-02-16T20:37:41.69494Z"},"data":{"account_name":"ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C","update_time":"2022-02-16T20:37:41.69213Z","expired":false,"complete":true,"jetstream_enabled":true,"leafnode_connections":0,"client_connections":1,"subscriptions":4,"imports":[{"subject":"$SYS.REQ.ACCOUNT.PING.CONNZ","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$SYS.REQ.ACCOUNT.ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C.CONNZ","type":"service","invalid":false,"share":true,"tracking":false},{"subject":"$JS.API.\u003e","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$JS.API.\u003e","type":"service","invalid":false,"share":true,"tracking":false},{"subject":"$SYS.REQ.SERVER.PING.CONNZ","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$SYS.REQ.ACCOUNT.ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C.CONNZ","type":"service","invalid":false,"share":true,"tracking":false}],"jwt":"eyJ0eXAiOiJKV1QiLCJhbGciOiJlZDI1NTE5LW5rZXkifQ.eyJqdGkiOiJRWVJVTlA1M01VVEhMQVRYT0xPNE5KU1NTTDZaR0taVk9NQ1pQRDdONkI1N0M1UTM0RldRIiwiaWF0IjoxNjQ1MDQzODYwLCJpc3MiOiJPQ0FUMzNNVFZVMlZVT0lNR05HVU5YSjY2QUgyUkxTREFGM01VQkNZQVk1UU1JTDY1TlFNNlhRRyIsInN1YiI6IkFEUDQ1UlFFUjZRSkhBRDQzVFBIWTdMMlJMVkgzVUlWWE1NNElTRkhaUEhQUkxWRFRCSktLRzNDIiwibmF0cyI6eyJsaW1pdHMiOnsic3VicyI6LTEsImRhdGEiOi0xLCJwYXlsb2FkIjotMSwiaW1wb3J0cyI6LTEsImV4cG9ydHMiOi0xLCJ3aWxkY2FyZHMiOnRydWUsImNvbm4iOi0xLCJsZWFmIjotMSwiaGFfcmVzb3VyY2VzIjoyLCJtZW1fc3RvcmFnZSI6LTEsImRpc2tfc3RvcmFnZSI6LTEsInN0cmVhbXMiOi0xLCJjb25zdW1lciI6LTF9LCJkZWZhdWx0X3Blcm1pc3Npb25zIjp7InB1YiI6e30sInN1YiI6e319LCJ0eXBlIjoiYWNjb3VudCIsInZlcnNpb24iOjJ9fQ.ZB3hwVWXg84PDrjQfbUuh_M1ghYiPyQRzqlSvhnhm7LP2c6xNbwPhtv_J9mvEACKSzv3PZgPiPt2Or69q9ogDQ","issuer_key":"OCAT33MTVU2VUOIMGNGUNXJ66AH2RLSDAF3MUBCYAY5QMIL65NQM6XQG","decoded_jwt":{"jti":"QYRUNP53MUTHLATXOLO4NJSSSL6ZGKZVOMCZPD7N6B57C5Q34FWQ","iat":1645043860,"iss":"OCAT33MTVU2VUOIMGNGUNXJ66AH2RLSDAF3MUBCYAY5QMIL65NQM6XQG","sub":"ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C","nats":{"limits":{"subs":-1,"data":-1,"payload":-1,"imports":-1,"exports":-1,"wildcards":true,"conn":-1,"leaf":-1,"ha_resources":2,"mem_storage":-1,"disk_storage":-1,"streams":-1,"consumer":-1},"default_permissions":{"pub":{},"sub":{}},"type":"account","version":2}},"sublist_stats":{"num_subscriptions":4,"num_cache":3,"num_inserts":4,"num_removes":0,"num_matches":3,"cache_hit_rate":0,"max_fanout":1,"avg_fanout":0.6666666666666666}}}
    jetstream_cluster_test.go:10466: {"server":{"name":"S-3","host":"127.0.0.1","id":"NAU5ZI2IEQ43LM3BWAO7GZAW52KETV35ZX5IEQAVNDUUMDXHG4A2TD54","cluster":"JSC","ver":"2.7.3-beta","seq":39,"jetstream":true,"time":"2022-02-16T20:37:41.695086Z"},"data":{"account_name":"ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C","update_time":"2022-02-16T20:37:41.693078Z","expired":false,"complete":true,"jetstream_enabled":true,"leafnode_connections":0,"client_connections":0,"subscriptions":4,"imports":[{"subject":"$JS.API.\u003e","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$JS.API.\u003e","type":"service","invalid":false,"share":true,"tracking":false},{"subject":"$SYS.REQ.SERVER.PING.CONNZ","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$SYS.REQ.ACCOUNT.ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C.CONNZ","type":"service","invalid":false,"share":true,"tracking":false},{"subject":"$SYS.REQ.ACCOUNT.PING.CONNZ","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$SYS.REQ.ACCOUNT.ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C.CONNZ","type":"service","invalid":false,"share":true,"tracking":false}],"jwt":"eyJ0eXAiOiJKV1QiLCJhbGciOiJlZDI1NTE5LW5rZXkifQ.eyJqdGkiOiJRWVJVTlA1M01VVEhMQVRYT0xPNE5KU1NTTDZaR0taVk9NQ1pQRDdONkI1N0M1UTM0RldRIiwiaWF0IjoxNjQ1MDQzODYwLCJpc3MiOiJPQ0FUMzNNVFZVMlZVT0lNR05HVU5YSjY2QUgyUkxTREFGM01VQkNZQVk1UU1JTDY1TlFNNlhRRyIsInN1YiI6IkFEUDQ1UlFFUjZRSkhBRDQzVFBIWTdMMlJMVkgzVUlWWE1NNElTRkhaUEhQUkxWRFRCSktLRzNDIiwibmF0cyI6eyJsaW1pdHMiOnsic3VicyI6LTEsImRhdGEiOi0xLCJwYXlsb2FkIjotMSwiaW1wb3J0cyI6LTEsImV4cG9ydHMiOi0xLCJ3aWxkY2FyZHMiOnRydWUsImNvbm4iOi0xLCJsZWFmIjotMSwiaGFfcmVzb3VyY2VzIjoyLCJtZW1fc3RvcmFnZSI6LTEsImRpc2tfc3RvcmFnZSI6LTEsInN0cmVhbXMiOi0xLCJjb25zdW1lciI6LTF9LCJkZWZhdWx0X3Blcm1pc3Npb25zIjp7InB1YiI6e30sInN1YiI6e319LCJ0eXBlIjoiYWNjb3VudCIsInZlcnNpb24iOjJ9fQ.ZB3hwVWXg84PDrjQfbUuh_M1ghYiPyQRzqlSvhnhm7LP2c6xNbwPhtv_J9mvEACKSzv3PZgPiPt2Or69q9ogDQ","issuer_key":"OCAT33MTVU2VUOIMGNGUNXJ66AH2RLSDAF3MUBCYAY5QMIL65NQM6XQG","decoded_jwt":{"jti":"QYRUNP53MUTHLATXOLO4NJSSSL6ZGKZVOMCZPD7N6B57C5Q34FWQ","iat":1645043860,"iss":"OCAT33MTVU2VUOIMGNGUNXJ66AH2RLSDAF3MUBCYAY5QMIL65NQM6XQG","sub":"ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C","nats":{"limits":{"subs":-1,"data":-1,"payload":-1,"imports":-1,"exports":-1,"wildcards":true,"conn":-1,"leaf":-1,"ha_resources":2,"mem_storage":-1,"disk_storage":-1,"streams":-1,"consumer":-1},"default_permissions":{"pub":{},"sub":{}},"type":"account","version":2}},"sublist_stats":{"num_subscriptions":4,"num_cache":1,"num_inserts":6,"num_removes":2,"num_matches":1,"cache_hit_rate":0,"max_fanout":0,"avg_fanout":0}}}
    jetstream_cluster_test.go:10466: {"server":{"name":"S-2","host":"127.0.0.1","id":"NDQ5B5JAQHA3SEVMIPRPXDJVFQXVTBCDP66K2FMCM2FP7CRKX23MIH77","cluster":"JSC","ver":"2.7.3-beta","seq":25,"jetstream":true,"time":"2022-02-16T20:37:41.695088Z"},"data":{"account_name":"ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C","update_time":"2022-02-16T20:37:41.693298Z","expired":false,"complete":true,"jetstream_enabled":true,"leafnode_connections":0,"client_connections":0,"subscriptions":4,"imports":[{"subject":"$SYS.REQ.SERVER.PING.CONNZ","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$SYS.REQ.ACCOUNT.ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C.CONNZ","type":"service","invalid":false,"share":true,"tracking":false},{"subject":"$SYS.REQ.ACCOUNT.PING.CONNZ","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$SYS.REQ.ACCOUNT.ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C.CONNZ","type":"service","invalid":false,"share":true,"tracking":false},{"subject":"$JS.API.\u003e","account":"AC6FGFAJBHM5B7YY36SRZ5MKXJKECGIW2CGZQHD2U63TKH5UZWKQGW3K","to":"$JS.API.\u003e","type":"service","invalid":false,"share":true,"tracking":false}],"jwt":"eyJ0eXAiOiJKV1QiLCJhbGciOiJlZDI1NTE5LW5rZXkifQ.eyJqdGkiOiJRWVJVTlA1M01VVEhMQVRYT0xPNE5KU1NTTDZaR0taVk9NQ1pQRDdONkI1N0M1UTM0RldRIiwiaWF0IjoxNjQ1MDQzODYwLCJpc3MiOiJPQ0FUMzNNVFZVMlZVT0lNR05HVU5YSjY2QUgyUkxTREFGM01VQkNZQVk1UU1JTDY1TlFNNlhRRyIsInN1YiI6IkFEUDQ1UlFFUjZRSkhBRDQzVFBIWTdMMlJMVkgzVUlWWE1NNElTRkhaUEhQUkxWRFRCSktLRzNDIiwibmF0cyI6eyJsaW1pdHMiOnsic3VicyI6LTEsImRhdGEiOi0xLCJwYXlsb2FkIjotMSwiaW1wb3J0cyI6LTEsImV4cG9ydHMiOi0xLCJ3aWxkY2FyZHMiOnRydWUsImNvbm4iOi0xLCJsZWFmIjotMSwiaGFfcmVzb3VyY2VzIjoyLCJtZW1fc3RvcmFnZSI6LTEsImRpc2tfc3RvcmFnZSI6LTEsInN0cmVhbXMiOi0xLCJjb25zdW1lciI6LTF9LCJkZWZhdWx0X3Blcm1pc3Npb25zIjp7InB1YiI6e30sInN1YiI6e319LCJ0eXBlIjoiYWNjb3VudCIsInZlcnNpb24iOjJ9fQ.ZB3hwVWXg84PDrjQfbUuh_M1ghYiPyQRzqlSvhnhm7LP2c6xNbwPhtv_J9mvEACKSzv3PZgPiPt2Or69q9ogDQ","issuer_key":"OCAT33MTVU2VUOIMGNGUNXJ66AH2RLSDAF3MUBCYAY5QMIL65NQM6XQG","decoded_jwt":{"jti":"QYRUNP53MUTHLATXOLO4NJSSSL6ZGKZVOMCZPD7N6B57C5Q34FWQ","iat":1645043860,"iss":"OCAT33MTVU2VUOIMGNGUNXJ66AH2RLSDAF3MUBCYAY5QMIL65NQM6XQG","sub":"ADP45RQER6QJHAD43TPHY7L2RLVH3UIVXMM4ISFHZPHPRLVDTBJKKG3C","nats":{"limits":{"subs":-1,"data":-1,"payload":-1,"imports":-1,"exports":-1,"wildcards":true,"conn":-1,"leaf":-1,"ha_resources":2,"mem_storage":-1,"disk_storage":-1,"streams":-1,"consumer":-1},"default_permissions":{"pub":{},"sub":{}},"type":"account","version":2}},"sublist_stats":{"num_subscriptions":4,"num_cache":0,"num_inserts":9,"num_removes":5,"num_matches":0,"cache_hit_rate":0,"max_fanout":0,"avg_fanout":0}}}
    jetstream_cluster_test.go:10468: --------------------

debugging code and location of wrong error.

diff --git a/server/accounts.go b/server/accounts.go
index 6ed28bca..110fb091 100644
--- a/server/accounts.go
+++ b/server/accounts.go
@@ -25,6 +25,7 @@ import (
    "net/http"
    "net/textproto"
    "reflect"
+   "runtime/debug"
    "sort"
    "strconv"
    "strings"
@@ -89,6 +90,9 @@ type Account struct {
    tags         jwt.TagList
    nameTag      string
    lastLimErr   int64
+
+   traces []string
+   mark   bool
 }

 // Account based limits.
@@ -2917,6 +2921,16 @@ func (s *Server) updateAccountClaimsWithRefresh(a *Account, ac *jwt.AccountClaim
    a.checkExpiration(ac.Claims())

    a.mu.Lock()
+   a.traces = append(a.traces, fmt.Sprintf("%p updateAccountClaimsWithRefresh %s\n%s\n%s\n", a, a.Name, a.claimJWT, string(debug.Stack())))
+   defer func() {
+       a.mu.Lock()
+       a.traces = append(a.traces, fmt.Sprintf("%p updateAccountClaimsWithRefresh finished\n%s\n", a, string(debug.Stack())))
+       if a.mark {
+           debug.PrintStack()
+       }
+       a.mark = false
+       a.mu.Unlock()
+   }()
    // Clone to update, only select certain fields.
    old := &Account{Name: a.Name, exports: a.exports, limits: a.limits, signingKeys: a.signingKeys}

diff --git a/server/jetstream.go b/server/jetstream.go
index d3429450..08af2d4f 100644
--- a/server/jetstream.go
+++ b/server/jetstream.go
@@ -909,7 +909,7 @@ func (s *Server) JetStreamNumAccounts() int {
 func (s *Server) JetStreamReservedResources() (int64, int64, error) {
    js := s.getJetStream()
    if js == nil {
-       return -1, -1, NewJSNotEnabledForAccountError()
+       return -1, -1, NewJSNotEnabledError()
    }
    js.mu.RLock()
    defer js.mu.RUnlock()
@@ -1317,6 +1317,13 @@ func (a *Account) lookupStream(name string) (*stream, error) {
    a.mu.RUnlock()

    if jsa == nil {
+       a.mu.RLock()
+       for _, v := range a.traces {
+           fmt.Printf("%p trace: %s\n", a, v)
+       }
+       a.mark = true
+       a.mu.RUnlock()
+       fmt.Printf("%p a.jetStreamConfigured() = %t incomplete %t expired %t \na.claimJWT = %s\na.Issuer = %+v\n", a, a.jetStreamConfigured(), a.incomplete, a.expired, a.claimJWT, a.Issuer)
        return nil, NewJSNotEnabledForAccountError()
    }
    jsa.mu.Lock()
@@ -1433,7 +1440,7 @@ func (a *Account) removeJetStream() error {

    js := s.getJetStream()
    if js == nil {
-       return NewJSNotEnabledForAccountError()
+       return NewJSNotEnabledError()
    }

    return js.disableJetStream(js.lookupAccount(a))
@@ -1884,7 +1891,9 @@ func (a *Account) checkForJetStream() (*Server, *jsAccount, error) {
    jsa := a.js
    a.mu.RUnlock()

-   if s == nil || jsa == nil {
+   if s == nil {
+       return nil, nil, NewJSNotEnabledError()
+   } else if jsa == nil {
        return nil, nil, NewJSNotEnabledForAccountError()
    }

diff --git a/server/jetstream_cluster.go b/server/jetstream_cluster.go
index 84333203..3eb4a841 100644
--- a/server/jetstream_cluster.go
+++ b/server/jetstream_cluster.go
@@ -272,7 +272,7 @@ func (s *Server) JetStreamStepdownStream(account, stream string) error {
 func (s *Server) JetStreamSnapshotStream(account, stream string) error {
    js, cc := s.getJetStreamCluster()
    if js == nil {
-       return NewJSNotEnabledForAccountError()
+       return NewJSNotEnabledError()
    }
    if cc == nil {
        return NewJSClusterNotActiveError()
@@ -464,7 +464,7 @@ func (s *Server) enableJetStreamClustering() error {
    }
    js := s.getJetStream()
    if js == nil {
-       return NewJSNotEnabledForAccountError()
+       return NewJSNotEnabledError()
    }
    // Already set.
    if js.cluster != nil {
diff --git a/server/jetstream_errors_generated.go b/server/jetstream_errors_generated.go
index c91b2022..c392b4a4 100644
--- a/server/jetstream_errors_generated.go
+++ b/server/jetstream_errors_generated.go
@@ -2,7 +2,10 @@

 package server

-import "strings"
+import (
+   "runtime/debug"
+   "strings"
+)

 const (
    // JSAccountHALimitExceededErr JetStream Account High Availability Limit Exceeded
@@ -1236,6 +1239,7 @@ func NewJSNotEmptyRequestError(opts ...ErrorOption) *ApiError {
 // NewJSNotEnabledError creates a new JSNotEnabledErr error: "JetStream not enabled"
 func NewJSNotEnabledError(opts ...ErrorOption) *ApiError {
    eopts := parseOpts(opts)
+   debug.PrintStack()
    if ae, ok := eopts.err.(*ApiError); ok {
        return ae
    }
@@ -1246,6 +1250,7 @@ func NewJSNotEnabledError(opts ...ErrorOption) *ApiError {
 // NewJSNotEnabledForAccountError creates a new JSNotEnabledForAccountErr error: "JetStream not enabled for account"
 func NewJSNotEnabledForAccountError(opts ...ErrorOption) *ApiError {
    eopts := parseOpts(opts)
+   debug.PrintStack()
    if ae, ok := eopts.err.(*ApiError); ok {
        return ae
    }
diff --git a/server/server.go b/server/server.go
index 11e4b6d0..16ccd7e3 100644
--- a/server/server.go
+++ b/server/server.go
@@ -28,6 +28,7 @@ import (
    "net"
    "net/http"
    "regexp"
+   "runtime/debug"

    // Allow dynamic profiling.
    _ "net/http/pprof"
@@ -1412,6 +1413,8 @@ func (s *Server) registerAccountNoLock(acc *Account) *Account {
        }
    }

+   acc.traces = append(acc.traces, fmt.Sprintf("%p registerAccountNoLock %s\n%s\n", acc, acc.Name, acc.claimJWT))
+   acc.traces = append(acc.traces, string(debug.Stack()))
    s.accounts.Store(acc.Name, acc)
    s.tmpAccounts.Delete(acc.Name)
    s.enableAccountTracking(acc)
bruth commented 7 months ago

@neilalexander Unclear if this is still relevant and no urgency, but if you could take a look when you get a chance so we can fix or close it out.