nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

NATS server start is delayed when looking for deleted JetStream accounts #3176

Closed goku321 closed 1 year ago

goku321 commented 2 years ago

Defect


Versions of nats-server and affected client libraries used:

nats-server: 2.8.4
nsc: 2.5.0

OS/Container environment:

macOS Monterey

nats-server config:

# Operator named falcon
operator: eyJ0eXAiOiJKV1QiLCJhbGciOiJlZDI1NTE5LW5rZXkifQ.eyJqdGkiOiJQT1hMT1Q1U0RDUEIyRDVJSk5OSFNaMzNJNVlNREVUNjdYUDY1MjdET1BWMkozQ1dUWEFRIiwiaWF0IjoxNjUwMjY0MTk4LCJpc3MiOiJPQUNJNFYyNEFaNEI3RVVVSFVHVEgyNllWRTJaS0FEWkFGNEdDNFZXMjdTSko1UEtINlM2T0hLTyIsIm5hbWUiOiJmYWxjb24iLCJzdWIiOiJPQUNJNFYyNEFaNEI3RVVVSFVHVEgyNllWRTJaS0FEWkFGNEdDNFZXMjdTSko1UEtINlM2T0hLTyIsIm5hdHMiOnsic3lzdGVtX2FjY291bnQiOiJBQzM2SFlNM0RKQlpPWTdYU1Q1M0FEUVBQQlJBSTVZT1NTWVpUQlIzMk40M0JKWTNONDc0QVUyUSIsInR5cGUiOiJvcGVyYXRvciIsInZlcnNpb24iOjJ9fQ.VIjjjcjcCKIH8wbtHIBXflTdaiRsEY364UKxd3CrqoNzAf4SBQVhCDG1bzO2-K4wYZQ_ey9ZCRUAxnU-WOcsAA
# System Account named SYS
system_account: AC36HYM3DJBZOY7XST53ADQPPBRAI5YOSSYZTBR32N43BJY3N474AU2Q

# configuration of the nats based resolver
resolver {
    type: full
    # Directory in which the account jwt will be stored
    dir: './jwt'
    # In order to support jwt deletion, set to true
    # If the resolver type is full delete will rename the jwt.
    # This is to allow manual restoration in case of inadvertent deletion.
    # To restore a jwt, remove the added suffix .delete and restart or send a reload signal.
    # To free up storage you must manually delete files with the suffix .delete.
    allow_delete: true
    # Interval at which a nats-server with a nats based account resolver will compare
    # its state with one random nats based account resolver in the cluster and if needed,
    # exchange jwt and converge on the same set of jwt.
    interval: "2m"
    # Timeout for lookup requests in case an account does not exist locally.
    timeout: "10s"
}

# Preload the nats based resolver with the system account jwt.
# This is not necessary but avoids a bootstrapping system account.
# This only applies to the system account. Therefore other account jwt are not included here.
# To populate the resolver:
# 1) make sure that your operator has the account server URL pointing at your nats servers.
#    The url must start with: "nats://"
#    nsc edit operator --account-jwt-server-url nats://localhost:4222
# 2) push your accounts using: nsc push --all
#    The argument to push -u is optional if your account server url is set as described.
# 3) to prune accounts use: nsc push --prune
#    In order to enable prune you must set above allow_delete to true
# Later changes to the system account take precedence over the system account jwt listed here.
resolver_preload: {
    AC36HYM3DJBZOY7XST53ADQPPBRAI5YOSSYZTBR32N43BJY3N474AU2Q: eyJ0eXAiOiJKV1QiLCJhbGciOiJlZDI1NTE5LW5rZXkifQ.eyJqdGkiOiJPTDRYN0hETllRVkdDVzYyTjRVWTRQU1EzN0tHTEhWQlhVM0ZIWkM0N1FNWk5XWlZXVU1BIiwiaWF0IjoxNjUwMjY0MTk4LCJpc3MiOiJPQUNJNFYyNEFaNEI3RVVVSFVHVEgyNllWRTJaS0FEWkFGNEdDNFZXMjdTSko1UEtINlM2T0hLTyIsIm5hbWUiOiJTWVMiLCJzdWIiOiJBQzM2SFlNM0RKQlpPWTdYU1Q1M0FEUVBQQlJBSTVZT1NTWVpUQlIzMk40M0JKWTNONDc0QVUyUSIsIm5hdHMiOnsiZXhwb3J0cyI6W3sibmFtZSI6ImFjY291bnQtbW9uaXRvcmluZy1zdHJlYW1zIiwic3ViamVjdCI6IiRTWVMuQUNDT1VOVC4qLlx1MDAzZSIsInR5cGUiOiJzdHJlYW0iLCJhY2NvdW50X3Rva2VuX3Bvc2l0aW9uIjozLCJkZXNjcmlwdGlvbiI6IkFjY291bnQgc3BlY2lmaWMgbW9uaXRvcmluZyBzdHJlYW0iLCJpbmZvX3VybCI6Imh0dHBzOi8vZG9jcy5uYXRzLmlvL25hdHMtc2VydmVyL2NvbmZpZ3VyYXRpb24vc3lzX2FjY291bnRzIn0seyJuYW1lIjoiYWNjb3VudC1tb25pdG9yaW5nLXNlcnZpY2VzIiwic3ViamVjdCI6IiRTWVMuUkVRLkFDQ09VTlQuKi4qIiwidHlwZSI6InNlcnZpY2UiLCJyZXNwb25zZV90eXBlIjoiU3RyZWFtIiwiYWNjb3VudF90b2tlbl9wb3NpdGlvbiI6NCwiZGVzY3JpcHRpb24iOiJSZXF1ZXN0IGFjY291bnQgc3BlY2lmaWMgbW9uaXRvcmluZyBzZXJ2aWNlcyBmb3I6IFNVQlNaLCBDT05OWiwgTEVBRlosIEpTWiBhbmQgSU5GTyIsImluZm9fdXJsIjoiaHR0cHM6Ly9kb2NzLm5hdHMuaW8vbmF0cy1zZXJ2ZXIvY29uZmlndXJhdGlvbi9zeXNfYWNjb3VudHMifV0sImxpbWl0cyI6eyJzdWJzIjotMSwiZGF0YSI6LTEsInBheWxvYWQiOi0xLCJpbXBvcnRzIjotMSwiZXhwb3J0cyI6LTEsIndpbGRjYXJkcyI6dHJ1ZSwiY29ubiI6LTEsImxlYWYiOi0xfSwic2lnbmluZ19rZXlzIjpbIkFDVVM3UlJLRUFLNTQ0Tk43QkpFNlFLSUFERTY2TDZMNzZWSlVEWjZSRUFSWU1DMkxPRFFWTFdOIl0sImRlZmF1bHRfcGVybWlzc2lvbnMiOnsicHViIjp7fSwic3ViIjp7fX0sInR5cGUiOiJhY2NvdW50IiwidmVyc2lvbiI6Mn19.h0w0bqrua7RENSVTy3J7p2ZsGKL76sHMM3vDrvqSb03pSXVX_X9gXx3QGCn8o54VU8_3HHuj9jyibd5pHMiPBg,
}

max_payload: 8MB

Steps or code to reproduce the issue:

  1. Create 3 JetStream-enabled accounts: a, b, and c.
  2. Create exports in accounts a and b, and import these exports in account c.
  3. Push all account JWTs to the nats-server.
  4. Delete the JWTs for accounts a and b from the nats-server.
  5. Restart the nats-server.

Expected result:

nats-server shouldn't wait for the full lookup timeout (10s in the config above) when it cannot find a deleted account. If there are multiple imports from accounts that do not exist, the server takes a long time to start because it waits for the complete timeout duration to elapse for each of them. If the account is not found, the select block (linked below) should exit without waiting for the timeout.

https://github.com/nats-io/nats-server/blob/0794eafa6f0eeefd0d71899d29932253381a4440/server/accounts.go#L4037

Actual result:

nats-server waits 20s before it starts serving: the 10s lookup timeout elapses twice, once for each of the two imports from deleted accounts.

derekcollison commented 1 year ago

This should be fixed.