nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

nats 2.10.0 takes 20 seconds to reload configuration in some scenarios when using MQTT #4630

Open tvojacek opened 1 year ago

tvojacek commented 1 year ago

What version were you using?

nats 2.10.0, nats 2.10.1

What environment was the server running in?

docker nats:2.10.1-alpine3.18 or nats:2.10.0-alpine3.18

Is this defect reproducible?

1. Start a client that connects over MQTT and publishes x messages every second.
2. Start the server: docker compose up -d
3. Create a JetStream stream that captures the messages.
4. Reload the server (nats-server --signal reload or '$SYS.REQ.SERVER..RELOAD').

nats  | [1] 2023/10/06 06:34:01.535454 [INF] Reloaded: authorization nkey users
nats  | [1] 2023/10/06 06:34:01.535520 [INF] Reloaded: authorization users
nats  | [1] 2023/10/06 06:34:01.535531 [INF] Reloaded: accounts
nats  | [1] 2023/10/06 06:34:01.535540 [INF] Reloaded: cluster
nats  | [1] 2023/10/06 06:34:01.535552 [INF] Reloaded: LeafNode compression settings
nats  | [1] 2023/10/06 06:34:01.535560 [INF] Reloaded: MQTT ack_wait = 0s
nats  | [1] 2023/10/06 06:34:01.535581 [INF] Reloaded: MQTT max_ack_pending = 0
nats  | [1] 2023/10/06 06:34:01.535591 [INF] Reloaded: MQTT stream_replicas = 0
nats  | [1] 2023/10/06 06:34:01.535600 [INF] Reloaded: MQTT consumer_replicas = 0
nats  | [1] 2023/10/06 06:34:01.535608 [INF] Reloaded: MQTT consumer_memory_storage = false
nats  | [1] 2023/10/06 06:34:01.535617 [INF] Reloaded: MQTT consumer_inactive_threshold = 0s
nats  | [1] 2023/10/06 06:34:21.542108 [INF] Reloaded server configuration
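
As a rough check, the gap between the last two lines of the log above can be computed from their timestamps. This is only a sketch: the helper names `parse_timestamp` and `reload_gap_seconds` are made up for illustration, and the parsing assumes the docker compose prefix ("nats  | [1]") shown in this log.

```python
from datetime import datetime

# Two lines copied verbatim from the log above.
LOG_LINES = [
    "nats  | [1] 2023/10/06 06:34:01.535617 [INF] Reloaded: MQTT consumer_inactive_threshold = 0s",
    "nats  | [1] 2023/10/06 06:34:21.542108 [INF] Reloaded server configuration",
]


def parse_timestamp(line: str) -> datetime:
    """Extract the date and time fields that follow the "[pid]" field."""
    # Whitespace split gives: ["nats", "|", "[1]", "2023/10/06", "06:34:01.535617", ...]
    fields = line.split()
    return datetime.strptime(f"{fields[3]} {fields[4]}", "%Y/%m/%d %H:%M:%S.%f")


def reload_gap_seconds(lines: list[str]) -> float:
    """Return the seconds elapsed between the first and the last log line."""
    return (parse_timestamp(lines[-1]) - parse_timestamp(lines[0])).total_seconds()


gap = reload_gap_seconds(LOG_LINES)
print(f"{gap:.6f}")  # prints 20.006491
```

The same helper can be pointed at any two lines of the server log to confirm whether the delay is present after a reload.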

Given the capability you are leveraging, describe your expectation?

On 2.9.21, or if is not started, the reload happens instantly.

nats  | [1] 2023/10/06 06:34:01.535617 [INF] Reloaded: MQTT consumer_inactive_threshold = 0s
nats  | [1] 2023/10/06 06:34:01.542108 [INF] Reloaded server configuration 

Given the expectation, what is the defect you are observing?

There is a 20-second delay between "Reloaded: MQTT consumer_inactive_threshold = 0s" and "Reloaded server configuration". During that period the auth keys are not loaded. If a client connects to get data from JetStream, it gets this error:

nats | [1] 2023/10/06 11:02:27.495304 [ERR] 172.19.0.9:56102 - cid:34 - Publish Violation - Nkey "Uxxxxxx", Subject "$JS.MOCK00-1111-1111-0001.API.CONSUMER.MSG.NEXT.STX_DATA.FORWARDER"

After those 20 seconds, once the "Reloaded server configuration" message is logged, the problem with the Nkey is resolved.

wallyqs commented 1 year ago

Thanks for the report. Did you have active MQTT traffic while doing this, or see any other WRN logs?

levb commented 1 year ago

Also, do you (mostly? exclusively?) use clean or "persistent" sessions in MQTT?

tvojacek commented 1 year ago

The problem lies in the MQTT JetStream data. These streams were migrated from 2.9.21. If I delete this data, the problem disappears. After restoring the buggy stream data, the problem comes back. streams.zip

derekcollison commented 10 months ago

Any updates on this one?