nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0

nats 2.10.0 takes 20 seconds to reload configuration in some scenarios when using MQTT #4630

Open tvojacek opened 11 months ago

tvojacek commented 11 months ago

What version were you using?

nats 2.10.0, nats 2.10.1

What environment was the server running in?

docker nats:2.10.1-alpine3.18 or nats:2.10.0-alpine3.18

Is this defect reproducible?

1. Start a client that connects over MQTT and publishes x messages every second.
2. Start the server: docker compose up -d
3. Create a JetStream stream that captures the messages.
4. Reload the server (nats-server --signal reload or '$SYS.REQ.SERVER..RELOAD').

nats  | [1] 2023/10/06 06:34:01.535454 [INF] Reloaded: authorization nkey users
nats  | [1] 2023/10/06 06:34:01.535520 [INF] Reloaded: authorization users
nats  | [1] 2023/10/06 06:34:01.535531 [INF] Reloaded: accounts
nats  | [1] 2023/10/06 06:34:01.535540 [INF] Reloaded: cluster
nats  | [1] 2023/10/06 06:34:01.535552 [INF] Reloaded: LeafNode compression settings
nats  | [1] 2023/10/06 06:34:01.535560 [INF] Reloaded: MQTT ack_wait = 0s
nats  | [1] 2023/10/06 06:34:01.535581 [INF] Reloaded: MQTT max_ack_pending = 0
nats  | [1] 2023/10/06 06:34:01.535591 [INF] Reloaded: MQTT stream_replicas = 0
nats  | [1] 2023/10/06 06:34:01.535600 [INF] Reloaded: MQTT consumer_replicas = 0
nats  | [1] 2023/10/06 06:34:01.535608 [INF] Reloaded: MQTT consumer_memory_storage = false
nats  | [1] 2023/10/06 06:34:01.535617 [INF] Reloaded: MQTT consumer_inactive_threshold = 0s
nats  | [1] 2023/10/06 06:34:21.542108 [INF] Reloaded server configuration

Given the capability you are leveraging, describe your expectation?

On 2.9.21, or if the client is not started, the reload happens instantly.

nats  | [1] 2023/10/06 06:34:01.535617 [INF] Reloaded: MQTT consumer_inactive_threshold = 0s
nats  | [1] 2023/10/06 06:34:01.542108 [INF] Reloaded server configuration 

Given the expectation, what is the defect you are observing?

There is a 20-second delay between Reloaded: MQTT consumer_inactive_threshold = 0s and Reloaded server configuration. During that period the auth keys are not loaded. If a client connects to get data from JetStream, it gets this error:

nats | [1] 2023/10/06 11:02:27.495304 [ERR] 172.19.0.9:56102 - cid:34 - Publish Violation - Nkey "Uxxxxxx", Subject "$JS.MOCK00-1111-1111-0001.API.CONSUMER.MSG.NEXT.STX_DATA.FORWARDER"

After the 20 seconds, once the Reloaded server configuration message is logged, the problem with the Nkey is resolved.
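
To put a number on the delay, the gap between the last per-option Reloaded: line and the final Reloaded server configuration line can be computed straight from the server log. The following is a minimal sketch, assuming the log line layout shown in the excerpts above; the helper names (parse_timestamp, reload_gap_seconds) are made up for illustration and are not part of nats-server.

```python
from datetime import datetime

# Assumed line layout, taken from the log excerpts in this report:
#   nats  | [1] 2023/10/06 06:34:01.535617 [INF] Reloaded: ...
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f"
FINAL_MARKER = "Reloaded server configuration"


def parse_timestamp(line):
    """Return the timestamp of a nats-server log line, or None if absent."""
    parts = line.split()
    # Try each adjacent pair of tokens as "<date> <time>".
    for date_part, time_part in zip(parts, parts[1:]):
        try:
            return datetime.strptime(f"{date_part} {time_part}", TS_FORMAT)
        except ValueError:
            continue
    return None


def reload_gap_seconds(lines):
    """Seconds between the last 'Reloaded: ...' line and the final marker.

    Returns None if either line is missing.
    """
    last_option = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue
        if FINAL_MARKER in line:
            if last_option is None:
                return None
            return (ts - last_option).total_seconds()
        if "Reloaded:" in line:
            last_option = ts
    return None


if __name__ == "__main__":
    sample = [
        "nats  | [1] 2023/10/06 06:34:01.535617 [INF] Reloaded: MQTT consumer_inactive_threshold = 0s",
        "nats  | [1] 2023/10/06 06:34:21.542108 [INF] Reloaded server configuration",
    ]
    print(reload_gap_seconds(sample))
```

Run against the two lines from the failing reload, this reports a gap of roughly 20 seconds; against the expected-behaviour excerpt the gap is a few milliseconds.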

wallyqs commented 11 months ago

Thanks for the report, did you have active mqtt traffic while doing this or see any other WRN logs?

levb commented 11 months ago

Also, do you (mostly? exclusively?) use clean or "persistent" sessions in MQTT?

tvojacek commented 11 months ago

The problem lies in the MQTT JetStream data. These streams were migrated from 2.9.21. If I delete this data, the problem disappears. After restoring the buggy stream data, the problem returns. streams.zip

derekcollison commented 8 months ago

Any updates on this one?