nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
16k stars 1.41k forks source link

MQTT "protocol violation" #6068

Open dakhnod opened 2 weeks ago

dakhnod commented 2 weeks ago

Observed behavior

I have a NATS instance configured with a "store_dir" config set to some folder. Whenever I attempted connecting with MQTT, the connection got cancelled and the NATS log displayed

Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.555576 [DBG] 192.168.36.209:42498 - mid:49 - Client connection created
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.556071 [INF] Creating MQTT streams/consumers with replicas 1 for account "$G"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.560145 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.STREAM.INFO.$MQTT_sess"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.561550 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.STREAM.INFO.$MQTT_msgs"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.562230 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.STREAM.INFO.$MQTT_qos2in"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.563005 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.STREAM.INFO.$MQTT_out"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.563831 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.STREAM.INFO.$MQTT_rmsgs"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.564686 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.CONSUMER.DELETE.$MQTT_rmsgs.$MQTT_rmsgs_23ClGv19"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.567904 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.CONSUMER.CREATE.$MQTT_rmsgs"
Nov 01 14:06:56 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:06:56.569148 [DBG] 116.203.164.58:7422 - lid:48 - Not permitted to deliver to "$JS.API.STREAM.MSG.GET.$MQTT_sess"
Nov 01 14:07:00 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:07:00.570786 [WRN] 192.168.36.209:42498 - mid:49 - "mqtt-explorer-6d66a523" - Readloop processing time: 4.014958493s
Nov 01 14:07:00 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:07:00.570885 [ERR] 192.168.36.209:42498 - mid:49 - "mqtt-explorer-6d66a523" - unable to connect: unable to persist session "mqtt-explorer-6d66a523" (seq=0): timeout after 4.001027562s: request type "SP" on "$MQTT.sess.ctc-primary-leaf-17123.pV1vpQ1Q" (reply="$MQTT.JSA.23ClGv19.SP.pV1vpQ1Q.kShGiWDKWz1>
Nov 01 14:07:00 ctc-primary nats-server[50636]: [50636] 2024/11/01 14:07:00.570940 [DBG] 192.168.36.209:42498 - mid:49 - "mqtt-explorer-6d66a523" - Client connection closed: Protocol Violation

I assume that the issue was NATS not being able to write to some subfolder of its "store_dir". Deleting "$store_dir/jetstream" solved the issue. Chowning the whole store_dir might have helped aswell.

The error "Client connection closed: Protocol Violation" seems a bit misleading, If if diagnosed the problem right.

Expected behavior

An error message like "store path ... not writable" would have helped find the problem faster.

Server and client version

nats-server: v2.10.21 mqtt-client: any

Host environment

No response

Steps to reproduce

No response

levb commented 2 weeks ago

@dakhnod Thanks for reporting. I will look into improving the error wrapping/logging, and also (not to self) to check the readloop timeout, it should not be happening.