nats-io / nats-server

High-Performance server for NATS.io, the cloud and edge native messaging system.
https://nats.io
Apache License 2.0
15.89k stars 1.4k forks source link

Improve MQTT connect error messages #5780

Open mbneimann-at-work opened 2 months ago

mbneimann-at-work commented 2 months ago

Proposed change

Multiple error messages returned when parsing a MQTT connect telegram have little information about the conditions surrounding the error.

Information about telegram "CONNECT" and the Flag and Value the parsing failed on.

Use case

When trying to understand a MQTT connection problem, it was very hard to understand what was going on.

I had a PLC, with MQTT version 3.1.1, that tried to connect, but was refused a connection to a NATS server. Connection to a Mosquitto server worked without problems.

Finally, by looking at the CONECT telegram in Wireshark, we could see that the Will Topic looked more like Will Message and not a valid topic.

    Connect Command
        0001 0000 = Header Flags: 0x10 (Connect Command)
            0001 .... = Message Type: Connect Command (1)
            .... 0... = DUP Flag: Not set
            .... .00. = QOS Level: Fire and Forget (0)
            .... ...0 = Retain: Not set
        Msg Len: 38
        Protocol Name: MQTT
        Version: 4
        0000 0110 = Connect Flags: 0x06
            0... .... = User Name Flag: Not set
            .0.. .... = Password Flag: Not set
            ..0. .... = Will Retain: Not set
            ...0 0... = QOS Level: Fire and Forget (0)
            .... .1.. = Will Flag: Set
            .... ..1. = Clean Session Flag: Set
            .... ...0 = (Reserved): Not set
        Keep Alive: 15
        Client ID: CPU313C
        Will Topic: Connection Lost
        Will Message: 

The server logs did not tell me much about where and what was going on:

[6612] 2024/08/12 14:28:37.931896 [←[32mINF←[0m] Starting nats-server
[6612] 2024/08/12 14:28:37.932898 [←[32mINF←[0m]   Version:  2.10.18
[6612] 2024/08/12 14:28:37.932898 [←[32mINF←[0m]   Git:      [57d23ac]
[6612] 2024/08/12 14:28:37.932898 [←[32mINF←[0m]   Name:     NDI4YPG5HRMEXBFB42JJNQ7Z5LOTFBVU7U5KTVVUBKKSA3R6SIYLQSYK
[6612] 2024/08/12 14:28:37.932898 [←[32mINF←[0m]   Node:     rv5T7kHn
[6612] 2024/08/12 14:28:37.932898 [←[32mINF←[0m]   ID:       NDI4YPG5HRMEXBFB42JJNQ7Z5LOTFBVU7U5KTVVUBKKSA3R6SIYLQSYK
[6612] 2024/08/12 14:28:37.933899 [←[32mINF←[0m] Using configuration file: config\server_with_mqtt.yaml
[6612] 2024/08/12 14:28:37.933899 [←[32mINF←[0m] Starting JetStream
[6612] 2024/08/12 14:28:37.935899 [←[32mINF←[0m]     _ ___ _____ ___ _____ ___ ___   _   __  __
[6612] 2024/08/12 14:28:37.935899 [←[32mINF←[0m]  _ | | __|_   _/ __|_   _| _ \ __| /_\ |  \/  |
[6612] 2024/08/12 14:28:37.935899 [←[32mINF←[0m] | || | _|  | | \__ \ | | |   / _| / _ \| |\/| |
[6612] 2024/08/12 14:28:37.935899 [←[32mINF←[0m]  \__/|___| |_| |___/ |_| |_|_\___/_/ \_\_|  |_|
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m]
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m]          https://docs.nats.io/jetstream
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m]
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m] ---------------- JETSTREAM ----------------
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m]   Max Memory:      5.91 GB
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m]   Max Storage:     1.00 TB
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m]   Store Directory: "C:\Users\ADMINI~1\AppData\Local\Temp\nats\jetstream"
[6612] 2024/08/12 14:28:37.936900 [←[32mINF←[0m] -------------------------------------------
[6612] 2024/08/12 14:28:37.938903 [←[32mINF←[0m]   Starting restore for stream '$G > $MQTT_msgs'
[6612] 2024/08/12 14:28:37.939901 [←[32mINF←[0m]   Restored 0 messages for stream '$G > $MQTT_msgs' in 1ms
[6612] 2024/08/12 14:28:37.940902 [←[32mINF←[0m]   Starting restore for stream '$G > $MQTT_out'
[6612] 2024/08/12 14:28:37.941904 [←[32mINF←[0m]   Restored 0 messages for stream '$G > $MQTT_out' in 1ms
[6612] 2024/08/12 14:28:37.942903 [←[32mINF←[0m]   Starting restore for stream '$G > $MQTT_qos2in'
[6612] 2024/08/12 14:28:37.944756 [←[32mINF←[0m]   Restored 0 messages for stream '$G > $MQTT_qos2in' in 1ms
[6612] 2024/08/12 14:28:37.944756 [←[32mINF←[0m]   Starting restore for stream '$G > $MQTT_rmsgs'
[6612] 2024/08/12 14:28:37.946878 [←[32mINF←[0m]   Restored 0 messages for stream '$G > $MQTT_rmsgs' in 1ms
[6612] 2024/08/12 14:28:37.946878 [←[32mINF←[0m]   Starting restore for stream '$G > $MQTT_sess'
[6612] 2024/08/12 14:28:37.948879 [←[32mINF←[0m]   Restored 2 messages for stream '$G > $MQTT_sess' in 2ms
[6612] 2024/08/12 14:28:37.948879 [←[32mINF←[0m]   Recovering 1 consumers for stream - '$G > $MQTT_rmsgs'
[6612] 2024/08/12 14:28:37.975900 [←[32mINF←[0m] Listening for MQTT clients on mqtt://0.0.0.0:1883
[6612] 2024/08/12 14:28:37.976581 [←[32mINF←[0m] Listening for client connections on 0.0.0.0:4222
[6612] 2024/08/12 14:28:37.981587 [←[32mINF←[0m] Server is ready
[6612] 2024/08/12 14:28:38.227571 [←[32mINF←[0m] Creating MQTT streams/consumers with replicas 1 for account "$G"
[6612] 2024/08/12 14:28:41.485009 [←[31mERR←[0m] 192.168.3.110:1883 - mid:26 - "CPU313C" - character ' ' not supported for MQTT topics
[6612] 2024/08/12 14:28:56.331452 [←[31mERR←[0m] 192.168.3.110:1883 - mid:27 - "CPU313C" - character ' ' not supported for MQTT topics
[6612] 2024/08/12 14:28:58.941577 [←[32mINF←[0m] Initiating Shutdown...
[6612] 2024/08/12 14:28:58.942284 [←[32mINF←[0m] Initiating JetStream Shutdown...
[6612] 2024/08/12 14:28:59.017614 [←[32mINF←[0m] JetStream Shutdown
[6612] 2024/08/12 14:28:59.018613 [←[32mINF←[0m] Server Exiting..

Contribution

No response

KIT-IT commented 2 months ago

Hello! Docker image version: 2.10.18-alpine3.20 Nats JS runs in RAFT, has 3 services in docker. Storage is mounted via volume.

volumes:
- /etc/localtime:/etc/localtime:ro
- {"type": "tmpfs", "target": "/tmp"}
- {"type": "tmpfs", "target": "/run"}
- /var/nats/natsjs_1/:/data/jetstream/

Each service has its own storage. All services run on one node. Streams and consumers have three replicas.

nats str add Stream\
--subjects "stream.in, stream.out" \
--ack --max-msgs=-1 \
--max-bytes=-1 \
--max-age=3d \
--storage file \
--retention work \
--max-msg-size=-1 \
--discard old \
--dupe-window="2h" \
--no-deny-purge \
--deny-delete \
--no-allow-rollup \
--max-msgs-per-subject=-1 \
--replicas 3

The power was recently cut off and after restoration, nats js does not see streams, consumers, messages, but they are present in the storage. Why did this happen and how to get the data back? They are physically present. natsjs_1 An example of a log in the attachment, it is the same on all 3 nodes, there is no hint for recovery or anything like that natsjs_1.log

KIT-IT commented 2 months ago

Jarema ,maybe you can help?