nanomq / nanomq

An ultra-lightweight and blazing-fast Messaging broker/bus for IoT edge & SDV
https://nanomq.io
MIT License

0.22.8 Memory Leak #1914

Open lizziemac opened 1 month ago

lizziemac commented 1 month ago

Describe the bug I updated my devices to use version 0.22.8, and I'm seeing a big memory leak occasionally. I had not seen a leak like this when on 0.21.5 (my previous version). I'm not entirely sure what's causing it, as I have info level logs, and nothing was printed at the start of the leaks, other than some system logs indicating network instability. Below is an image of just the nanomq binary's memory over time, on two devices.

[image: memory usage of the nanomq binary over time on two devices]

Expected behavior The memory should stay consistent.

Actual Behavior The memory climbs until the device can no longer allocate memory.

To Reproduce I think this may be reproducible by making the network unstable while using QoS 2, for example by putting foil over a Raspberry Pi. I will try to reproduce it here as well (the affected devices are remote and cannot be tested).
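While reproducing, the broker's resident memory can be sampled from `/proc/<pid>/status` on Linux to get a curve like the one in the image above. The following is only a minimal sketch under that assumption; the helper names `parse_vm_rss_kb` and `read_rss_kb` are hypothetical and not part of NanoMQ or Paho.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Parses the "VmRSS:" line of a Linux /proc/<pid>/status stream.
// Returns the resident set size in kB, or -1 if the field is missing
// or cannot be read as a number.
long parse_vm_rss_kb(std::istream& status) {
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with the key
            std::istringstream fields(line.substr(6));
            long kb = -1;
            fields >> kb;  // operator>> skips the leading whitespace
            return fields ? kb : -1;
        }
    }
    return -1;
}

// Reads the current RSS of a process by PID (hypothetical helper).
// Returns -1 if the status file cannot be opened or has no VmRSS line.
long read_rss_kb(int pid) {
    assert(pid > 0);
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    return parse_vm_rss_kb(status);
}
```

Calling `read_rss_kb` with the nanomq PID at a fixed interval and logging the result next to a timestamp should be enough to line up memory growth with the network-instability messages in the system log.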

Environment Details

Client SDK Paho MQTT CPP

Additional context Configuration

# NanoMQ Configuration 0.18.0

# #============================================================
# # NanoMQ Broker
# #============================================================

mqtt {
    property_size = 32
    max_packet_size = 10KB
    max_mqueue_len = 2048
    retry_interval = 10s
    keepalive_multiplier = 1.25

    # Three of below, unsupported now
    max_inflight_window = 2048
    max_awaiting_rel = 10s
    await_rel_timeout = 10s
}

listeners.tcp {
    enable = false
}

listeners.ssl {
    enable = true
    bind = "0.0.0.0:8883"
    keyfile = "<path>"
    certfile = "<path>"
    cacertfile = "<path>"
    verify_peer = true
    fail_if_no_peer_cert = true
}

listeners.ws {
    enable = false
}

http_server {
    enable = false
}

log {
    to = [file, console]
    level = warn
    dir = "/tmp"
    file = "nanomq.log"
    rotation {
        size = 10MB
        count = 5
    }
}

auth {
    allow_anonymous = true # We don't use a username and password
    no_match = deny
    deny_action = disconnect

    cache = {
        max_size = 32
        ttl = 1m
    }

    acl = {include "/etc/nanomq_acl.conf"}
}

JaylinYu commented 1 month ago

Two potential memory-leak issues were fixed last month, and the fixes are not in 0.22.8: https://github.com/nanomq/nanomq/commit/377c035eaad539f0a3bfe1960ca34650fd6c86d7 & https://github.com/nanomq/nanomq/commit/377c035eaad539f0a3bfe1960ca34650fd6c86d7. But you are not using any bridging feature, so I doubt this is your case.

Are you using the retained-message or session-keeping feature of MQTT? If you send QoS messages to a cached session, memory usage will certainly increase.

lizziemac commented 1 month ago

I am using the following parameters in my client. I've been using these options since being on 0.21.5, where I didn't have any memory issues over months of use.

  auto opts_builder =
      mqtt::connect_options_builder()
          .v5()
          .properties({// TODO: update to UINT_MAX when this issue is fixed:
                       // https://github.com/eclipse/paho.mqtt.cpp/issues/498
                       {mqtt::property::SESSION_EXPIRY_INTERVAL, INT_MAX}})
          .clean_start(true)
          .automatic_reconnect(std::chrono::seconds(1),
                               std::chrono::seconds(MAX_RETRY_INTERVAL_SECONDS))
          .keep_alive_interval(std::chrono::seconds(30))
          .will(mqtt::message(LWT_TOPIC,
                              "",
                              MQTT_QOS, false));

Edit: I also verified that we are not retaining messages.

JaylinYu commented 1 month ago

I have tried several different scenarios and still cannot reproduce your issue here, but I will keep an eye on this.

With no retained messages and clients using clean start, there is not much that could go wrong. The only change that seems related to your case is "session expiry." I see you didn't enable the session-keeping feature but set the "session expiry interval" to the maximum integer value; I'm not sure what you are aiming for here. Perhaps you can try disabling it.

BTW, 0.22.8 also went through an endurance test with other clients, and no similar issue appeared, so I'm still unsure of the root cause of your problem. It would be appreciated if you could provide more details by setting the log level to info, or by verifying whether the problem still exists in 0.22.6 and 0.21.10.
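For context on the session expiry value: in MQTT 5, a Session Expiry Interval of 0 (or an absent property) ends the session when the network connection closes, and 0xFFFFFFFF means the session never expires. Clean start only discards an existing session at connect time, so a non-zero expiry still asks the broker to keep session state after a disconnect. The sketch below just works out where `INT_MAX` falls; the names `SessionLifetime`, `classify`, and `whole_years` are hypothetical, it assumes a 32-bit `int`, and it says nothing about how NanoMQ implements this internally.

```cpp
#include <climits>
#include <cstdint>

// MQTT 5 meaning of the Session Expiry Interval, given in seconds:
//   0 or absent -> session ends when the network connection closes
//   0xFFFFFFFF  -> session never expires
//   otherwise   -> session is kept that many seconds after disconnect
enum class SessionLifetime { EndsOnDisconnect, Timed, NeverExpires };

constexpr SessionLifetime classify(std::uint32_t expiry_seconds) {
    return expiry_seconds == 0            ? SessionLifetime::EndsOnDisconnect
         : expiry_seconds == UINT32_MAX   ? SessionLifetime::NeverExpires
                                          : SessionLifetime::Timed;
}

// Whole 365-day years contained in a number of seconds.
constexpr std::uint32_t whole_years(std::uint32_t seconds) {
    return seconds / (365u * 24u * 60u * 60u);
}

// INT_MAX (2147483647 with a 32-bit int) is a timed session of about
// 68 years, which is effectively "keep the session" for a broker.
static_assert(classify(INT_MAX) == SessionLifetime::Timed,
              "INT_MAX is a finite, non-zero expiry");
static_assert(whole_years(INT_MAX) == 68u,
              "INT_MAX seconds is roughly 68 years");
```

Under these semantics, leaving the property out of the connect options is the way to get a session that ends on disconnect.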

lizziemac commented 1 month ago

I see you didn't enable the session-keeping feature but set the "session expiry interval" to the maximum integer value; I'm not sure what you are aiming for here. Perhaps you can try disabling it.

I enabled the session expiry for our EMQX instance in the cloud (and we use the same client config currently for nanomq). I can see about disabling that for the nanomq instance. I'll set the log level to info as well, thanks.