waku-org / waku-rust-bindings

Rust wrapper over go-waku ffi

Memory leak when discv5 is enabled #86

Closed petkodes closed 2 months ago

petkodes commented 5 months ago

At GraphOps we've been experiencing a memory leak for some time in our Subgraph Radio project, which uses the bindings under the hood. I traced the leak back to the bindings and wrote this simple example to show it in action:

use std::{error::Error, str::FromStr};

use waku::{
    waku_default_pubsub_topic, waku_new, ContentFilter, Multiaddr, ProtocolId, Running, WakuNodeConfig, WakuNodeHandle
};

pub const WAKU_DISCOVERY_ENR: &str = "enr:-P-4QJI8tS1WTdIQxq_yIrD05oIIW1Xg-tm_qfP0CHfJGnp9dfr6ttQJmHwTNxGEl4Le8Q7YHcmi-kXTtphxFysS11oBgmlkgnY0gmlwhLymh5GKbXVsdGlhZGRyc7hgAC02KG5vZGUtMDEuZG8tYW1zMy53YWt1djIucHJvZC5zdGF0dXNpbS5uZXQGdl8ALzYobm9kZS0wMS5kby1hbXMzLndha3V2Mi5wcm9kLnN0YXR1c2ltLm5ldAYfQN4DiXNlY3AyNTZrMaEDbl1X_zJIw3EAJGtmHMVn4Z2xhpSoUaP5ElsHKCv7hlWDdGNwgnZfg3VkcIIjKIV3YWt1Mg8";

const NODES: &[&str] = &[
    "/dns4/node-01.ac-cn-hongkong-c.wakuv2.test.statusim.net/tcp/30303/p2p/16Uiu2HAkvWiyFsgRhuJEb9JfjYxEkoHLgnUQmr1N5mKWnYjxYRVm",
    "/dns4/node-01.do-ams3.wakuv2.test.statusim.net/tcp/30303/p2p/16Uiu2HAmPLe7Mzm8TsYUubgCAW1aJoeFScxrLj8ppHFivPo97bUZ",
    "/dns4/node-01.gc-us-central1-a.wakuv2.test.statusim.net/tcp/30303/p2p/16Uiu2HAmJb2e28qLXxT5kZxVUUoJt72EMzNGXB47Rxx5hw3q4YjS"
];

fn setup_node_handle() -> std::result::Result<WakuNodeHandle<Running>, Box<dyn Error>> {
    let config = WakuNodeConfig {
        host: None,
        port: None,
        advertise_addr: None,
        node_key: None,
        keep_alive_interval: None,
        relay: None,
        store: None,
        database_url: None,
        store_retention_max_messages: None,
        store_retention_max_seconds: None,
        relay_topics: vec![],
        min_peers_to_publish: Some(0),
        filter: None,
        log_level: None,
        discv5: Some(true),
        discv5_bootstrap_nodes: vec![WAKU_DISCOVERY_ENR.to_string()],
        discv5_udp_port: None,
        gossipsub_params: None,
        dns4_domain_name: None,
        websocket_params: None,
    };

    let node_handle = waku_new(Some(config))?;
    let node_handle = node_handle.start()?;

    for address in NODES.iter().map(|a| Multiaddr::from_str(a).unwrap()) {
        let peerid = node_handle.add_peer(&address, ProtocolId::Relay)?;
        node_handle.connect_peer_with_id(&peerid, None)?;
    }

    let content_filter = ContentFilter::new(Some(waku_default_pubsub_topic()), vec![]);
    node_handle.relay_subscribe(&content_filter)?;
    Ok(node_handle)
}

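fn main() {
    // Bind the handle to a named variable (`let _ = ...` would discard the
    // result immediately) and fail loudly if the node could not be set up.
    let _node_handle = setup_node_handle().expect("failed to set up the Waku node");

    // Idle without busy-spinning so CPU load does not distort the measurements.
    loop {
        std::thread::sleep(std::time::Duration::from_secs(60));
    }
}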

In this example the leak is gradual but persistent. When I ran the program above on macOS, RAM usage started at around 53 MB and gradually increased to 112 MB over the course of 24 hours. (The leak in our app is even more apparent on Ubuntu because, for some reason, the memory spikes are bigger there.) I know this doesn't sound like much, but in Subgraph Radio the issue is much worse: memory starts at around 250 MB and is already above 1 GB after 3 days. The dashboard shows persistent small increases that never plateau.

What's interesting is that when I set discv5 to false (so the example app uses only the static nodes), RAM usage doesn't increase constantly and sometimes even goes down. I haven't done extensive monitoring for that scenario, though, so I'm not sure whether discv5/dns is the only problem.
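
To compare the two configurations with actual numbers rather than a dashboard impression, resident memory can be logged from inside the process. Below is a minimal sketch, assuming Linux: it reads /proc/self/status, so it would not work on macOS. The helper names parse_vm_rss_kb, current_rss_kb and growth_kb_per_hour are hypothetical and not part of the bindings, and the short sampling interval is only there to keep the demo quick.

```rust
use std::{fs, thread, time::Duration};

/// Extracts the `VmRSS` value (in kB) from the text of `/proc/<pid>/status`.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("VmRSS:"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|kb| kb.parse().ok())
}

/// Reads the current process's resident set size in kB (Linux only).
fn current_rss_kb() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kb(&status)
}

/// Average growth in kB per hour between two samples taken `elapsed` apart.
fn growth_kb_per_hour(start_kb: u64, end_kb: u64, elapsed: Duration) -> f64 {
    let hours = elapsed.as_secs_f64() / 3600.0;
    (end_kb as f64 - start_kb as f64) / hours
}

fn main() {
    // Hypothetical settings: a real leak hunt would sample every few minutes
    // for hours, not three times at 100 ms.
    let interval = Duration::from_millis(100);
    let samples: u32 = 3;

    let first = current_rss_kb();
    for i in 1..=samples {
        thread::sleep(interval);
        match (first, current_rss_kb()) {
            (Some(start), Some(now)) => {
                let rate = growth_kb_per_hour(start, now, interval * i);
                println!("sample {i}: rss = {now} kB, avg growth = {rate:.1} kB/h");
            }
            _ => println!("sample {i}: RSS unavailable (no /proc/self/status?)"),
        }
    }
}
```

In the reproduction, the sampling loop would run in a background thread next to the node, once with discv5 enabled and once with it disabled. For scale, the figures above (53 MB to 112 MB over 24 hours) correspond to an average growth of roughly 2.5 MB per hour.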

petkodes commented 5 months ago

Here's the repo for the test example, with a few commits showing different behaviors: https://github.com/axiomatic-aardvark/bindings-testing