neo4j-labs / neo4rs

Rust driver for Neo4j
https://docs.rs/neo4rs

Bolt threads not properly closing after graph.run or graph.execute #106

Open DataFrogman opened 1 year ago

DataFrogman commented 1 year ago

I have an async program that makes graph.execute and graph.run calls, with a semaphore limiting the number of concurrent tasks. My Neo4j server's thread pool maximum is set to 100x the number of concurrent tasks, and max_connections in the ConfigBuilder is set to 10x the number of concurrent tasks. I await every graph.execute and graph.run call, so each task should only have one live connection at a time.

Despite this, I very quickly hit AuthenticationError("There are no available threads to serve this request at the moment. You can retry at a later time or consider increasing max thread pool size for bolt connector(s).") or UnexpectedMessage("unexpected response for RUN: Ok(Failure(Failure { metadata: BoltMap { value: {BoltString { value: \"code\" }: String(BoltString { value: \"Neo.TransientError.Request.NoThreadsAvailable\" }), BoltString { value: \"message\" }: String(BoltString { value: \"There are no available threads to serve this request at the moment. You can retry at a later time or consider increasing max thread pool size for bolt connector(s).\" })} } }))"). The higher the number of concurrent tasks, the faster I hit these errors, which leads me to believe that there is some lag in closing the connections despite the await statements. At 1024 concurrent tasks it takes about 15 seconds; at 512 concurrent tasks it occurs within a couple of minutes.
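
The server message itself suggests retrying later. A minimal sketch of that idea, assuming the failure has already been turned into a string containing either the status code or the message text quoted above. `is_no_threads_error` and `retry_with_backoff` are hypothetical helpers, not part of neo4rs, and the blocking `std::thread::sleep` stands in for `tokio::time::sleep` in async code:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: recognises the "no available threads" failure by the
/// status code or the message text quoted in this issue.
fn is_no_threads_error(message: &str) -> bool {
    message.contains("Neo.TransientError.Request.NoThreadsAvailable")
        || message.contains("There are no available threads to serve this request")
}

/// Hypothetical helper: calls `op` up to `max_attempts` times, doubling the
/// delay after each "no threads" failure. Any other error is returned at once.
fn retry_with_backoff<T>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, String>,
) -> Result<T, String> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only while attempts remain and the error looks transient.
            Err(err) if attempt < max_attempts && is_no_threads_error(&err) => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
            // Out of attempts, or a non-transient error: give up.
            Err(err) => return Err(err),
        }
    }
}
```

This only papers over the symptom; it does not explain why the server runs out of Bolt threads in the first place.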

DataFrogman commented 1 year ago

The following code should reproduce the issue almost instantly.

use std::sync::Arc;

use neo4rs::{ConfigBuilder, Graph, Result};
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() -> Result<()> {
    let uri = "bolt://172.20.26.147:7687";
    let user = "neo4j";
    let password = "neo4j";

    let config = ConfigBuilder::default()
        .uri(uri)
        .user(user)
        .password(password)
        .max_connections(10240)
        .build()
        .unwrap();

    let graph = Arc::new(
        Graph::connect(config)
            .await
            .unwrap()
        );

    let semaphore = Arc::new(Semaphore::new(1024));
    let mut acc: usize = 0;
    loop {
        let permit = semaphore.clone();
        let _permit = permit.acquire_owned().await.unwrap();

        let cloned_graph = graph.clone();
        let cloned_acc = acc.to_string();

        tokio::spawn(async move {
            let temp = cloned_graph.run(neo4rs::query(&format!("MERGE (n:Num {{num: '{cloned_acc}'}});"))).await;
            match temp {
                Ok(_) => (),
                Err(err) => println!("{err:?}"),
            }
            std::mem::drop(_permit);
        });
        acc += 1;
    }
}

knutwalker commented 1 year ago

@DataFrogman what system are you running on? And what are the versions of Neo4j and neo4rs? How are you connecting to Neo4j? Are you running in release mode?

So far, I cannot reproduce this on a Mac. I get a few "Connection reset by peer" errors at the beginning, but once a successful connection is made, I can run with a steady 1024 live connections.

I tested a slightly modified version of your code:

use std::sync::{atomic::AtomicUsize, Arc};

use neo4rs::*;
use tokio::sync::Semaphore;

#[tokio::test(flavor = "multi_thread", worker_threads = 8)]
async fn concurrency() -> Result<()> {
    let uri = "bolt://127.0.0.1:7687";
    let user = "neo4j";
    let password = "neo4j";

    let config = ConfigBuilder::default()
        .uri(uri)
        .user(user)
        .password(password)
        .max_connections(10240)
        .build()
        .unwrap();

    let graph = Arc::new(Graph::connect(config).await.unwrap());

    let semaphore = Arc::new(Semaphore::new(1024));
    let mut acc: usize = 0;
    let connections = Arc::new(AtomicUsize::new(0));
    let successes = Arc::new(AtomicUsize::new(0));
    let errors = Arc::new(AtomicUsize::new(0));

    tokio::spawn({
        let c = connections.clone();
        let s = successes.clone();
        let e = errors.clone();
        async move {
            loop {
                tokio::time::sleep(tokio::time::Duration::from_secs(3)).await;
                println!(
                    "live connections: {} successes: {} errors: {}",
                    c.load(std::sync::atomic::Ordering::Relaxed),
                    s.load(std::sync::atomic::Ordering::Relaxed),
                    e.load(std::sync::atomic::Ordering::Relaxed),
                );
            }
        }
    });

    loop {
        let permit = semaphore.clone();
        let _permit = permit.acquire_owned().await.unwrap();

        let connections = connections.clone();
        let successes = successes.clone();
        let errors = errors.clone();
        let cloned_graph = graph.clone();
        let cloned_acc = acc.to_string();

        tokio::spawn(async move {
            connections.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
            let temp = cloned_graph
                .run(neo4rs::query(&format!(
                    "MERGE (n:Num {{num: '{cloned_acc}'}});"
                )))
                .await;
            match temp {
                Ok(_) => {
                    successes.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
                }
                Err(err) => {
                    match err {
                        Error::UnexpectedMessage(msg)
                        | Error::UnknownMessage(msg)
                        | Error::AuthenticationError(msg) => {
                            println!("error: {}", msg);
                        }
                        _ => {}
                    };
                    errors.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
                }
            }
            connections.fetch_sub(1, std::sync::atomic::Ordering::Relaxed);
            std::mem::drop(_permit);
        });
        acc += 1;
    }
}

I ran this from master against Neo4j 5.12.0 running on localhost (the same machine), with the following config changes:

server.memory.heap.max_size=8g
server.memory.pagecache.size=8g
dbms.memory.transaction.total.max=6g
db.memory.transaction.max=16m
db.transaction.concurrent.maximum=102400
server.bolt.thread_pool_max_size=102400

I let it run for 10-something minutes, with the following output:

$ cargo test --release -- concurrency --nocapture
[...]
running 1 test
live connections: 1024 successes: 2482 errors: 2782
live connections: 1024 successes: 5634 errors: 2782
live connections: 1024 successes: 8924 errors: 2782
live connections: 1024 successes: 12151 errors: 2782
live connections: 1024 successes: 15078 errors: 2782
live connections: 1024 successes: 18044 errors: 2782
live connections: 1024 successes: 21195 errors: 2782
live connections: 1024 successes: 24220 errors: 2782
live connections: 1024 successes: 27312 errors: 2782
live connections: 1024 successes: 30756 errors: 2782
live connections: 1024 successes: 33681 errors: 2782
live connections: 1024 successes: 36906 errors: 2782
live connections: 1024 successes: 39876 errors: 2782
live connections: 1024 successes: 43045 errors: 2782
live connections: 1024 successes: 45928 errors: 2782
live connections: 1024 successes: 48940 errors: 2782
live connections: 1024 successes: 51904 errors: 2782
live connections: 1024 successes: 54998 errors: 2782
live connections: 1024 successes: 58064 errors: 2782
test concurrency has been running for over 60 seconds
live connections: 1024 successes: 61239 errors: 2782
live connections: 1024 successes: 64387 errors: 2782
live connections: 1024 successes: 67424 errors: 2782
live connections: 1024 successes: 69810 errors: 2782
live connections: 1024 successes: 72107 errors: 2782
live connections: 1024 successes: 74220 errors: 2782
live connections: 1024 successes: 76472 errors: 2782
live connections: 1024 successes: 78694 errors: 2782
live connections: 1024 successes: 80683 errors: 2782
live connections: 1024 successes: 82934 errors: 2782
live connections: 1024 successes: 85173 errors: 2782
live connections: 1019 successes: 87354 errors: 2782
live connections: 1024 successes: 89504 errors: 2782
live connections: 1024 successes: 91622 errors: 2782
live connections: 1024 successes: 93718 errors: 2782
live connections: 1024 successes: 95256 errors: 2782
live connections: 1024 successes: 97135 errors: 2782
live connections: 1015 successes: 99072 errors: 2782
live connections: 1024 successes: 101024 errors: 2782
live connections: 1024 successes: 102911 errors: 2782
live connections: 1024 successes: 104884 errors: 2782
live connections: 1024 successes: 106801 errors: 2782
live connections: 1024 successes: 108385 errors: 2782
live connections: 1024 successes: 110251 errors: 2782
live connections: 1024 successes: 112070 errors: 2782
live connections: 1024 successes: 113631 errors: 2782
live connections: 1024 successes: 115451 errors: 2782
live connections: 1024 successes: 116871 errors: 2782
live connections: 1024 successes: 118761 errors: 2782
live connections: 1024 successes: 120468 errors: 2782
live connections: 1024 successes: 122003 errors: 2782
live connections: 1024 successes: 123669 errors: 2782
live connections: 1024 successes: 125696 errors: 2782
live connections: 1024 successes: 127491 errors: 2782
live connections: 1024 successes: 128947 errors: 2782
live connections: 1024 successes: 130604 errors: 2782
live connections: 1024 successes: 132186 errors: 2782
live connections: 1024 successes: 133591 errors: 2782
live connections: 1024 successes: 134724 errors: 2782
live connections: 1024 successes: 136636 errors: 2782
live connections: 1024 successes: 138199 errors: 2782
live connections: 1024 successes: 140030 errors: 2782
live connections: 1024 successes: 141380 errors: 2782
live connections: 1017 successes: 143184 errors: 2782
live connections: 1024 successes: 144554 errors: 2782
live connections: 1024 successes: 145557 errors: 2782
live connections: 1024 successes: 147366 errors: 2782
live connections: 1024 successes: 148763 errors: 2782
live connections: 1024 successes: 149783 errors: 2782
live connections: 1024 successes: 151038 errors: 2782
live connections: 1024 successes: 152477 errors: 2782
live connections: 1024 successes: 153957 errors: 2782
live connections: 1024 successes: 155427 errors: 2782
live connections: 1024 successes: 157129 errors: 2782
live connections: 1024 successes: 158402 errors: 2782
live connections: 1024 successes: 159445 errors: 2782
live connections: 1024 successes: 161356 errors: 2782
live connections: 1024 successes: 162491 errors: 2782
live connections: 1024 successes: 163981 errors: 2782
live connections: 1024 successes: 165546 errors: 2782
live connections: 1024 successes: 166503 errors: 2782
live connections: 1024 successes: 167422 errors: 2782
live connections: 1024 successes: 168735 errors: 2782
live connections: 1024 successes: 168897 errors: 2782
live connections: 1024 successes: 169463 errors: 2782
live connections: 1024 successes: 170627 errors: 2782
live connections: 1024 successes: 172011 errors: 2782
live connections: 1024 successes: 173317 errors: 2782
live connections: 1024 successes: 174291 errors: 2782
live connections: 1024 successes: 175334 errors: 2782
live connections: 1024 successes: 177362 errors: 2782
live connections: 1024 successes: 177652 errors: 2782
live connections: 1024 successes: 179147 errors: 2782
live connections: 1024 successes: 180416 errors: 2782
live connections: 1024 successes: 181581 errors: 2782
live connections: 1024 successes: 182604 errors: 2782
live connections: 1024 successes: 184107 errors: 2782
live connections: 1024 successes: 185119 errors: 2782
live connections: 1024 successes: 186160 errors: 2782
live connections: 1024 successes: 187221 errors: 2782
live connections: 1024 successes: 188560 errors: 2782
live connections: 1024 successes: 189623 errors: 2782
live connections: 1024 successes: 190784 errors: 2782
live connections: 1024 successes: 191855 errors: 2782
live connections: 1024 successes: 193015 errors: 2782
live connections: 1024 successes: 194209 errors: 2782
live connections: 1024 successes: 195249 errors: 2782
live connections: 1024 successes: 196314 errors: 2782
live connections: 1024 successes: 197377 errors: 2782
live connections: 1024 successes: 198758 errors: 2782
live connections: 1024 successes: 199186 errors: 2782
live connections: 1024 successes: 200326 errors: 2782
live connections: 1024 successes: 201536 errors: 2782
live connections: 1024 successes: 202718 errors: 2782
live connections: 1024 successes: 203430 errors: 2782
live connections: 1024 successes: 204432 errors: 2782
live connections: 1024 successes: 205389 errors: 2782
live connections: 1024 successes: 206999 errors: 2782
live connections: 1024 successes: 208068 errors: 2782
live connections: 1024 successes: 208226 errors: 2782
live connections: 1024 successes: 209186 errors: 2782
live connections: 1024 successes: 210620 errors: 2782
live connections: 1024 successes: 211861 errors: 2782
live connections: 1024 successes: 212719 errors: 2782
live connections: 1024 successes: 213386 errors: 2782
live connections: 1024 successes: 214431 errors: 2782
live connections: 1024 successes: 215517 errors: 2782
live connections: 1024 successes: 216593 errors: 2782
live connections: 1024 successes: 217608 errors: 2782
live connections: 1024 successes: 218293 errors: 2782
live connections: 1024 successes: 219103 errors: 2782
live connections: 1024 successes: 220414 errors: 2782
live connections: 1024 successes: 221126 errors: 2782
live connections: 1024 successes: 221970 errors: 2782
live connections: 1024 successes: 222852 errors: 2782
live connections: 1024 successes: 223794 errors: 2782
live connections: 1024 successes: 224691 errors: 2782
live connections: 1024 successes: 225712 errors: 2782
live connections: 1024 successes: 226803 errors: 2782
live connections: 1024 successes: 227782 errors: 2782
live connections: 1024 successes: 228692 errors: 2782
live connections: 1024 successes: 229649 errors: 2782
live connections: 1024 successes: 230616 errors: 2782
live connections: 1024 successes: 231513 errors: 2782
live connections: 1024 successes: 232271 errors: 2782
live connections: 1024 successes: 233549 errors: 2782
live connections: 1024 successes: 234514 errors: 2782
live connections: 1024 successes: 235420 errors: 2782
live connections: 1024 successes: 235895 errors: 2782
live connections: 1024 successes: 236964 errors: 2782
live connections: 1024 successes: 237992 errors: 2782
live connections: 1024 successes: 239065 errors: 2782
live connections: 948 successes: 240029 errors: 2782
live connections: 1024 successes: 240527 errors: 2782
live connections: 1024 successes: 241545 errors: 2782
live connections: 1024 successes: 242712 errors: 2782
live connections: 1024 successes: 243783 errors: 2782
live connections: 1024 successes: 244616 errors: 2782
live connections: 1024 successes: 245380 errors: 2782
live connections: 1024 successes: 245952 errors: 2782
live connections: 1024 successes: 246987 errors: 2782
live connections: 1024 successes: 247895 errors: 2782
live connections: 1024 successes: 248726 errors: 2782
live connections: 1024 successes: 249778 errors: 2782
live connections: 1024 successes: 250886 errors: 2782
live connections: 1024 successes: 251401 errors: 2782
live connections: 1024 successes: 252622 errors: 2782
live connections: 1024 successes: 253554 errors: 2782
live connections: 1024 successes: 253972 errors: 2782
live connections: 1024 successes: 255028 errors: 2782
live connections: 1024 successes: 255815 errors: 2782
live connections: 1024 successes: 257079 errors: 2782
live connections: 1024 successes: 257515 errors: 2782
live connections: 1024 successes: 258510 errors: 2782
live connections: 1024 successes: 259450 errors: 2782
live connections: 1024 successes: 260086 errors: 2782
live connections: 1024 successes: 260892 errors: 2782
live connections: 1024 successes: 261821 errors: 2782
live connections: 1024 successes: 262506 errors: 2782
live connections: 1024 successes: 263492 errors: 2782
live connections: 1024 successes: 264350 errors: 2782
live connections: 1024 successes: 264902 errors: 2782
live connections: 1024 successes: 266098 errors: 2782
live connections: 1024 successes: 267078 errors: 2782
live connections: 1024 successes: 267537 errors: 2782
live connections: 1024 successes: 268201 errors: 2782
live connections: 1024 successes: 268980 errors: 2782
live connections: 1024 successes: 270016 errors: 2782
live connections: 1024 successes: 270796 errors: 2782
live connections: 1024 successes: 271906 errors: 2782
live connections: 1024 successes: 272526 errors: 2782
live connections: 1024 successes: 272958 errors: 2782
live connections: 1024 successes: 273560 errors: 2782
live connections: 1024 successes: 274582 errors: 2782
live connections: 1024 successes: 275452 errors: 2782
live connections: 1024 successes: 276533 errors: 2782
live connections: 1024 successes: 277157 errors: 2782
live connections: 1024 successes: 277637 errors: 2782
live connections: 1024 successes: 278668 errors: 2782
live connections: 1024 successes: 279679 errors: 2782
live connections: 1024 successes: 279844 errors: 2782
live connections: 1024 successes: 280869 errors: 2782
live connections: 1024 successes: 281466 errors: 2782
live connections: 1024 successes: 282302 errors: 2782
live connections: 1024 successes: 282614 errors: 2782
live connections: 1024 successes: 283240 errors: 2782
live connections: 1024 successes: 283946 errors: 2782
live connections: 1024 successes: 284875 errors: 2782
live connections: 1024 successes: 284997 errors: 2782
live connections: 1024 successes: 285868 errors: 2782
live connections: 1024 successes: 287062 errors: 2782
live connections: 1024 successes: 287280 errors: 2782
live connections: 1024 successes: 288320 errors: 2782
live connections: 1024 successes: 288855 errors: 2782
live connections: 1024 successes: 289759 errors: 2782
live connections: 1024 successes: 289994 errors: 2782
live connections: 1024 successes: 290572 errors: 2782
live connections: 1024 successes: 291559 errors: 2782
live connections: 1024 successes: 291914 errors: 2782
live connections: 1024 successes: 292905 errors: 2782
live connections: 1024 successes: 293585 errors: 2782
live connections: 1024 successes: 294612 errors: 2782
live connections: 1024 successes: 294731 errors: 2782
live connections: 1024 successes: 295554 errors: 2782
live connections: 1024 successes: 296489 errors: 2782
live connections: 1024 successes: 296584 errors: 2782
live connections: 1024 successes: 297633 errors: 2782
live connections: 1021 successes: 297761 errors: 2782
live connections: 1024 successes: 298562 errors: 2782
live connections: 1024 successes: 299389 errors: 2782
live connections: 1024 successes: 300215 errors: 2782
live connections: 1024 successes: 300711 errors: 2782
live connections: 1024 successes: 301459 errors: 2782
live connections: 1024 successes: 302018 errors: 2782
live connections: 1024 successes: 302920 errors: 2782
live connections: 1024 successes: 303850 errors: 2782
live connections: 1024 successes: 304436 errors: 2782
live connections: 1024 successes: 305078 errors: 2782
live connections: 1024 successes: 305779 errors: 2782
live connections: 1024 successes: 306570 errors: 2782
live connections: 1024 successes: 306887 errors: 2782
live connections: 1024 successes: 307668 errors: 2782
live connections: 1024 successes: 308699 errors: 2782
live connections: 1024 successes: 309474 errors: 2782
live connections: 1024 successes: 309531 errors: 2782
live connections: 1024 successes: 310539 errors: 2782
live connections: 1024 successes: 311181 errors: 2782
live connections: 1024 successes: 311734 errors: 2782
live connections: 1024 successes: 312837 errors: 2782
live connections: 1024 successes: 312918 errors: 2782
live connections: 1024 successes: 313905 errors: 2782
^C
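
The output above shows running totals. To read it as successes per reporting interval (the reporter task sleeps 3 seconds between lines), the lines can be parsed and differenced. A small sketch, assuming the line format matches the `println!` in the test above; `Sample`, `parse_line`, and `success_deltas` are hypothetical names introduced here for illustration:

```rust
/// One line printed by the reporter task.
#[derive(Debug, PartialEq)]
struct Sample {
    live: u64,
    successes: u64,
    errors: u64,
}

/// Parses "live connections: A successes: B errors: C".
/// Returns None for any line that does not match (e.g. "^C").
fn parse_line(line: &str) -> Option<Sample> {
    let rest = line.trim().strip_prefix("live connections: ")?;
    let (live, rest) = rest.split_once(" successes: ")?;
    let (successes, errors) = rest.split_once(" errors: ")?;
    Some(Sample {
        live: live.parse().ok()?,
        successes: successes.parse().ok()?,
        errors: errors.parse().ok()?,
    })
}

/// Successes gained between each pair of consecutive samples.
fn success_deltas(samples: &[Sample]) -> Vec<u64> {
    samples
        .windows(2)
        .map(|pair| pair[1].successes.saturating_sub(pair[0].successes))
        .collect()
}
```

For the first two lines of the output this gives 3152 successes in the interval, and for the last two it gives 987.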

Are you running Neo4j in Docker, perhaps? I can imagine that a different network stack, due to the OS or the network connection, could influence the behavior.