meilisearch / heed

A fully typed LMDB wrapper with minimum overhead 🐦
https://docs.rs/heed
MIT License
583 stars 55 forks source link

Correct way to close an env? #280

Open guozirui33 opened 1 week ago

guozirui33 commented 1 week ago

Hello! What is the recommended way to close an env such that all file handles are properly closed/deleted?

I was running some benchmarks involving heed and ran into "too many open files". So after some digging, I distilled my code down to this test:

#[test]
fn drop_env_deletes_file_handles() {
    let pid = std::process::id();
    let old_num_fds = num_fds(pid);

    let tmp_dir = tempdir::TempDir::new("test").unwrap();
    let lmdb_path = tmp_dir.path().join("testdb");
    let lmdb_map_size = 10 * 1024 * 1024 * 1024;

    std::fs::create_dir_all(&lmdb_path).unwrap();
    let env = unsafe {
        heed::EnvOpenOptions::new()
            .max_dbs(7)
            .map_size(lmdb_map_size)
            .open(&lmdb_path)
            .unwrap()
    };

    drop(env);
    drop(tmp_dir);

    assert_eq!(num_fds(pid), old_num_fds);
}

fn num_fds(pid: u32) -> usize {
    std::fs::read_dir(format!("/proc/{pid}/fd"))
        .unwrap()
        .count()
}

Much to my surprise, this test fails!

I know I can work around this by calling env.prepare_for_closing().wait(); instead of drop(env).

However, just to make sure I understood the library correctly, is drop(env) not supposed to close file handles? Or is this potentially a bug?

Thanks!

Kerollmops commented 1 week ago

Hey @guozirui33 👋

The env is kept in memory so that if you try to open it multiple times in multiple threads, it opens it only once. Opening the same env multiple times in a program is disallowed with LMDB.

So, yeah, using the prepare_for_closing and waiting for it is the way to go. That's the only way when you share the env across threads by using an Arc internally.

Have a nice day 🌞