rust-lang / glob

Support for matching file paths against Unix shell style patterns.
http://doc.rust-lang.org/glob
Apache License 2.0

Cache information about file type #135

Closed Kobzol closed 6 months ago

Kobzol commented 6 months ago

This commit adds a cache that remembers whether a given path is a file or a directory, based on the results of std::fs::read_dir. This reduces the number of executed syscalls and improves the performance of the library.
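The idea can be illustrated with a minimal sketch. This is not the code from this pull request: `FileTypeCache`, `read_dir_cached`, and the `entries` field are hypothetical names chosen for illustration, and the real implementation may be organized differently. The sketch records each entry's type while listing a directory, so a later "is this a directory?" question can be answered without another metadata lookup.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cache (not the crate's actual type): path -> "is a directory".
#[derive(Default)]
struct FileTypeCache {
    entries: HashMap<PathBuf, bool>,
}

impl FileTypeCache {
    /// Lists `dir` and records the type of every entry as a side effect.
    fn read_dir_cached(&mut self, dir: &Path) -> io::Result<Vec<PathBuf>> {
        let mut paths = Vec::new();
        for entry in fs::read_dir(dir)? {
            let entry = entry?;
            let path = entry.path();
            // On Windows and most Unix platforms `DirEntry::file_type` needs
            // no extra syscall; the type comes with the directory listing.
            let file_type = entry.file_type()?;
            // `file_type` does not follow symlinks, so symlinks are left
            // uncached and resolved later through `fs::metadata`.
            if !file_type.is_symlink() {
                self.entries.insert(path.clone(), file_type.is_dir());
            }
            paths.push(path);
        }
        Ok(paths)
    }

    /// Answers from the cache when possible, otherwise asks the filesystem
    /// and remembers the answer. Missing paths are reported as `false`.
    fn is_dir(&mut self, path: &Path) -> bool {
        if let Some(&cached) = self.entries.get(path) {
            return cached;
        }
        let result = fs::metadata(path).map(|m| m.is_dir()).unwrap_or(false);
        self.entries.insert(path.to_path_buf(), result);
        result
    }
}
```

In this sketch, a directory walk that calls `read_dir_cached` before asking `is_dir` about the listed entries only reaches `fs::metadata` for symlinks and for paths that were never listed.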

Here is a simple benchmark that uses glob to count the Rust files in the tests directory of a rustc checkout.

fn main() {
    let count = glob::glob("<rustc-root>/tests/**/*.rs")
        .unwrap()
        .count();
    println!("File count: {count}");
}

Results on my PC (approximately 19k Rust files are in that directory):

| Version | Syscall count | statx syscall count | Time   |
|---------|---------------|---------------------|--------|
| Before  | 41586         | 34468               | ~130ms |
| After   | 7131          | 11                  | ~70ms  |

Syscalls were counted with strace <program> 2> out.txt && cat out.txt | wc -l, and time was measured with hyperfine.

Fixes: https://github.com/rust-lang/glob/issues/79

This pull request was created in cooperation with students of the Rust course at VSB-TUO.