tmontaigu / dbase-rs

Rust library to read & write dBase files.
MIT License

Improves performance of reading large dbfs #21

Closed: Maximkaaa closed this 3 years ago

Maximkaaa commented 3 years ago

This commit changes how values are read from DBF files, making file processing almost 3 times faster.

The key points of improvement:

Overall, in my case, reading a 500 MiB DBF file takes:

tmontaigu commented 3 years ago

Interesting

I'd like to find a 500 MB file to test those improvements.

Quoting the PR description: "Removes the Vec::resize() call in the read_string_of_len function. Using vec![0; len] instead gives a significant improvement."

That seems strange; I would have expected the two to be roughly equivalent, but that's cool.
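A minimal sketch of the two buffer-allocation patterns being compared. It assumes the original code grew an empty `Vec` with `resize`; the function names `read_string_of_len_resize` and `read_string_of_len_vec`, their signatures, and the lossy UTF-8 decoding are hypothetical stand-ins, not the crate's actual code. One plausible explanation for the gap (not verified against the PR): for a zero element, `vec![0u8; len]` can ask the allocator for already-zeroed memory in a single call, whereas `resize` writes the fill value into each new slot itself, which the optimizer may or may not turn into a `memset`.

```rust
use std::io::{self, Read};

/// Hypothetical version using the pattern the PR removed:
/// start with an empty Vec, then grow it to `len` zero bytes.
fn read_string_of_len_resize<R: Read>(source: &mut R, len: usize) -> io::Result<String> {
    let mut bytes = Vec::new();
    bytes.resize(len, 0u8); // fills each new slot with 0
    source.read_exact(&mut bytes)?; // errors if fewer than `len` bytes remain
    Ok(String::from_utf8_lossy(&bytes).into_owned())
}

/// Hypothetical version using the pattern the PR switched to:
/// allocate a zero-filled buffer of `len` bytes in one step.
fn read_string_of_len_vec<R: Read>(source: &mut R, len: usize) -> io::Result<String> {
    let mut bytes = vec![0u8; len];
    source.read_exact(&mut bytes)?; // errors if fewer than `len` bytes remain
    Ok(String::from_utf8_lossy(&bytes).into_owned())
}
```

Both functions return the same result; only the way the scratch buffer is created differs. Reading 5 and then 10 bytes from a `std::io::Cursor` over `b"HELLO     WORLD"` yields `"HELLO"` and `"     WORLD"`, since dBase character fields are space-padded.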

Maximkaaa commented 3 years ago

I did some more performance testing just to be sure. I used two files, one of 500 MB and one of 39 MB. The test code is just:

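use dbase::Reader;
use std::time::Instant;

#[test]
fn large() {
    // 39 MB OSM water-areas DBF used for timing
    let mut reader = Reader::from_path("./tests/data/gis_osm_water_a_free_1.dbf").unwrap();
    // Instant is monotonic, so it is the usual choice for measuring durations
    let start = Instant::now();
    // Read every record; unwrap so a read error fails the test instead of being ignored
    let records = reader.read().unwrap();
    let elapsed = start.elapsed().as_millis();
    // Print the timing; run `cargo test large -- --nocapture` to see the output
    println!("read {} records in {} ms", records.len(), elapsed);
}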

The results (time taken by reader.read()) are:

Change                          39 MB file   500 MB file
Without modification            1067 ms      69513 ms
.resize() -> vec![0; len]       851 ms       63046 ms
All improvements                521 ms       21756 ms

Both release and debug builds show a similar performance improvement.
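The speedups implied by the results table can be checked with simple arithmetic: speedup is the baseline time divided by the new time. A small sketch; the `speedup` helper and the `TIMINGS` constant are illustrative names, not part of dbase-rs, and the numbers are copied from the table.

```rust
/// Speedup of a change: baseline time divided by the new time.
fn speedup(baseline_ms: f64, new_ms: f64) -> f64 {
    baseline_ms / new_ms
}

/// Timings from the results table: (change, 39 MB file in ms, 500 MB file in ms).
const TIMINGS: [(&str, f64, f64); 3] = [
    ("Without modification", 1067.0, 69513.0),
    (".resize() -> vec![0; len]", 851.0, 63046.0),
    ("All improvements", 521.0, 21756.0),
];
```

With these numbers, all improvements together give about 2.0× on the 39 MB file and about 3.2× on the 500 MB file, while the `vec![0; len]` change alone gives about 1.25× and 1.10×.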

Unfortunately, I cannot share the 500 MB file I use because it is commercial data, but I downloaded the smaller one from the Geofabrik OSM download site. Larger files are available there too, so you can probably find suitable test files (https://download.geofabrik.de/europe.html).

Maximkaaa commented 3 years ago

I also updated the PR to make the fmt check pass, and only afterwards noticed that it fails on a file I hadn't originally changed.