Closed Maximkaaa closed 3 years ago
shp2pgsql seems to use shapelib, so most of the reading performance probably comes from there: https://github.com/OSGeo/shapelib/blob/21ae8fc16afa15a1b723077b6cec3a9abc592f6a/dbfopen.c#L946
However, looking at the code, I don't see anything extraordinary, which makes me think that dbase-rs is doing something bad.
The best way forward would be to profile the code. However, I'm on Windows, and the last time I tried to profile Rust code there it was painful. I'll see if I can get some results or boot into Linux, but that probably won't happen for a few weeks.
One notable thing I see is that they use a buffer to store the record currently being parsed: the whole record's bytes are read into an in-memory buffer, and the functions that read a record's fields then look into that buffer.
In dbase-rs, fields are read one by one and the buffering is handled by the BufReader,
but maybe that is not enough, and we should have a buffer that holds the current record.
A very quick profiling run seems to tell me that we spend a lot of time in the read_exact function of the BufReader<File>, so that may be worth a try.
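To make the idea concrete, here is a minimal sketch of the per-record buffer approach. It is not the dbase-rs API: the function name `read_records_buffered`, the `field_lengths` parameter, and the choice to return every field as a trimmed string are assumptions made purely for illustration. It relies only on the fact that a dBase record is fixed-width and starts with a one-byte deletion flag.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of dbase-rs): reads `num_records`
/// fixed-width records, issuing a single `read_exact` per record
/// instead of one per field.
fn read_records_buffered<R: Read>(
    reader: &mut R,
    field_lengths: &[usize],
    num_records: usize,
) -> io::Result<Vec<Vec<String>>> {
    // A record is one deletion-flag byte followed by the fixed-width fields.
    let record_size = 1 + field_lengths.iter().sum::<usize>();
    // Allocated once and reused for every record.
    let mut record_buf = vec![0u8; record_size];
    let mut records = Vec::with_capacity(num_records);

    for _ in 0..num_records {
        // One read call pulls in the whole record.
        reader.read_exact(&mut record_buf)?;

        let mut offset = 1; // skip the deletion flag
        let mut fields = Vec::with_capacity(field_lengths.len());
        for &len in field_lengths {
            // Field parsing now only slices the in-memory buffer.
            let raw = &record_buf[offset..offset + len];
            fields.push(String::from_utf8_lossy(raw).trim().to_string());
            offset += len;
        }
        records.push(fields);
    }
    Ok(records)
}
```

The reader can still be a BufReader<File>; the difference is that read_exact is called once per record rather than once per field. Whether that actually closes the gap with shp2pgsql would need to be measured.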
I think we can close this issue. Even if there are further improvements possible, currently the main limiting factor for most practical applications is IO speed. @tmontaigu Thanks so much for your help! Do you plan to release a new version of the crate?
Yep
As a follow-up to #21, I've checked how long it takes other applications to deal with large shapefiles (with dbfs).
It takes shp2pgsql less than 6 sec to read a shapefile with 500 MB of dbf attribute data. This is around 4 times faster than just reading dbf records with dbase-rs (after applying #21). It does not use multithreading or other dirty hacks... So there are clearly ways to improve performance. @tmontaigu If you are interested in improving this crate, I'm willing to invest some time in investigating and trying different ideas.
Someone with some knowledge of C could probably help extract such ideas from the shp2pgsql source (https://postgis.net/docs/doxygen/3.2/d8/da3/shp2pgsql-core_8c_source.html).