sfu-db / connector-x

Fastest library to load data from DB to DataFrames in Rust and Python
https://sfu-db.github.io/connector-x
MIT License
1.98k stars 158 forks source link

Oracle performance #127

Open wangxiaoying opened 3 years ago

wangxiaoying commented 3 years ago

Investigate oracle's performance issue

wangxiaoying commented 3 years ago

Bottle neck is fetching the next row from oracle when the driver invoke the odpi. Release row will take a certain mount of time, will try to improve this part by making free in parallel in 0.2.2.

wangxiaoying commented 2 years ago

Making the deallocation in parallel will not improve the performance since releasing the SqlValue (https://github.com/kubo/rust-oracle/blob/master/src/sql_value.rs#L1342) in rust-oracle would block the entire connection (in allocation of fetching more results) by acquiring a lock when set the reference count (https://github.com/oracle/odpi/blob/main/src/dpiGen.c#L252) in odpi.

image

According to the flame graph above, the bottleneck is caused by the underlying driver in dpiVar__allocate and dpiGen__release. So based on the same driver we cannot improve the performance on Oracle.

Please let us know if you know there is another Oracle driver has better performance, or you know how to improve the performance on Oracle in this case!

ahusnn commented 2 years ago

Could this be aided by https://docs.rs/oracle/0.5.3/oracle/struct.Batch.html?

dovahcrow commented 2 years ago

@ahusnn It seems like Batch is for insert/update/delete?

ahusnn commented 2 years ago

@dovahcrow Indeed, I read through that doc too quickly.

It looks like the default fetch size is 100. Would increasing that be helpful, or just diminishing returns on performance for this particular driver?

https://docs.rs/oracle/0.5.3/oracle/enum.StmtParam.html https://docs.rs/oracle/0.5.3/oracle/struct.StatementBuilder.html

wangxiaoying commented 2 years ago

@ahusnn Interesting. Currently we directly issue the plain query instead of using a prepared statement. We can try to test whether using prepared statement and increasing the fetch size works. In the meanwhile, please feel free to test it yourself and share the result with us!

wangxiaoying commented 2 years ago

230, increasing array size to 1K could improve the speed slightly (~ 8%) without partitioning.