voltrondata / sqlflite

An example Flight SQL Server implementation - with DuckDB and SQLite back-ends.
Apache License 2.0
205 stars 23 forks source link

Flight SQL is slower #29

Closed BrandonNgwa closed 7 months ago

BrandonNgwa commented 8 months ago

I have recently been running flight sql API to do some database querying on my ubuntu version 20.04.6 lts sever machine. Following all the steps on this repository I managed to build the executable using cmake and ran some queries with duckdb. The first time I did this was with the flight sql version 13 and duckdb version 0.8.0. Everything ran well in the end and the querying speeds were really fast compared to traditional ODBC with MySQL server. Now I have upgraded my ubuntu version to 22.04.3 lts to use with the new version 15 of flight sql. This new version is also using a newer version duckdb, 0.9.0. I downloaded the executable for linux from this repository and ran it. Doing the same queries I did previously I noticed slower speeds with this new version. I don't know if it has to do with the newer version of duckdb or ubuntu or is something else entirely. I also released this executable is far heavy than the one I built with cmake for version 13. My version 13 flight sql was 2.4mb in size but this newer already compiled executable is about 80mb in size, a big difference. Any ideas as to why things are slower now with this upgrades? Am now getting the same speeds sometimes even slower speeds than using ODBC with MySQL server when querying.

prmoore77 commented 8 months ago

Hey @BrandonNgwa - what kind of queries are you running? Are they large OLAP-style queries? Do you fetch a lot of data, or just a few rows?

We switched to a statically linked executable recently (like DuckDB has) - that builds all of the dependencies into it (Arrow, DuckDB, SQLite, etc.) - which makes it larger. The plus side is that you don't have to have any of the dependencies installed, the downside is a larger executable.

We'll run some benchmarks on my side to see how tpc-h performs...

BrandonNgwa commented 8 months ago

Hello prmoore77, thanks for your reply. I don't think am running any OLAP-style queries but yes I am fetching large amounts of data. Am using a simple SELECT statement with a LIMIT clause to retrieve a certain number of records in batches from from a table in my duckdb database. Then I record the time it takes to get each batch. This are the batches, [ 100000, 500000, 1000000, 2000000, 3000000, 4000000, 5000000, 6000000, 7000000, 8000000, 9000000, 10000000]. I have ran this a couple of times since flight sql version 12, speeds where good but I don't know why now its a bit slower with the upgrades.
To give you an example here are my execution times

This when I ran the script on my ubuntu server machine where everything is installed. This was using flight sql version 13 with all the dependencies and on ubuntu 20.04.6 lts

Query for 100000 records executed in 0.78 seconds Query for 500000 records executed in 2.78 seconds Query for 1000000 records executed in 2.74 seconds Query for 2000000 records executed in 4.21 seconds Query for 3000000 records executed in 5.63 seconds Query for 4000000 records executed in 7.32 seconds Query for 5000000 records executed in 9.14 seconds Query for 6000000 records executed in 9.71 seconds Query for 7000000 records executed in 11.48 seconds Query for 8000000 records executed in 13.46 seconds Query for 9000000 records executed in 15.12 seconds Query for 10000000 records executed in 16.99 seconds

This are the speeds now having upgraded my ubuntu to 22.04.3 to run with the new flight sql executable plus all the dependencies.

N records Run 1 Run 2 Run 3 Run 4 Run 5 Average 100000 2.18 2.13 2.2 1.69 2.01 2.04 500000 3.6 2.94 2.85 2.66 2.82 2.97 1000000 7.59 4.05 4.39 4.64 5.55 5.24 2000000 24.69 21.5 21.9 20.78 22.03 22.18 3000000 27.66 28.15 28.2 27.62 27.93 27.91 4000000 30.09 29.29 29.43 28.4 28.96 29.23 5000000 30.77 31.45 30.66 29.81 30.02 30.54 6000000 32.44 32.74 32.67 32.93 31.2 32.40 7000000 38.77 35.49 35.97 37.08 35.94 36.65 8000000 43.16 42.98 43.96 44.16 40.56 42.96 9000000 52.25 47.45 45.07 47.09 44.65 47.30 10000000 52.86 52.85 51.09 50.84 50.22 51.57

Don't know why thinks are slower now. Also I get now why the executable is much bigger. Overall is an advantage since you don't have to install the dependencies now.

prmoore77 commented 8 months ago

Hi - a few questions:

Thanks.

BrandonNgwa commented 8 months ago

I log into my remote sever from my pc using ssh and run all the queries on the server itself. The network speeds were mostly the same for when I was testing with the old versions and with the newer versions. I am using adbc-driver-flightsql 0.9.0 Yes the test were ran on the same infrastructure with my server machine having the same exact specs.

The main thing that changed was I that I upgrade from ubuntu 20. to 22. to be able to use the new executable of flightsql

prmoore77 commented 8 months ago

I tested TPC-H SF1k on my end - and I didn't find performance degradation from my previous tests (before static linking). That being said, I don't fetch large result sets.

There could be a slow down in some of the various components related to fetching perhaps - in the Arrow Flight server side, or the ADBC client, perhaps that you are observing.

I'll attempt to benchmark that on an older version of the server executable to see the impact on my end. It may take a while, however - as I have some other urgent priorities at present... Thanks for reporting the issue.

BrandonNgwa commented 8 months ago

No problem. Thank you for all the feedback. I will run some more test on my side to try to narrow down the problem.

BrandonNgwa commented 8 months ago

Hello prmoore77,

When ever your free please take a look this documentation below of the test I've been running. It will give you a better understanding of the issue we are facing.

Timing.docx

prmoore77 commented 7 months ago

hi @BrandonNgwa - a lot has changed (Arrow and DuckDB versions, etc.) in the codebase since this issue was opened. I will close this one for now. Please open another one if you continue to have the issue. Thanks.