Open worksofliam opened 3 months ago
I wonder what results you'd get for ODBC if you increased the Node.js worker thread pool size using this:
https://nodejs.org/docs/latest/api/cli.html#uv_threadpool_sizesize
node-odbc has a known issue with running concurrent queries due to running synchronous APIs in the worker thread pool, as discussed here:
https://github.com/IBM/node-odbc/issues/169
So, it can only run 4 queries at once unless the thread pool size is increased.
@DavidRusso I get similar results if I set that environment variables:
$ UV_THREADPOOL_SIZE=10 npm run start odbc p 10
node-odbc - Pool-Promise.all - 10 queries
Node.js version: v20.14.0
Platform: darwin
Architecture: arm64
SQL used:
VALUES (JOB_NAME)
Pool startup:
Start size: 5
Max size: 5
Time: -337.99975ms
Total queries: 10
Total time: 15651.362373999998ms
Average time: 1565.1362373999998ms
Fastest query: 1110.187333ms (query 1)
Slowest query: 2002.8617909999998ms (query 10)
Keynote chart:
0 1 2 3 4 5 6 7 8 9
1110.187333 1110.232625 1444.363667 1445.589708 1446.054417 1769.985583 1771.1829579999999 1771.140125 1779.764167 2002.8617909999998
Edit: if I set to 64, there is some little improvement:
$ UV_THREADPOOL_SIZE=64 npm run start odbc p 10
Total queries: 10
Total time: 11238.879836ms
Average time: 1123.8879836ms
Fastest query: 655.5210420000001ms (query 2)
Slowest query: 1550.866542ms (query 10)
Keynote chart:
0 1 2 3 4 5 6 7 8 9
657.2134169999999 655.5210420000001 1009.2506250000001 1009.800084 1010.1927079999999 1336.025 1336.67875 1336.605459 1336.726209 1550.866542
@worksofliam , I see.
Another problem that I've noticed with ODBC is that the data compression option for the IBM i driver doesn't seem to work. Even if you set it on the connection string, you get the same result as without when transferring large data over a WAN link.
Anyhow, the new service seems really promising! It's great to have a new alternative for connectivity from Node.js, as all the current options have some issues.
@DavidRusso I have had issues with node-odbc for some time and have been wanting an improved clients, more similar to node-mssql
or pg
for quite some time. We're headed that direction for sure.
@DavidRusso I found a large issue with ODBC pooling, which is where the issues are in this blog. Another post will come to address this.
Of course the day we release Mapepire, the new Db2 for IBM i database clients, is the same day we get asked if there are any comparisons against ODBC. Previously we weren't particularly focused on performance compared to ODBC. Our primary goal with Mapepire was going to be developer and deploy friendliness, which we have achieved easily because it's so easy to get going with.
Over the last 24 hours or so I put together a neat test suite to compare single job query performance and pool query performance. The results are pretty crazy at a first glance. I have also made this testing suite available to the public so the tests can be run on other systems.
Setting up the test suite
The test suite is easy. After you've cloned the repository, setup the environment variables as defined in
.env.sample
in a.env
file and then you can run the script usingnpm run start
with some parameters.I am using the sample schema provided in Db2 for IBM i. You can create it with the following SQL statement:
Client details
If you run
./report.sh
, then it will run all 8 possible scenarios and write all the results toreport.txt
. I am using that generated report to provide the information in this blog. You can find the report I generated here.Let's start with some client details:
All tests were run against the same IBM i 7.4 system.
Test scenarios
In total there are eight possible scenarios to test ODBC and Mapepire with the test suite today.
Connection types
Execution types
Promise.all
to let the event loop handle the results from the database execution. This sends all requests 'at once' (pooling in theory is supposed to handle distributing the requests to the database across the pool of jobs) and will return when the result comes back from the server.for
loop and run a single request at a time.A note about pools
We are using a max size of 5 and a starting size of 5 for both ODBC and Mapepire. In the ODBC
pool
constructor, we are passing inreuseConnections: true
. Mapepire does this by default.SQL statement used
The test running is using a simple select statement for each test:
When I was testing with statements that return larger result sets (like
select * from sample.syscolumns
), the ODBC response times were very high and it didn't look good on a chart. You are welcome to try that yourself.Parsing the result
Each test case has a report. We are primarily focused on this part of the report for each case:
Here is some more detail about this information:
total time / count
You will also find in your reports Keynote chart information, which is what is used to produce the charts in this blog. The chart data for each report is a table of requests and how long each request took to execute and finish.
Analysing the data
See a the report used for this blog here.
For this blog, I tried 50 requests for each different scenario. Using the chart data generated in the reports, we can cycle through the different test reports to produce high level reports of both ODBC and Mapepire against each other.
Know that, as you are seeing these results, that the test runner does make sure that results come back from the database in both database clients.
This chart is a little painful to look at. Initial glance is that ODBC pooling is still synchronous somewhere and perhaps isn't distributing requests between the pool of jobs very well. Requests are getting slower as they queue up. Mapepire on the other hand is taking care of the 50 requests with a pretty consistant response time from the database.
ODBC pulled through better in this situation. I suspect the spikes in both ODBC and Mapepire here might be from a fresh job running the SQL statement for the first time, and therefore not using the database cache. The cool part here is that Mapepire still generally has faster responses time.
Once against ODBC struggling when using a single job to send all requests. Mapepire handles it very well. This might be due to the way that Mapepire queues up requests in a single job.
ODBC once against shining when executing a single statement at a time, but is still no match for Mapepire in this instance. I am not sure what would even cause this spikes. It seems like Mapepires faster response time is really working out in single job performance.
General overview and why?
It seems like Mapepire wins for pooling and single-job performance. ODBC does fine, not with single job performance or pooling, but in more synchronous processing. There could be multiple reasons for this. ODBC is a native driver, written mostly in C++ using the N-API, so it can talk to the native, platform specific ODBC driver. That possibly has some overhead.
Mapepire (mapepire-js in this instance) is almost dumber, since it is an all JS client and uses a simpler protocol (secured websockets) to talk to the database server. It doesn't manage anything to do with encodings, passwords, and other stuff that is rammed in jtopen/jt400.