Mapepire: Node.js performance testing against ODBC

worksofliam commented 2 weeks ago

Of course the day we release Mapepire, the new Db2 for IBM i database clients, is the same day we get asked if there are any comparisons against ODBC. Previously we weren't particularly focused on performance compared to ODBC. Our primary goal with Mapepire was going to be developer and deploy friendliness, which we have achieved easily because it's so easy to get going with.

Over the last 24 hours or so I put together a neat test suite to compare single job query performance and pool query performance. The results are pretty crazy at a first glance. I have also made this testing suite available to the public so the tests can be run on other systems.

Setting up the test suite

The test suite is easy. After you've cloned the repository, setup the environment variables as defined in .env.sample in a .env file and then you can run the script using npm run start with some parameters.

barry@Liams-MBP node-db-perf % npm run start           

> start
> node --env-file=.env index.js

Usage: node index.js <db> <s|sa|p> <count>

I am using the sample schema provided in Db2 for IBM i. You can create it with the following SQL statement:

CALL QSYS.CREATE_SQL_SAMPLE('SAMPLE');

Client details

If you run ./report.sh, then it will run all 8 possible scenarios and write all the results to report.txt. I am using that generated report to provide the information in this blog. You can find the report I generated here.

Let's start with some client details:

Client device: Apple M1 Max, 32 GB memory, macOS 14.5. Results may vary on other operating systems, though I suspect by not much.
Node.js: v20.14.0, darwin, arm64

All tests were run against the same IBM i 7.4 system.

Test scenarios

In total there are eight possible scenarios to test ODBC and Mapepire with the test suite today.

Connection types

Single: Using a single job to run SQL statements against the database. This allows us to test single job performance.
Pool: Using one or more jobs to run SQL statements against the database. This allows us to test how well a pool can handle an influx of requests to the database.

Execution types

Promise.all: Using Node.js Promise.all to let the event loop handle the results from the database execution. This sends all requests 'at once' (pooling in theory is supposed to handle distributing the requests to the database across the pool of jobs) and will return when the result comes back from the server.
For-Await: Instead of building an array of requests, we using a simple for loop and run a single request at a time.

A note about pools

We are using a max size of 5 and a starting size of 5 for both ODBC and Mapepire. In the ODBC pool constructor, we are passing in reuseConnections: true. Mapepire does this by default.

SQL statement used

The test running is using a simple select statement for each test:

select * from sample.department

When I was testing with statements that return larger result sets (like select * from sample.syscolumns), the ODBC response times were very high and it didn't look good on a chart. You are welcome to try that yourself.

Parsing the result

Each test case has a report. We are primarily focused on this part of the report for each case:

Total queries:      1
Total time:         656.565167ms
Average time:       656.565167ms
Fastest query:      656.565167ms (query 1)
Slowest query:      656.565167ms (query 1)

Here is some more detail about this information:

Total time: this is the total length of time that it took for each request to execute and finish. This is not the total time of the test suite running.
Average time: total time / count
Fastest query: which request number was the faster
Slowest query: which request number was the slowest

You will also find in your reports Keynote chart information, which is what is used to produce the charts in this blog. The chart data for each report is a table of requests and how long each request took to execute and finish.

Analysing the data

See a the report used for this blog here.

For this blog, I tried 50 requests for each different scenario. Using the chart data generated in the reports, we can cycle through the different test reports to produce high level reports of both ODBC and Mapepire against each other.

Know that, as you are seeing these results, that the test runner does make sure that results come back from the database in both database clients.

This chart is a little painful to look at. Initial glance is that ODBC pooling is still synchronous somewhere and perhaps isn't distributing requests between the pool of jobs very well. Requests are getting slower as they queue up. Mapepire on the other hand is taking care of the 50 requests with a pretty consistant response time from the database.

ODBC pulled through better in this situation. I suspect the spikes in both ODBC and Mapepire here might be from a fresh job running the SQL statement for the first time, and therefore not using the database cache. The cool part here is that Mapepire still generally has faster responses time.

Once against ODBC struggling when using a single job to send all requests. Mapepire handles it very well. This might be due to the way that Mapepire queues up requests in a single job.

ODBC once against shining when executing a single statement at a time, but is still no match for Mapepire in this instance. I am not sure what would even cause this spikes. It seems like Mapepires faster response time is really working out in single job performance.

General overview and why?

It seems like Mapepire wins for pooling and single-job performance. ODBC does fine, not with single job performance or pooling, but in more synchronous processing. There could be multiple reasons for this. ODBC is a native driver, written mostly in C++ using the N-API, so it can talk to the native, platform specific ODBC driver. That possibly has some overhead.

Mapepire (mapepire-js in this instance) is almost dumber, since it is an all JS client and uses a simpler protocol (secured websockets) to talk to the database server. It doesn't manage anything to do with encodings, passwords, and other stuff that is rammed in jtopen/jt400.

DavidRusso commented 1 week ago

I wonder what results you'd get for ODBC if you increased the Node.js worker thread pool size using this:

https://nodejs.org/docs/latest/api/cli.html#uv_threadpool_sizesize

node-odbc has a known issue with running concurrent queries due to running synchronous APIs in the worker thread pool, as discussed here:

https://github.com/IBM/node-odbc/issues/169

So, it can only run 4 queries at once unless the thread pool size is increased.

worksofliam commented 1 week ago

@DavidRusso I get similar results if I set that environment variables:

$ UV_THREADPOOL_SIZE=10 npm run start odbc p 10

node-odbc - Pool-Promise.all - 10 queries

Node.js version: v20.14.0
Platform:        darwin
Architecture:    arm64

SQL used:
        VALUES (JOB_NAME)

Pool startup:
        Start size: 5
        Max size:   5
        Time:       -337.99975ms

Total queries:      10
Total time:         15651.362373999998ms
Average time:       1565.1362373999998ms
Fastest query:      1110.187333ms (query 1)
Slowest query:      2002.8617909999998ms (query 10)

Keynote chart:
0       1       2       3       4       5       6       7       8       9
1110.187333     1110.232625     1444.363667     1445.589708     1446.054417     1769.985583     1771.1829579999999      1771.140125     1779.764167     2002.8617909999998

Edit: if I set to 64, there is some little improvement:

$ UV_THREADPOOL_SIZE=64 npm run start odbc p 10

Total queries:      10
Total time:         11238.879836ms
Average time:       1123.8879836ms
Fastest query:      655.5210420000001ms (query 2)
Slowest query:      1550.866542ms (query 10)

Keynote chart:
0       1       2       3       4       5       6       7       8       9
657.2134169999999       655.5210420000001       1009.2506250000001      1009.800084     1010.1927079999999      1336.025        1336.67875      1336.605459     1336.726209     1550.866542

DavidRusso commented 1 week ago

@worksofliam , I see.

Another problem that I've noticed with ODBC is that the data compression option for the IBM i driver doesn't seem to work. Even if you set it on the connection string, you get the same result as without when transferring large data over a WAN link.

Anyhow, the new service seems really promising! It's great to have a new alternative for connectivity from Node.js, as all the current options have some issues.

worksofliam commented 1 week ago

@DavidRusso I have had issues with node-odbc for some time and have been wanting an improved clients, more similar to node-mssql or pg for quite some time. We're headed that direction for sure.

worksofliam commented 1 week ago

@DavidRusso I found a large issue with ODBC pooling, which is where the issues are in this blog. Another post will come to address this.

worksofliam / blog