oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/

Too many open idle pool connections #1465

Closed amitagarwal-dev closed 2 years ago

amitagarwal-dev commented 2 years ago

Hi,
I am doing load testing of my Node.js application, which uses Oracle as the database.
I have currently configured 1500 processes and 1800 sessions (visible in v$parameter) in Oracle to serve 2k concurrent users.
After running my load test, I can see approximately 1400 inactive sessions in the DB, which are consuming 90% of my RAM.
On my Oracle machine, I can see around 1400 active Oracle processes via ps -aux | grep orcl | wc -l.
On my Node.js app machine, I can also see 1400 file descriptors open in my Node.js process.
All of those fds are socket fds still connected to my Oracle server after the load test.

It looks like around 1400 idle connections are present in my pool after the load test. My pool is not shrinking.
As per the oracledb documentation,

3.3.1.1.14 poolTimeout

Number poolTimeout

The number of seconds after which idle connections (unused in the pool) may be terminated. Idle connections are terminated only when the pool is accessed.

The default value is 60.

It says that idle connections are terminated only when the pool is accessed. So in my case the RAM will be freed only when I start my load testing again, and that is indeed what happens. But @cjbj this is not good, right? The idle connections are unnecessarily occupying my RAM; they should be removed automatically once the pool timeout expires.
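
One way to trigger the documented cleanup without running another load test is to touch the pool periodically. A minimal sketch, assuming the pool alias "FI" from the statistics below:

const oracledb = require('oracledb');

// Periodically acquire and release a connection: node-oracledb only reaps
// connections idle longer than poolTimeout when the pool is accessed.
setInterval(async () => {
  try {
    const pool = oracledb.getPool('FI');      // alias taken from the stats below
    const conn = await pool.getConnection();  // this access lets the idle check run
    await conn.close();
  } catch (err) {
    console.error('Pool touch failed:', err);
  }
}, 60 * 1000);                                // roughly the 60 second poolTimeout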

Below are the pool statistics from node-oracledb (with statistics enabled):

{\"gatheredDate\":1646768503342,\"upTime\":1704109,\"upTimeSinceReset\":1678987,\"connectionRequests\":730624,\"requestsEnqueued\":231287,\"requestsDequeued\":231287,\"failedRequests\":0,\"rejectedRequests\":0,\"requestTimeouts\":0,\"maximumQueueLength\":501,\"currentQueueLength\":0,\"timeInQueue\":65323545,\"minimumTimeInQueue\":0,\"maximumTimeInQueue\":790,\"averageTimeInQueue\":282,\"connectionsInUse\":1,\"connectionsOpen\":1376,\"poolAlias\":\"FI\",\"queueMax\":1500,\"queueTimeout\":8000,\"poolMin\":100,\"poolMax\":1500,\"poolIncrement\":1,\"poolTimeout\":60,\"poolPingInterval\":60,\"poolMaxPerShard\":0,\"stmtCacheSize\":30,\"sodaMetaDataCache\":false,\"threadPoolSize\":\"1024\"}
amitagarwal-dev commented 2 years ago

I am using:
oracledb 5.2.0
Node.js 14.19.1
Instant Client 19.6
Oracle Database 19c
OS: RHEL 7.8

cjbj commented 2 years ago

Just because a pool shrinks doesn't mean the OS will free memory from the Node.js process.
On the DB tier, the 'server' process for a connection will close, so you should see some memory variation.
amitagarwal-dev commented 2 years ago

@cjbj each of my users will make a minimum of 4 serial connections to the database, and I have 2000 concurrent users, so in the stats you can see that I made about 7.3 lakh (730k) connections overall. Some of them were queued and served later, but the number of open connections peaked at 1500 because I set the maximum processes to 1500 in the DB. If I use 2500 concurrent users then I see a lot of queue timeout errors.

I am expecting the RAM on the DB server machine to shrink after the load test (which is not happening), so that the machine's resources are usable by others.

DRCP looks good to me, but right now I have only one DB user, which means one pool from a single client. If I had multiple clients with multiple pools then I guess DRCP could come to the rescue (correct me if I am wrong).

My plan is to reach 5k concurrent users, but that requires around a 5k session/process limit, and currently with 1500 sessions/processes my RAM is 90% full while my CPU peaks at only 50%. Also, the Node.js uv threadpool size is limited, so I will not get the performance either.
Do you have any solution to this?

> Just because a pool shrinks doesn't mean the OS will free memory from the Node.js process.
> On the DB tier, the 'server' process for a connection will close, so you should see some memory variation.

In my case, those processes stay alive on the DB server machine because the pool is not shrinking, and they consume RAM, which is limited. If I scale my app, the RAM pressure will significantly degrade other clients' performance (just a thought, not tested yet).

cjbj commented 2 years ago

It's great you are putting a lot of load through node-oracledb.

You must keep poolMax < UV_THREADPOOL_SIZE <= 1024 per Node.js process, otherwise you could end up with deadlocks: one connection may be holding a needed database resource while it cannot get a Node.js thread to release that resource. If you need more connections, then you need more Node.js processes.
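
A minimal sketch of this sizing rule (connect details and the numbers themselves are placeholders):

process.env.UV_THREADPOOL_SIZE = '64';  // must be set before the libuv threadpool
                                        // starts; on Windows, set it before launching Node.js

const oracledb = require('oracledb');

async function init() {
  await oracledb.createPool({
    user: 'app_user',                   // placeholder
    password: process.env.DB_PASSWORD,  // placeholder
    connectString: 'dbhost/orclpdb1',   // placeholder
    poolMin: 50,
    poolMax: 50,      // keep poolMax below UV_THREADPOOL_SIZE to avoid thread starvation
    poolIncrement: 0,
  });
}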

DRCP can help in any scenario where the DB host doesn't have enough memory, as long as the connection use satisfies the various DRCP requirements, such as connections being used for only a short time, etc. It is a shared resource, so if your database host can handle the number of connections required from a single app, then the need for DRCP is lessened.
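
A sketch of pointing a pool at DRCP; the ':pooled' Easy Connect suffix is the DRCP endpoint, and the connection class name is an assumption:

const oracledb = require('oracledb');

// Connections with the same class name can reuse DRCP pooled servers.
oracledb.connectionClass = 'MYAPP';  // illustrative name

async function initDrcpPool() {
  await oracledb.createPool({
    user: 'app_user',                         // placeholder
    password: process.env.DB_PASSWORD,        // placeholder
    connectString: 'dbhost/orclpdb1:pooled',  // ':pooled' requests a DRCP server
    poolMin: 10,
    poolMax: 100,
  });
}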

With such a large number of connections, you could really slam the DB host by creating and destroying the DB server processes used by connections. Your benchmarking will give you an idea of what your systems are capable of, and whether you should rethink your strategy and instead consider fixed pool sizes. Adding a database listener connection rate limit may have a role.

amitagarwal-dev commented 2 years ago

@cjbj if I scale my Node.js server then I guess DRCP will be useful; I will try this.
Also, I couldn't understand the sentence "Adding a database listener connection rate limit may have a role".
Can you explain it a little, or share a related link?

amitagarwal-dev commented 2 years ago

Also, the admin is not able to log in to Oracle because these open processes have occupied all the sessions; there is no session left for a new connection from another client, even when nobody is using the database.
node-oracledb should provide a way to handle this. I don't know its internals, but I can try out a patch from you if you want to experiment with something.

cjbj commented 2 years ago

DRCP may only help if you have some inactive "users". If you have X users all trying to execute a query at the same time, then you need X server processes, or someone has to wait until a server process is available.

> Also, I couldn't understand the sentence "Adding a database listener connection rate limit may have a role".

The doc section Connections and High Availability has links for RATE_LIMIT and QUEUESIZE. These might help keep your database host responsive if your node-oracledb pools are growing (and shrinking, and growing, and...). Obviously they throttle new connection creation, so some users will see a delay. I still prefer a fixed pool size, with queuing handled in node-oracledb or via an external load balancer.
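
A hedged listener.ora sketch of those two parameters (host, port, and the values themselves are illustrative):

# QUEUESIZE deepens the listener's connection backlog; RATE_LIMIT caps the
# rate of new connections on the endpoint. Values here are assumptions.
LISTENER=
  (ADDRESS_LIST=
    (ADDRESS=(PROTOCOL=TCP)(HOST=dbhost)(PORT=1521)(QUEUESIZE=100)(RATE_LIMIT=20)))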

Limiting load on the DB is something the DBA needs to handle. For example, a resource profile with a limited SESSIONS_PER_USER could be configured, or DRCP could be set up with a limited number of sessions in its pool.
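
A sketch of the resource-profile approach (profile name, user, and limit are illustrative):

-- Cap concurrent sessions for one schema; enforced only while the
-- RESOURCE_LIMIT parameter is TRUE (the default in recent releases).
CREATE PROFILE app_profile LIMIT SESSIONS_PER_USER 100;
ALTER USER app_user PROFILE app_profile;
ALTER SYSTEM SET RESOURCE_LIMIT = TRUE;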

The tips at the bottom of my recent blog post Always Use Connection Pools — and How may be a useful summary of node-oracledb doc.

Let us know how your testing goes.

amitagarwal-dev commented 2 years ago

Yesterday I tried with a fixed pool size:
pool min: 100
pool max: 1000
pool timeout: 5 (need some comment here)
queue timeout: 10
queue limit: 3000
uv_threadpool size: 1024
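
Expressed as createPool options, that configuration would look roughly like this (connect details are placeholders; note that queueTimeout is in milliseconds while poolTimeout is in seconds):

const oracledb = require('oracledb');

async function initFixedPool() {
  await oracledb.createPool({
    user: 'app_user',                   // placeholder
    password: process.env.DB_PASSWORD,  // placeholder
    connectString: 'dbhost/orclpdb1',   // placeholder
    poolMin: 100,
    poolMax: 1000,
    poolTimeout: 5,       // seconds an unused connection may stay idle
    queueTimeout: 10000,  // assuming the "10" above means 10 seconds
    queueMax: 3000,       // limit on queued connection requests
  });
}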

I was able to serve 3500 concurrent users with a p(95) response time of 19 s, of which Oracle accounted for a p(95) of 14 s. As I am using at most 1000 connections to Oracle because of limited resources, the server response time was high.

I can see that node-oracledb queuing works fine, with an average time in queue of 1.9 s at peak. As you said, there is no point in using a poolMax greater than the uv threadpool size, and it is true; so with a single instance of my Node.js app I can serve limited concurrency, and beyond that I have to scale my app and DB resources for a better response time.

Thanks for the assistance, your blog entirely mirrored my scenario!

cjbj commented 2 years ago

@amitagarwal-dev was your pool min value a typo (or a different test)? Since the min and max values differ, that wasn't a fixed pool.

I'm glad the blog helped. Good luck with all the other tuning.

amitagarwal-dev commented 2 years ago

No, it's not a typo; I was testing with 100 only, just to keep resource usage low after my test. I guess I misunderstood what a fixed pool is yesterday. There is also a downside to keeping pool max/min at 1000/1000, i.e. resource consumption.
I doubled the RAM in my DB machine. Now my plan is to find the right combination of oracledb pool parameters to get the p95 DB response time under 1 second; after that I will scale my Node.js app and try to get the same response time from the app cluster.
This uv threadpool size is a major parameter here, and it limits the performance of a single instance.

Let me study why this uv threadpool size is limited: is it because of the resources it consumes, or is there a limitation in libuv itself?

@cjbj you can close this anytime if you are not interested in the result; anyway, my initial problem is solved now.

cjbj commented 2 years ago

The threadpool size is limited by libuv:

https://github.com/libuv/libuv/blob/0a47e4c7714c9d1cf62efef3efafd3fc3ad354c2/src/threadpool.c#L30

https://github.com/libuv/libuv/blob/0a47e4c7714c9d1cf62efef3efafd3fc3ad354c2/src/threadpool.c#L199-L205
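
The relevant logic at that commit is roughly the following (abridged from src/threadpool.c, not a verbatim copy):

/* The hard cap on the pool size: */
#define MAX_THREADPOOL_SIZE 1024

/* UV_THREADPOOL_SIZE is read once and clamped when the pool is initialized: */
nthreads = ARRAY_SIZE(default_threads);
val = getenv("UV_THREADPOOL_SIZE");
if (val != NULL)
  nthreads = atoi(val);
if (nthreads == 0)
  nthreads = 1;
if (nthreads > MAX_THREADPOOL_SIZE)
  nthreads = MAX_THREADPOOL_SIZE;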

I'll close this. Let us know how you go; send me an email if you like (see my profile).