oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/

OracleDb module is throwing segfault in nodejs app #1457

Closed amitagarwal-dev closed 2 years ago

amitagarwal-dev commented 2 years ago

I am getting a segfault in my Node.js app while doing load testing. I managed to capture a stack trace before the segfault, and it looks like the segfault is raised inside the oracledb module. I am also getting the error below 2-3 times per 50k requests.

Error: DPI-1039: statement was already closed.

This is the first time I have seen this error in my app. Please advise!

stack trace:

PID 28233 received SIGSEGV for address: 0x0
/home/Amith/fistack/loadtesting1/FI/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2cc1)[0x7f73f80c9cc1]
/lib64/libpthread.so.0(+0xf630)[0x7f73f9a2d630]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpuqrs+0x48)[0x7f7397112e38]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpurclr+0x147)[0x7f7397118457]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpcxc2r+0x86)[0x7f7395b1d9d6]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(ttcrs2c+0x25c)[0x7f7397154e7c]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(ttcacr+0x3a7)[0x7f739464c957]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(ttcdrv+0x1b0b)[0x7f739716779b]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(nioqwa+0x3d)[0x7f739713ae6d]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(upirtrc+0x5e8)[0x7f73971148a8]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpurcsc+0x64)[0x7f739712bc54]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpuexec+0x29fe)[0x7f739711c4de]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(OCIStmtExecute+0x29)[0x7f7397113dc9]
/home/Amith/fistack/loadtesting1/FI/node_modules/oracledb/build/Release/oracledb-5.2.0-linux-x64.node(dpiOci__stmtExecute+0x4e)[0x7f73f05c4fee]
/home/Amith/fistack/loadtesting1/FI/node_modules/oracledb/build/Release/oracledb-5.2.0-linux-x64.node(+0x56513)[0x7f73f05ce513]
/home/Amith/fistack/loadtesting1/FI/node_modules/oracledb/build/Release/oracledb-5.2.0-linux-x64.node(dpiStmt_execute+0x94)[0x7f73f05d11c4]
/home/Amith/fistack/loadtesting1/FI/node_modules/oracledb/build/Release/oracledb-5.2.0-linux-x64.node(+0x1d73c)[0x7f73f059573c]
/home/Amith/fistack/loadtesting1/FI/node_modules/oracledb/build/Release/oracledb-5.2.0-linux-x64.node(+0x18db5)[0x7f73f0590db5]
node /home/Amith/fistack/loadtesting1/FI/hulk.js[0x139c354]
/lib64/libpthread.so.0(+0x7ea5)[0x7f73f9a25ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f73f974e96d]

Versions:

database version: 19c
platform: RHEL 7.9
Node.js version: v14.17.2
arch: x64
oracledb version: 5.2.0
Oracle client version: 19.5.0

Pool Config:

const poolAttrs = {
        user          : database.USER,
        password      : database.PASSWORD,
        connectString : database.HOST + ':' + database.PORT + '/' + database.DATABASE,
        poolMax       : 800,   // the maximum number of connections to which the pool can grow
        queueRequests : true,  // new requests for connections are queued until in-use connections are released
        poolTimeout   : 60,    // seconds after which idle (unused) pooled connections are terminated
        queueTimeout  : 3000,  // milliseconds after which queued connection requests are terminated
        enableStatistics : true,
        poolAlias     : 'FI'
};
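
For reference, a minimal sketch of how a pool configured like this is created and used with the promise-based API (the database settings object and the 'FI' alias are from the snippet above; error handling omitted):

const oracledb = require('oracledb');

async function init() {
        await oracledb.createPool(poolAttrs);                   // registers the pool under the alias 'FI'
        const connection = await oracledb.getConnection('FI');  // check out a connection by pool alias
        // ... use the connection ...
        await connection.close();                               // return the connection to the pool
}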
cjbj commented 2 years ago

Error: DPI-1039: statement was already closed would indicate a coding error: an operation may be missing an await, a callback may be missing, a connection may be closed too early, or the like. Check your use of queryStream events against our documentation and examples.

You could do worse than try oracledb.errorOnConcurrentExecute, though it may (i) not be usable if your coding style requires parallel operations on a connection, and (ii) not help in any case.
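
For reference, a sketch of enabling that setting, plus the kind of missing-await bug it is designed to catch (my_proc is an illustrative procedure name, not from this thread):

const oracledb = require('oracledb');

// Raise an error instead of allowing concurrent statement execution on one
// connection (available from node-oracledb 5.2):
oracledb.errorOnConcurrentExecute = true;

// A typical missing-await bug that can surface as DPI-1039 under load:
async function buggy(pool) {
        const conn = await pool.getConnection();
        // Missing await: the execute() below is still running when close() is called
        conn.execute('BEGIN my_proc(:cursor); END;',
                { cursor : { type : oracledb.CURSOR, dir : oracledb.BIND_OUT } });
        await conn.close();  // closes the connection while the statement executes
}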

amitagarwal-dev commented 2 years ago

@cjbj I don't think I am missing any await. I get this error while calling a stored procedure; before calling it I get a connection from the pool, and after calling it I close that connection. This worked fine for the last 4 years, until I upgraded from v3.0.0 to v5.2.0.
Below is the code snippet for calling the SP. My main problem is the segfault generated by oracledb. Do you have any idea about this?

TransXpedia.connection.execute('BEGIN TXN_GETPRIMARYAGENTINFO(:DeviceID, :cursor); END;', {
        DeviceID : TransXpedia.CLIENT_JSON.DEVICEID,
        cursor   : { type : oracledb.CURSOR, dir : oracledb.BIND_OUT }
}, (e, results) => {
        // e is the actual error; results.outBinds.cursor holds the REF CURSOR
});
amitagarwal-dev commented 2 years ago

@cjbj if this were a bug in my code, the error should come more frequently, but it comes only 2-3 times over 50-60k requests.

cjbj commented 2 years ago

if this were a bug in my code, the error should come more frequently, but it comes only 2-3 times over 50-60k requests.

That logic could be applied to any layer of code! I know the problem could be in the client libraries or the database, but the common cause of similar errors is incorrect parallel code in the application layer. And the most direct evidence that the problem is in your implementation is the error Error: DPI-1039: statement was already closed.

Although you see the error calling PL/SQL, the root cause may occur elsewhere. The stack trace doesn't pin down the cause. How are you processing the REF CURSOR? Did you try oracledb.errorOnConcurrentExecute? Are you using queryStream() or LOB streaming?

amitagarwal-dev commented 2 years ago

@cjbj I am using queryStream().
The function below fetches data from the cursor in my code, but the error is not coming from this function. The error occurs while calling the procedure, so control never reaches this function. Let me try oracledb.errorOnConcurrentExecute.

function fetchRCFromStream(cursor, TransXpedia) {
        return new Promise((resolved, rejected) => {
                TransXpedia.results = [];
                var stream = cursor.toQueryStream();

                stream.on('error', function (e) {
                        TransXpedia.systemErr      = e;
                        TransXpedia.serverError    = errModule.E_ORACLE_STREAM;
                        TransXpedia.errDescription = `STREAM ERROR ` +
                                `[FUNCTION : ${fetchRCFromStream.name}] `;
                        return rejected(TransXpedia);
                });

                stream.on('metadata', function (metadata) {
                        // access metadata of the query
                });

                stream.on('data', function (data) {
                        TransXpedia.results.push(data);
                });

                stream.on('end', function () {
                        return resolved(TransXpedia);
                });
        });
}
cjbj commented 2 years ago

Compare that with the documentation and examples. If your resolved() handler closes the connection, you will have a problem. Also, you will want to call destroy() to release resources.

amitagarwal-dev commented 2 years ago

@cjbj I tried oracledb.errorOnConcurrentExecute, and this time I didn't get the Error: DPI-1039: statement was already closed error, but my server still crashed because of a segfault from the oracledb module.

Do you have any idea how to proceed with this?

cjbj commented 2 years ago

I'm sorry to hear that.

Did you fix your query streaming, e.g. to resolve on close, not end? See https://github.com/oracle/node-oracledb/blob/v5.3.0/examples/refcursortoquerystream.js#L78-L87

If your actual code is as simple as the snippet you pasted, and since you appear to be fetching all rows at once, you may prefer just to call getRows() instead of manually building up the return array; see https://github.com/oracle/node-oracledb/blob/v5.3.0/examples/refcursor.js#L110-L115
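
For reference, a minimal sketch of that getRows() approach, assuming a pooled connection and the bind values from the earlier snippet (deviceId stands in for TransXpedia.CLIENT_JSON.DEVICEID; getRows() with no argument fetches all remaining rows in node-oracledb 5.2 and later):

const result = await connection.execute(
        'BEGIN TXN_GETPRIMARYAGENTINFO(:DeviceID, :cursor); END;',
        { DeviceID : deviceId,
          cursor   : { type : oracledb.CURSOR, dir : oracledb.BIND_OUT } });

const cursor = result.outBinds.cursor;
const rows   = await cursor.getRows();  // fetch all rows from the REF CURSOR in one call
await cursor.close();                   // always close the REF CURSOR when done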

amitagarwal-dev commented 2 years ago

@cjbj let me try this and come back with the result.

amitagarwal-dev commented 2 years ago

@cjbj I added querystream.destroy() in the end event and resolved from the close event. After that I initiated 127k requests and did not face any segfault or Error: DPI-1039: statement was already closed errors.

Let me run some more tests, then I will close this issue. Thanks for the clue.

cjbj commented 2 years ago

@amitagarwal-dev more importantly: did you move resolve to a close event (in case something after the resolve causes the connection to be released)?

amitagarwal-dev commented 2 years ago

@cjbj yes, I moved resolve to a close event.
Just after this resolve I release the connection taken from the pool.

@cjbj I have one doubt: won't this querystream be closed automatically once we release the pool connection? Or is it a different connection?

cjbj commented 2 years ago

@cjbj yes, I moved resolve to a close event.

Ahh yes, I missed you saying that. That change is the critical one.

Although resources can be cleaned up, use stream.destroy() so you are not dependent on the garbage collector.

Did you also try removing toQueryStream() and instead using a single call, something like TransXpedia.results = await cursor.getRows()?

amitagarwal-dev commented 2 years ago

@cjbj I am using a single function to fetch data from the cursor, and the cursor can return a single row or a huge number of rows, depending on the procedure call.
Also, there is one major problem I can see in my code: even though I am using querystream, I keep all the rows in Node.js memory for a while and do not process the streamed chunks in batches (not sure, I need to study this), so it effectively acts like getRows() for me. Correct me if I am wrong.

Code after the changes:

function fetchRCFromStream(cursor, TransXpedia) {
        return new Promise((resolved, rejected) => {
                TransXpedia.results = [];
                var stream = cursor.toQueryStream();

                stream.on('error', function (e) {
                        TransXpedia.systemErr      = e;
                        TransXpedia.serverError    = errModule.E_ORACLE_STREAM;
                        TransXpedia.errDescription = `STREAM ERROR ` +
                                `[FUNCTION : ${fetchRCFromStream.name}] `;
                        return rejected(TransXpedia);
                });

                stream.on('metadata', function (metadata) {
                        // access metadata of the query
                });

                stream.on('data', function (data) {
                        TransXpedia.results.push(data);
                });

                stream.on('end', function () {
                        stream.destroy();
                });

                stream.on('close', function () {
                        return resolved(TransXpedia);
                });
        });
}
cjbj commented 2 years ago

@amitagarwal-dev yes, you are keeping all rows in memory. Query streaming or getRows(n) can be used to avoid this, assuming the rest of the app knows how to process batches of data.
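
For reference, a sketch of batched fetching with getRows(n); processBatch is a hypothetical callback that hands each chunk to the rest of the app:

async function fetchRCInBatches(cursor, processBatch, batchSize = 100) {
        let rows;
        do {
                rows = await cursor.getRows(batchSize);  // at most batchSize rows per call
                if (rows.length > 0) {
                        await processBatch(rows);        // process one chunk at a time
                }
        } while (rows.length === batchSize);             // a short batch means the cursor is drained
        await cursor.close();                            // release the REF CURSOR
}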

Since your problem is resolved, I'll close this issue. Thanks for using node-oracledb.