oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/
Other
2.24k stars 1.07k forks source link

Segmentation Fault when Executing PLSQL at Production Load. #1517

Closed dannyb648 closed 1 year ago

dannyb648 commented 1 year ago

1. What versions are you using?

v5.4.0. (We were also seeing this in 4.3.0).

Give your database version.

Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

Also run Node.js and show the output of:

process.platform

'linux'

process.version

'v14.18.2'

process.arch

'x64'

require('oracledb').versionString

'4.2.0'

require('oracledb').oracleClientVersionString

'19.16.0.0.0'

2. Is it an error or a hang or a crash?

Its a complete crash causing a Segmentation Fault. We've installed a module which catches the fault and lets us log before the signal is raised to the OS. This issue only started occurring when we moved from Node 12 to Node 14.

We've seen this both at heavy load in Production but also caught it happening in Development when the database wasn't under massive load.

3. What error(s) or behavior you are seeing?

The error has been happening in only a small number of APIs in Production (and once in Dev). We haven't been able to replicate it on demand and we are still investigating. It looks to be at the point when data (a CLOB) is loaded out of the module into the node process.


*PID 171 received SIGSEGV for address: 0x4
s/home/node/policies-api/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2f56)[0x7f6bf0188f56]
1/lib64/libpthread.so.0(+0x12ce0)[0x7f6bf140ace0]
L/usr/lib/oracle/19.16/client64/lib/libclntsh.so(+0x1522135)[0x7f6bbd407135]
P/usr/lib/oracle/19.16/client64/lib/libclntsh.so(kpultfr+0x1c90)[0x7f6bbc35fe40]
K/usr/lib/oracle/19.16/client64/lib/libclntsh.so(+0x84aa86)[0x7f6bbc72fa86]
N/usr/lib/oracle/19.16/client64/lib/libclntsh.so(kolttfr+0x36)[0x7f6bbc72f816]
O/usr/lib/oracle/19.16/client64/lib/libclntsh.so(kollfred+0x3e)[0x7f6bbc73003e]
L/usr/lib/oracle/19.16/client64/lib/libclntsh.so(+0x296e0fb)[0x7f6bbe8530fb]
O/usr/lib/oracle/19.16/client64/lib/libclntsh.so(kopetrv+0x112)[0x7f6bbc76b012]
K/usr/lib/oracle/19.16/client64/lib/libclntsh.so(+0x821363)[0x7f6bbc706363]
N/usr/lib/oracle/19.16/client64/lib/libclntsh.so(koiodes+0xe5)[0x7f6bbc706fe5]
K/usr/lib/oracle/19.16/client64/lib/libclntsh.so(+0x996abf)[0x7f6bbc87babf]
N/usr/lib/oracle/19.16/client64/lib/libclntsh.so(kohfri+0x122)[0x7f6bbc87bd72]
N/usr/lib/oracle/19.16/client64/lib/libclntsh.so(koiofre+0x27)[0x7f6bbc708e77]
U/usr/lib/oracle/19.16/client64/lib/libclntsh.so(OCIObjectFree+0x1a8)[0x7f6bbc715ee8]
�/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiOci__objectFree+0x38)[0x7f6bf03d76d8]
�/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiObject__closeHelper+0x2d)[0x7f6bf03cf4cd]
�/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiObject__close+0x8e)[0x7f6bf03cf5ee]
�/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiObject__free+0x17)[0x7f6bf03cf657]
�/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiGen__release+0x2e)[0x7f6bf03c999e]
t/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(+0x2208a)[0x7f6bf03ad08a]
 node(+0x799043)[0x5622ec427043]
Qnode(_ZN4node11Environment27RunAndClearNativeImmediatesEb+0x13d)[0x5622ec3e4b9d]
Onode(_ZN4node11Environment14CheckImmediateEP10uv_check_s+0x95)[0x5622ec3e5305]
!node(+0x1151821)[0x5622ecddf821]
#node(uv_run+0x128)[0x5622ecdd7ee8]
=node(_ZN4node16NodeMainInstance3RunEv+0x202)[0x5622ec4ad692]
0node(_ZN4node5StartEiPPc+0x17c)[0x5622ec42453c]
9/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f6bf106dcf3]
"node(_start+0x2e)[0x5622ec3a4cae]

Also we have the same error from version 4.2.0 of the module

sh-4.4$ cat /home/node/policies-api/crashlogs/policies-api-crash-1662994385989.log
PID 160 received SIGSEGV for address: 0x4
/home/node/policies-api/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2f56)[0x7f90fc2aaf56]
/lib64/libpthread.so.0(+0x12c20)[0x7f9100d22c20]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x15148c5)[0x7f90cd4e78c5]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpultfr+0x1c90)[0x7f90cc440e80]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x82b5f8)[0x7f90cc7fe5f8]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kolttfr+0x36)[0x7f90cc7fe396]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kollfred+0x3e)[0x7f90cc7febae]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x28e74eb)[0x7f90ce8ba4eb]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kopetrv+0x115)[0x7f90cc811be5]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x8029c3)[0x7f90cc7d59c3]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(koiodes+0xe5)[0x7f90cc7d6645]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x96ff5f)[0x7f90cc942f5f]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kohfri+0x122)[0x7f90cc943212]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(koiofre+0x27)[0x7f90cc7d84d7]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(OCIObjectFree+0x1a8)[0x7f90cc7e5518]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiOci__objectFree+0x38)[0x7f90fdcf15c8]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__closeHelper+0x2d)[0x7f90fdce8ecd]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__close+0x8e)[0x7f90fdce8fee]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__free+0x17)[0x7f90fdce9057]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiGen__release+0x2e)[0x7f90fdce574e]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(+0x1e4fa)[0x7f90fdcce4fa]
node(+0x799043)[0x55955cfc2043]
node(_ZN4node11Environment27RunAndClearNativeImmediatesEb+0x13d)[0x55955cf7fb9d]
node(_ZN4node11Environment14CheckImmediateEP10uv_check_s+0x95)[0x55955cf80305]
node(+0x1151821)[0x55955d97a821]
node(uv_run+0x128)[0x55955d972ee8]
node(_ZN4node16NodeMainInstance3RunEv+0x202)[0x55955d048692]
node(_ZN4node5StartEiPPc+0x17c)[0x55955cfbf53c]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f910096e493]
node(_start+0x2e)[0x55955cf3fcae]

4. Include a runnable Node.js script that shows the problem.

We are currently working on making the problem re-creatable and then will edit this to add a reproducible schema / node example. With this being a client database / api this will take some time.

Include all SQL needed to create the database schema.

Since this is a client Database, we're unable to reproduce exactly - but this issue is happening when we are calling a PLSQL proc which returns a LOB.

dannyb648 commented 1 year ago

After some investigation, it looks like this issue is being caused on Cleardown after we have accessed the Database and returned a Clob or Database Type. We arnt sure if the issue is when we close down the connection or if its during a routine garbage collection however. We're questioning if this is something to do with dpiGen_release() but really we are outside our area of expertise. It now looks like this issue is far more wide spread and is happening to most APIs across our estate - some being hit harder than others, but we've seen up to 250 segfault crashes in one api since the bug was reported in Production.

How is best to provide the example code? We are looking at a Docker Image running a node process with an accompanying PL/SQL package to be created. Our plan is to run multiple copies of the example, in a like for like environment and wait to see if the 'example' container crashes in the same way as production. As mentioned before we haven't worked out a specific trigger yet.

cjbj commented 1 year ago

@dannyb648 knowing whether it is a LOB or an object would be a big help. In case the issue is lower in the stack, can you enable OCI tracing by either setting the environment variable EVENT10842=15 (if you know where logs go), or by creating a sqlnet.ora file with something like this which will create logs in `/tmp/oradiag/diag/clients/user_/host*/trace`?:

ADR_BASE=/tmp
DIAG_ADR_ENABLED=ON
EVENT_10842=15

@pvenkatraman please take a look.

pvenkatraman commented 1 year ago

@dannyb648 in both the stack traces I see /usr/lib/oracle/19.5/client64/lib/libclntsh.so(OCIObjectFree+0x1a8)[0x7f90cc7e5518] This means you are trying to free a DB Object type. Lob APIs are not in the stack. Yes, an example would be very helpful.

GBABEN2 commented 1 year ago

Sample application is located here: https://github.com/GBABEN2/SegfaultPoc The readme has instructions on how to get this running with docker

Dan is currently on holiday so sorry this has taken a while to get back. We are still seeing this issue and still struggling to find out how to replicate this at runtime when users are actively using the system. However, when trying to enable the OCI tracing this has given us the error when packaged up into a container, and doesn't give us the segfault issue when running on a local machine (Windows rather than linux). It is throwing a segfault when trying to create a connection pool.

It is only throwing this segfault when EVENT_10842 is set to 15, if we set EVENT_10842=11 or below this does not cause an error when trying to create the connection pool. Going off this page https://us.v-cdn.net/6032256/uploads/jive_attachments/3/5/9/359154ydoByranib.html it's enabling the tracing for statistic calls. During our investigations we've been unable to correlate this segfault issue in production with high load, and one of the times it threw the error was when no traffic was being routed to the container. We do periodically do a metrics check on the container to see the health of the connections etc, so this will be querying the statistics which seems to tie in with the EVENT_10842=15 log level being around stats info.

The top of the stack trace thrown from this ends up creating the error at the same place as we are seeing in production, but the path to getting into this lib is different. We are hoping that this connection pool issue is the same issue we are seeing in production, just a different entrypoint into the same issue. Both traces reference libpthread.so.0(+0x12c20)

Original production issue

/home/node/policies-api/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2f56)[0x7f90fc2aaf56]
/lib64/libpthread.so.0(+0x12c20)[0x7f9100d22c20]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x15148c5)[0x7f90cd4e78c5]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpultfr+0x1c90)[0x7f90cc440e80]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x82b5f8)[0x7f90cc7fe5f8]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kolttfr+0x36)[0x7f90cc7fe396]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kollfred+0x3e)[0x7f90cc7febae]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x28e74eb)[0x7f90ce8ba4eb]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kopetrv+0x115)[0x7f90cc811be5]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x8029c3)[0x7f90cc7d59c3]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(koiodes+0xe5)[0x7f90cc7d6645]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x96ff5f)[0x7f90cc942f5f]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kohfri+0x122)[0x7f90cc943212]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(koiofre+0x27)[0x7f90cc7d84d7]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(OCIObjectFree+0x1a8)[0x7f90cc7e5518]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiOci__objectFree+0x38)[0x7f90fdcf15c8]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__closeHelper+0x2d)[0x7f90fdce8ecd]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__close+0x8e)[0x7f90fdce8fee]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__free+0x17)[0x7f90fdce9057]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiGen__release+0x2e)[0x7f90fdce574e]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(+0x1e4fa)[0x7f90fdcce4fa]

Error on supplied repo:

 { poolAlias: 'segfault_prototype' } Creating connection pool
my-service_1  | PID 15 received SIGSEGV for address: 0x0
my-service_1  | /home/node/app/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2f56)[0x7fd50c046f56]
my-service_1  | /lib64/libpthread.so.0(+0x12ce0)[0x7fd5116e7ce0]
my-service_1  | /lib64/libc.so.6(+0xcce67)[0x7fd5113dce67]
my-service_1  | /lib64/libc.so.6(_IO_vfprintf+0x7d0)[0x7fd511377300]
my-service_1  | /lib64/libc.so.6(vsnprintf+0x94)[0x7fd5113a02d4]
my-service_1  | /lib64/libc.so.6(__snprintf+0x93)[0x7fd51137fb13]
my-service_1  | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(+0x141a01a)[0x7fd2752ff01a]
my-service_1  | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(kpuEntryCallbackTLS+0x1b4)[0x7fd2753c5994]
my-service_1  | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(kpuspcreate+0x3fb)[0x7fd27546430b]
my-service_1  | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(OCISessionPoolCreate+0x145)[0x7fd2751a05d5]
my-service_1  | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiOci__sessionPoolCreate+0x89)[0x7fd50c296b39]
my-service_1  | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiPool_create+0x5c1)[0x7fd50c29ba41]
my-service_1  | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(+0x273eb)[0x7fd50c2703eb]
my-service_1  | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(+0x19c25)[0x7fd50c262c25]
pvenkatraman commented 1 year ago

With EVENT_10842 set to 15, OCICreateSessionPool () crashes as you have mentioned when invalid parameters are provided. This issue is fixed and is available in 19.16.0.0.0, can you please download latest instant client and try. Thanks.

GBABEN2 commented 1 year ago

Hi @pvenkatraman

The sample application above is already using version 19.16.0.0.0. This is referenced in the dockerfile when doing the build of the image.

# Add oracle instantclient repository and install instantclient
RUN curl -sS -o /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle https://yum.oracle.com/RPM-GPG-KEY-oracle-ol8
RUN echo -e "[ol8_oracle_instantclient]\n\
name=Oracle Instant Client for Oracle Linux \$releasever (\$basearch)\n\
baseurl=https://yum.oracle.com/repo/OracleLinux/OL8/oracle/instantclient/\$basearch/\n\
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle\n\
gpgcheck=1\n\
enabled=0" > /etc/yum.repos.d/public-yum-ol8.repo
RUN microdnf install --nodocs --enablerepo=ol8_oracle_instantclient \
    oracle-instantclient19.16-basic
pvenkatraman commented 1 year ago

Can you provide the parameters passed in OCISessionPoolCreate (). Thx

GBABEN2 commented 1 year ago

Sure, this is what's getting passed into the createPool(), connection details redacted.

oracledb.createPool({
    connectionString: '(DESCRIPTION=(EXPIRE_TIME=10)(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.14.109)(PORT=1527))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=Service.company.com)))',
    password: '******',
    user: '******',
    poolAlias: 'segfault_prototype',
    poolMax: 5,
    poolMin: 3,
    poolIncrement: 1,
    enableStatistics: true,
    queueTimeout: 60000,
    stmtCacheSize: 0,
});
pvenkatraman commented 1 year ago

Hi, there is a patch available on top of 19.16.0.0.0 (Bug #31597668), can you download and apply this patch. This should fix the OCI tracing segfault issue, (EVENT_10842=15). Thanks,

pvenkatraman commented 1 year ago

Can you please download and try with 19.17.0.0.0 - I tried myself and OCI tracing is working good. Will be useful. Thanks.

cjbj commented 1 year ago

Closing - no activity.