Closed dannyb648 closed 1 year ago
After some investigation, it looks like this issue is being caused on Cleardown after we have accessed the Database and returned a Clob or Database Type. We arnt sure if the issue is when we close down the connection or if its during a routine garbage collection however. We're questioning if this is something to do with dpiGen_release() but really we are outside our area of expertise. It now looks like this issue is far more wide spread and is happening to most APIs across our estate - some being hit harder than others, but we've seen up to 250 segfault crashes in one api since the bug was reported in Production.
How is best to provide the example code? We are looking at a Docker Image running a node process with an accompanying PL/SQL package to be created. Our plan is to run multiple copies of the example, in a like for like environment and wait to see if the 'example' container crashes in the same way as production. As mentioned before we haven't worked out a specific trigger yet.
@dannyb648 knowing whether it is a LOB or an object would be a big help. In case the issue is lower in the stack, can you enable OCI tracing by either setting the environment variable EVENT10842=15 (if you know where logs go), or by creating a sqlnet.ora file with something like this which will create logs in `/tmp/oradiag/diag/clients/user_/host*/trace`?:
ADR_BASE=/tmp
DIAG_ADR_ENABLED=ON
EVENT_10842=15
@pvenkatraman please take a look.
@dannyb648 in both the stack traces I see /usr/lib/oracle/19.5/client64/lib/libclntsh.so(OCIObjectFree+0x1a8)[0x7f90cc7e5518] This means you are trying to free a DB Object type. Lob APIs are not in the stack. Yes, an example would be very helpful.
Sample application is located here: https://github.com/GBABEN2/SegfaultPoc The readme has instructions on how to get this running with docker
Dan is currently on holiday so sorry this has taken a while to get back. We are still seeing this issue and still struggling to find out how to replicate this at runtime when users are actively using the system. However, when trying to enable the OCI tracing this has given us the error when packaged up into a container, and doesn't give us the segfault issue when running on a local machine (Windows rather than linux). It is throwing a segfault when trying to create a connection pool.
It is only throwing this segfault when EVENT_10842 is set to 15, if we set EVENT_10842=11 or below this does not cause an error when trying to create the connection pool. Going off this page https://us.v-cdn.net/6032256/uploads/jive_attachments/3/5/9/359154ydoByranib.html it's enabling the tracing for statistic calls. During our investigations we've been unable to correlate this segfault issue in production with high load, and one of the times it threw the error was when no traffic was being routed to the container. We do periodically do a metrics check on the container to see the health of the connections etc, so this will be querying the statistics which seems to tie in with the EVENT_10842=15 log level being around stats info.
The top of the stack trace thrown from this ends up creating the error at the same place as we are seeing in production, but the path to getting into this lib is different. We are hoping that this connection pool issue is the same issue we are seeing in production, just a different entrypoint into the same issue. Both traces reference libpthread.so.0(+0x12c20)
Original production issue
/home/node/policies-api/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2f56)[0x7f90fc2aaf56]
/lib64/libpthread.so.0(+0x12c20)[0x7f9100d22c20]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x15148c5)[0x7f90cd4e78c5]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kpultfr+0x1c90)[0x7f90cc440e80]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x82b5f8)[0x7f90cc7fe5f8]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kolttfr+0x36)[0x7f90cc7fe396]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kollfred+0x3e)[0x7f90cc7febae]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x28e74eb)[0x7f90ce8ba4eb]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kopetrv+0x115)[0x7f90cc811be5]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x8029c3)[0x7f90cc7d59c3]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(koiodes+0xe5)[0x7f90cc7d6645]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(+0x96ff5f)[0x7f90cc942f5f]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(kohfri+0x122)[0x7f90cc943212]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(koiofre+0x27)[0x7f90cc7d84d7]
/usr/lib/oracle/19.5/client64/lib/libclntsh.so(OCIObjectFree+0x1a8)[0x7f90cc7e5518]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiOci__objectFree+0x38)[0x7f90fdcf15c8]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__closeHelper+0x2d)[0x7f90fdce8ecd]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__close+0x8e)[0x7f90fdce8fee]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiObject__free+0x17)[0x7f90fdce9057]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(dpiGen__release+0x2e)[0x7f90fdce574e]
/home/node/policies-api/node_modules/oracledb/build/Release/oracledb-4.2.0-linux-x64.node(+0x1e4fa)[0x7f90fdcce4fa]
Error on supplied repo:
{ poolAlias: 'segfault_prototype' } Creating connection pool
my-service_1 | PID 15 received SIGSEGV for address: 0x0
my-service_1 | /home/node/app/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2f56)[0x7fd50c046f56]
my-service_1 | /lib64/libpthread.so.0(+0x12ce0)[0x7fd5116e7ce0]
my-service_1 | /lib64/libc.so.6(+0xcce67)[0x7fd5113dce67]
my-service_1 | /lib64/libc.so.6(_IO_vfprintf+0x7d0)[0x7fd511377300]
my-service_1 | /lib64/libc.so.6(vsnprintf+0x94)[0x7fd5113a02d4]
my-service_1 | /lib64/libc.so.6(__snprintf+0x93)[0x7fd51137fb13]
my-service_1 | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(+0x141a01a)[0x7fd2752ff01a]
my-service_1 | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(kpuEntryCallbackTLS+0x1b4)[0x7fd2753c5994]
my-service_1 | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(kpuspcreate+0x3fb)[0x7fd27546430b]
my-service_1 | /usr/lib/oracle/19.16/client64/lib/libclntsh.so(OCISessionPoolCreate+0x145)[0x7fd2751a05d5]
my-service_1 | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiOci__sessionPoolCreate+0x89)[0x7fd50c296b39]
my-service_1 | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(dpiPool_create+0x5c1)[0x7fd50c29ba41]
my-service_1 | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(+0x273eb)[0x7fd50c2703eb]
my-service_1 | /home/node/app/node_modules/oracledb/build/Release/oracledb-5.4.0-linux-x64.node(+0x19c25)[0x7fd50c262c25]
With EVENT_10842 set to 15, OCICreateSessionPool () crashes as you have mentioned when invalid parameters are provided. This issue is fixed and is available in 19.16.0.0.0, can you please download latest instant client and try. Thanks.
Hi @pvenkatraman
The sample application above is already using version 19.16.0.0.0. This is referenced in the dockerfile when doing the build of the image.
# Add oracle instantclient repository and install instantclient
RUN curl -sS -o /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle https://yum.oracle.com/RPM-GPG-KEY-oracle-ol8
RUN echo -e "[ol8_oracle_instantclient]\n\
name=Oracle Instant Client for Oracle Linux \$releasever (\$basearch)\n\
baseurl=https://yum.oracle.com/repo/OracleLinux/OL8/oracle/instantclient/\$basearch/\n\
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-oracle\n\
gpgcheck=1\n\
enabled=0" > /etc/yum.repos.d/public-yum-ol8.repo
RUN microdnf install --nodocs --enablerepo=ol8_oracle_instantclient \
oracle-instantclient19.16-basic
Can you provide the parameters passed in OCISessionPoolCreate (). Thx
Sure, this is what's getting passed into the createPool(), connection details redacted.
oracledb.createPool({
connectionString: '(DESCRIPTION=(EXPIRE_TIME=10)(ADDRESS=(PROTOCOL=TCP)(HOST=10.20.14.109)(PORT=1527))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=Service.company.com)))',
password: '******',
user: '******',
poolAlias: 'segfault_prototype',
poolMax: 5,
poolMin: 3,
poolIncrement: 1,
enableStatistics: true,
queueTimeout: 60000,
stmtCacheSize: 0,
});
Hi, there is a patch available on top of 19.16.0.0.0 (Bug #31597668), can you download and apply this patch. This should fix the OCI tracing segfault issue, (EVENT_10842=15). Thanks,
Can you please download and try with 19.17.0.0.0 - I tried myself and OCI tracing is working good. Will be useful. Thanks.
Closing - no activity.
1. What versions are you using?
Give your database version.
Also run Node.js and show the output of:
process.platform
process.version
process.arch
require('oracledb').versionString
require('oracledb').oracleClientVersionString
2. Is it an error or a hang or a crash?
3. What error(s) or behavior you are seeing?
Also we have the same error from version 4.2.0 of the module
4. Include a runnable Node.js script that shows the problem.