oracle / node-oracledb

Oracle Database driver for Node.js maintained by Oracle Corp.
http://oracle.github.io/node-oracledb/
Other
2.26k stars 1.08k forks source link

CLOBs returned with extra data at the beginning #1576

Closed nathantheinventor closed 1 year ago

nathantheinventor commented 1 year ago
  1. What versions are you using?

Oracle Database in Amazon RDS versions 19.0.0.0.ru-2022-01.rur-2022-01.r1 and 19.0.0.0.ru-2022-10.rur-2022-10.r1 (multiple similar databases)

>  process.platform
'darwin'
>     process.version
'v19.9.0'
>     process.arch
'x64'
>     require('oracledb').versionString
'6.0.1'
  1. Is it an error or a hang or a crash?

Error - the wrong data is returned. The script also hangs occasionally but I haven't been able to trace down that error

  1. What error(s) or behavior you are seeing?

I have a database table with a CLOB column containing JSON data. I'm fetching thousands of records from that table (using oracledb.fetchAsString = [oracledb.CLOB];) and occasionally (roughly 1 in 7000 records) one of those rows comes back such that the JSON value has extra characters prepended. Those characters appear to represent the length of the field. So for example I have a row where the JSON column contains 2875 bytes. The library is instead giving me a 2877-byte string starting with bytes 0b 3b 7b 22 69 74 65 . . . (0x0b3b is equal to 2875).

This issue is new to node-oracledb v6 - it doesn't happen on v5.5.0

I've traced this through the library and I believe it's coming from packet.js. Specifically L157 starts a chunked read which defines a buffer for the results to go into. Inside that chunked read is a call to readUB4, which calls readBytes. I believe this error happens on occasions where the UB4 read spans between the current chunk and the next chunk, so readBytes starts a new chunked read on L225 and reads the length of the buffer. But it doesn't clean up after itself so the length of the buffer remains at the beginning of the chunked read buffer and gets returned.

anthony-tuininga commented 1 year ago

Yes, this is a bug. A similar bug was fixed in the Python driver some time ago so the solution is simple enough to implement: use a separate "split" buffer for these cases.

As for the "hangs", are you using the latest code on GitHub (what will become 6.0.2 when that release is made)? There was another issue that was already corrected that would definitely cause those issues. Do you have data that is duplicated from row to row?

nathantheinventor commented 1 year ago

No, I'm using v6.0.1 from npm. There are certainly some columns that contain duplicate data for many rows but no two rows would be entirely identical (at the very least I'm fetching an ID column). The hangs have happened three times in the last 24 hours (it runs on an hourly schedule) but only on one database (we have about ten instances for different customers that are all running the same code - the one that hangs is by far the biggest). If 6.0.2 fixes the issue I'm certainly willing to wait for that to be released - the occasional hangs aren't causing major problems.

anthony-tuininga commented 1 year ago

To be clear, it is not that the entire row needs to be duplicated, merely that some columns are duplicated from one row to the next. So that could easily explain the source of the hangs. You can copy the JavaScript files from GitHub to your installed copy (specifically lib/thin/protocol/messages/withData.js) and verify that does resolve the hangs. If I can verify my fix I'll post it here as well and you can try that as well to ensure that it works for you, too!

anthony-tuininga commented 1 year ago

I can replicate this issue with the following setup:

create table issue_1576 (
    id number(9) not null,
    content clob not null
);

insert into issue_1576 values (1, rpad('A', 8068, 'A'));
insert into issue_1576 values (2, rpad('B', 8068, 'B'));

commit;

and the following script:

const oracledb = require('oracledb');

oracledb.fetchAsString = [oracledb.DB_TYPE_CLOB];

const config = {
  user: "user",
  password: "password",
  connectString: "host/service_name"
};

async function run() {
  const conn = await oracledb.getConnection(config);
  const result = await conn.execute("select * from issue_1576");
  console.log("result:", result);
  await conn.close();
}

run();

This will show that the returned value on the second row is corrupted. I hope to have a patch shortly for this.

anthony-tuininga commented 1 year ago

And here is the patch:

diff --git a/lib/thin/protocol/packet.js b/lib/thin/protocol/packet.js
index d12ef304..712f1b33 100644
--- a/lib/thin/protocol/packet.js
+++ b/lib/thin/protocol/packet.js
@@ -215,12 +215,17 @@ class ReadPacket extends BaseBuffer {
     }

     // the requested bytes are split across multiple packets; if a chunked read
-    // is not in progress one is implicitly started; copy the bytes from the
-    // end of this packet
-    if (!inChunkedRead) {
-      this.chunkedBytesBuf.startChunkedRead();
+    // is in progress, a chunk is acquired that will accommodate the requested
+    // bytes; otherwise, a separate buffer will be allocated to accommodate the
+    // requested bytes
+    let buf;
+    if (inChunkedRead) {
+      buf = this.chunkedBytesBuf.getBuf(numBytes);
+    } else {
+      buf = Buffer.alloc(numBytes);
     }
-    const buf = this.chunkedBytesBuf.getBuf(numBytes);
+
+    // copy the bytes to the buffer from the remainder of this packet
     let offset = 0;
     this.buf.copy(buf, offset, this.pos, this.pos + numBytesLeft);
     offset += numBytesLeft;
@@ -236,10 +241,7 @@ class ReadPacket extends BaseBuffer {
       numBytes -= numSplitBytes;
     }

-    // if not in a chunked bytes read, return the buffer directly
-    if (!inChunkedRead) {
-      return this.chunkedBytesBuf.endChunkedRead();
-    }
+    return buf;

   }

This patch corrects the issue. Thanks for reporting it!

sharadraju commented 1 year ago

@nathantheinventor The patch has been checked into GitHub and is explained in the previous comment by @anthony-tuininga. Please replace the /lib/thin/protocol/packet.js file with the current file in GitHub and re-run the code. The fix will be incorporated in the 6.0.2 release.

sharadraju commented 1 year ago

This is now available as part of node-oracledb 6.0.2. See the release announcement