ibmdb / node-ibm_db

IBM DB2 and IBM Informix bindings for node
MIT License
188 stars 151 forks source link

Fix for https://github.com/ibmdb/node-ibm_db/issues/942 #944

Closed asselin closed 9 months ago

asselin commented 10 months ago

OUTPUT parameter of type BINARY and Buffer causes node to crash with a double free of memory

bimalkjha commented 10 months ago

@asselin Thanks for identifying the root cause of the issue and fixing it. After looking on the details, I think we should address it in ODBC::GetOutputParameter function itself as below:

      if(sqlTypeBinary || prm.c_type == SQL_C_BINARY) {
          //str = Nan::NewOneByteString((uint8_t *) prm.buffer, prm.length).ToLocalChecked();
          //prm.buffer will be freed by Garbage collector, no need to free here.
          if(prm->isBuffer) {
            return scope.Escape((prm.buffer).ToLocalChecked());
          } else {
            prm->isBuffer = true;
            return scope.Escape(Nan::NewBuffer((char *)prm.buffer, prm.length).ToLocalChecked());
          }
      }

i.e. if buffer is passed by application, no need to reallocate in ibm_db, rather use the same buffer. This change should address your another PR too. Thanks.

asselin commented 10 months ago

@bimalkjha For the buffer case, I have 1 concern-- ibmdb doesn't add a reference or take ownership of the buffer that is passed in in GetBufferParam(), so wouldn't it be possible for the passed in buffer to be GC'd during the time it takes to send the request to DB2 and get a response?

I admit that I don't have deep knowledge of node/v8 internals, so maybe that's not possible, but isn't that why e.g. memory is allocated for String parameters and they're copied into the new memory?

Thank you!

bimalkjha commented 10 months ago

@asselin the GC frees memory only if it is no longer accessible and goes out of scope. When buffer is passed by the application to ibm_db, it is still under scope and in use during the processing of ibm_db API. So, GC do not affect such buffers. String parameter may be static memory, so we need to copy it in a dynamic memory, but Buffers are already dynamic memory so no need to allocate another dynamic memory but use the passed one. Hope it clarifies. Thanks.

asselin commented 9 months ago

@bimalkjha Suppose the calling code looked something like this:

import { Database } from 'ibm_db';

async function getHashFromDb(db: Database): Promise<any[]> {
  const query = `call DB2PROD.XQHASH(?)`;
  const outputValue = {
    ParamType: 'OUTPUT',
    DataType: 'BINARY',
    Data: Buffer.alloc(32),
    Length: 32,
  };

  return db.query(query, [outputValue]);
}

async function doSomeDbStuff(db: Database) {
  // Do some stuff
  const storedProcResults = await getHashFromDb(db);
  console.log((storedProcResults[0] as unknown as Buffer).toString('latin1'));
}

In this case, wouldn't the Buffer that goes into ibm_db go out of scope and possibly be GC'd before the results come back from DB2? I could imagine using a pattern like that to hide DB2 specific stuff in a DB layer.

If the concern with the malloc/memcpy approach is performance and/or resource utilization, what do you think about enhancing the code to look at the ParamType, and malloc & memcpy only if its OUTPUT or INOUT?

Thanks!

bimalkjha commented 9 months ago

@asselin No, it would not get claimed by GC as the memory pointed by Buffer is still in use by the new variable storedProcResults. GC marks memory for sweeping only if it is not referenced by any variable in active scope. The variables outputValue and outputValue.Data is not accessible in function doSomeDbStuff, but memory allocated by Buffer.alloc() is pointed by active variable storedProcResults in doSomeDbStuff. You can verify it using sample program.

Also, intension of passing Data: Buffer.alloc(32) for OUTPUT is to get result in this Buffer, so application should read the data from this Buffer before Buffer goes out of scope. Otherwise, there is no use of allocating Buffer by application if it do not read it!

The problem with current code is, GetOutputParameter() transfers the ownership of (char *)prm.buffer to JavaScript but frees it too as prm.isBuffer is not set true for string parameters of type INOUT and OUTPUT.

In the shared code of issue #942, I suspect the crash is not due to oHashedValue but it is due to parameters oErrorCode and oReasonCode as ParamType: AppConstants.HASH_PARAM_TYPE is same for all three which is OUTPUT? Please check. Thanks.

asselin commented 9 months ago

@bimalkjha

The problem with current code is, GetOutputParameter() transfers the ownership of (char *)prm.buffer to JavaScript but frees it too as prm.isBuffer is not set true for string parameters of type INOUT and OUTPUT.

Agreed, that's the issue with non-Buffer parameters (#943).

In the shared code of issue https://github.com/ibmdb/node-ibm_db/issues/942, I suspect the crash is not due to oHashedValue but it is due to parameters oErrorCode and oReasonCode as ParamType: AppConstants.HASH_PARAM_TYPE is same for all three which is OUTPUT?

I'm 100% sure the issue is with oHashedValue; I've verified both of the fixes I posted in the PRs fix the 2 issues. I tested the code, only changing the type of oHashedValue between String and Buffer, and apply each fix separately. If the issue was with oErrorCode or oReasonCode, then the test I ran with oHashedValue being type Buffer, and this PR being applied, the code would still have crashed, but it didn't (I had it running in a loaded test environment, where it crashed every 30-45 minutes without the fixes, and after the fixes, has run for about 4 days with no crashes).

Also, intension of passing Data: Buffer.alloc(32) for OUTPUT is to get result in this Buffer, so application should read the data from this Buffer before Buffer goes out of scope. Otherwise, there is no use of allocating Buffer by application if it do not read it!

I disagree-- the result is available from storedProcResults[0], so doSomeDbStuff doesn't have to maintain a reference to the Buffer that was sent in. In this case, getHashFromDb can be used to encapsulate the details of the database query, and doSomeDbStuff doesn't need to know the exact details, only that the promise returned by getHashFromDb will be an array of the results.

No, it would not get claimed by GC as the memory pointed by Buffer is still in use by the new variable storedProcResults. GC marks memory for sweeping only if it is not referenced by any variable in active scope. The variables outputValue and outputValue.Data is not accessible in function doSomeDbStuff, but memory allocated by Buffer.alloc() is pointed by active variable storedProcResults in doSomeDbStuff. You can verify it using sample program.

I'll try writing a test case for this, because I believe that as soon as getHashFromDb returns its Promise, the getHashFromDb scope is gone, and outputValue and outputValue.Data are going to be unreachable. This might be harder to prove than these 2 double free bugs, but I think it's possible to force a GC, so maybe I'll have some luck in that direction.

Thank you for your time and effort on these issues!

asselin commented 9 months ago

@bimalkjha ,

I'm working on a test case to see if the Buffer is GC'd, but unfortunately, the code you posted for the Buffer case doesn't compile. This is the snippet (note: I had to change the type of prm from Parameter to Parameter*, so that's why it's -> instead of .):

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
          if(prm->isBuffer) {
            return scope.Escape((prm->buffer).ToLocalChecked());
          } else {
            prm->isBuffer = true;
            return scope.Escape(Nan::NewBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
          }
      } else {
        str = Nan::New((UNICHAR *) prm->buffer).ToLocalChecked();
      }

Gives this error:

#24 5.390 ../src/odbc.cpp: In static member function 'static v8::Local<v8::Value> ODBC::GetOutputParameter(Parameter*)':
#24 5.390 ../src/odbc.cpp:897:47: error: request for member 'ToLocalChecked' in 'prm->Parameter::buffer', which is of non-class type 'SQLPOINTER' {aka 'void*'}
#24 5.390   897 |             return scope.Escape((prm->buffer).ToLocalChecked());
#24 5.390       |                                               ^~~~~~~~~~~~~~

Unfortunately, I don't these internals, so I'm not sure how to proceed. If you can paste a new snippet, I've got a test program I can try it against.

BTW, would it be useful to you if we opened a PMR for this issue? If so, please let me know.

Thanks again!

asselin commented 9 months ago

@bimalkjha Thank you for merging the other PR. I merged master into this PR to bring it up to date.

bimalkjha commented 9 months ago

@asselin You was getting crash because you have removed param->isBuffer = true; setting from ODBC::GetBufferParam in PR #943. Also, there was few issues with current odbc.cpp. So, I have reworked on it and pushed changes as latest commit https://github.com/ibmdb/node-ibm_db/commit/5995ef232c5a2a901f6358267478241762639ee6. Please pull the latest code and test it. It should fix the issues. Thanks.

asselin commented 9 months ago

@bimalkjha

In the change I sent, I moved setting isBuffer from GetBufferParam to GetOutputParameter. I did that so that ALL cases where a Buffer was going to be returned with NewBuffer (i.e. regardless of whether a string or Buffer was sent INTO the call) would set isBuffer, so that the memory allocated would not be freed (because NewBuffer takes it over). That change is what fixed the double free when sending in a string and type SQL_C_BINARY.

The crash when sending in a Buffer with type SQL_C_BINARY was caused by the original Buffer sent into the function being freed, and also the Javascript runtime freeing the new Buffer allocated with NewBuffer; it wasn't caused by isBuffer being false. When you call NewBuffer, you're transferring ownership of the memory to the JS runtime. What the old code transferred was the memory from the Buffer that was sent in, which was ALSO under control of the JS runtime. JS thought they were 2 separate Buffers pointing to 2 separate areas of memory, but in fact they were 2 separate Buffers pointing to the SAME area of memory. So when they were GC'd, the JS runtime freed the first Buffer and it's memory just fine, but when it freed the 2nd Buffer, it freed the same memory twice, and hence the crash.

I looked over the changes you made and spotted a few issues.

For reference, I'm copying the new code:

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
          // Use CopyBuffer to create a new buffer to return to NodeJS,
          // prm->buffer can be freed. Return Buffer for Binary OUTPUT params.
          // Nan::CopyBuffer() is Similar to Nan::NewBuffer() except that an
          // implicit memcpy will occur within Node. Calls node::Buffer::Copy().
          return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
      }
      else {
        // If Buffer was passed by JS to C++ for CHAR type, return Buffer only.
        // If String was passed by JS to C++, return OUTPUT as char*
        // Do not free such char* as ownership is getting transferred to JS.
        if(prm->isBuffer) { // Buffer was passed by NodeJS to C++
          return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
        } else {
          prm->isBuffer = true; // Dont free it in FREE_PARAMS
          str = Nan::New((UNICHAR *) prm->buffer).ToLocalChecked();
        }
      }

The first issue is that this is going to result in a memory leak:

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
      }
      else {
        if(prm->isBuffer) { // Buffer was passed by NodeJS to C++
        } else {
          prm->isBuffer = true; // Dont free it in FREE_PARAMS
          str = Nan::New((UNICHAR *) prm->buffer).ToLocalChecked();
        }
      }

Nan:New doesn't take over ownership of the memory given to it, so the call to free SHOULD happen in this case. The fix for this one is easy-- just remove the prm->isBuffer = true;.

Another thing I spotted is that for parameters of type string and SQL_C_BINARY, this code could be more efficient.

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
          // Use CopyBuffer to create a new buffer to return to NodeJS,
          // prm->buffer can be freed. Return Buffer for Binary OUTPUT params.
          // Nan::CopyBuffer() is Similar to Nan::NewBuffer() except that an
          // implicit memcpy will occur within Node. Calls node::Buffer::Copy().
          return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
      }
      else {
      }

In GetStringParam, memory is allocated. Here in GetOutputParameter, it's going to use CopyBuffer, which is going to allocate memory again and copy it. The memory that odbc.cpp allocated is freed immediately afterwards, so it would be more efficient to call NewBuffer in this case, and just hand the memory over to the JS runtime. Obviously, this isn't a correctness issue, but the change is pretty easy:

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
        if(prm->isBuffer) { // Buffer was passed by NodeJS to C++
          // Use CopyBuffer to create a new buffer to return to NodeJS,
          // prm->buffer can be freed. Return Buffer for Binary OUTPUT params.
          // Nan::CopyBuffer() is Similar to Nan::NewBuffer() except that an
          // implicit memcpy will occur within Node. Calls node::Buffer::Copy().
          return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
        } else {
          prm->isBuffer = true; // Don't free it in FREE_PARAMS
          return scope.Escape(Nan::NewBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
        }
      }
      else {
      }

The last issue is the same one I mentioned previously where I believe sending in a Buffer with type SQL_C_BINARY can result in memory corruption because the Buffer that is sent in can be freed by GC while odbc.cpp still retains the pointer to the Buffer's memory. I'll work on a testcase for this and see if I can prove the theory.

Update I tried various combinations to try and see if this is an issue, but never was able to make it fail. I think I was wrong about the scope terminating (like you said), so you can ignore this-- it doesn't look like its an issue.

Thank you!

asselin commented 9 months ago

Putting all of the above together, here's a suggested change:

      // If Buffer was passed by JS to C++ for CHAR type, return Buffer only.
      // Do not free such char* as ownership is getting transferred to JS.
      if(prm->isBuffer) { // Buffer was passed by NodeJS to C++
        return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
      }

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
        prm->isBuffer = true; // Don't free it in FREE_PARAMS
        return scope.Escape(Nan::NewBuffer((char *)prm->buffer, prm->length).ToLocalChecked());        
      }
      else {
        // If String was passed by JS to C++, return OUTPUT as char*
        return scope.Escape(Nan::New((UNICHAR *) prm->buffer).ToLocalChecked());
      }
bimalkjha commented 9 months ago

@asselin Setting prm->isBuffer = true in GetBufferParam stores two information:

  1. JS has passed a Buffer to C++. The column may be binary or CHAR or WCHAR type. So, C++ need to return Buffer only.
  2. Do not free such prm->buffer in C++.

If we don't set prm->isBuffer = true in GetBufferParam and paramtype is INPUT; FREE_PARAMS will free the Buffer memory leading crash. For INPUT type parameter, GetOutputParameter never get called. GetOutputParameter is called only for INOUT and OUTPUT paramtype. So, setting prm->isBuffer in GetBufferParam is must to identify a Buffer is passed by JS to C++ or not. We can't remove it.

If NodeJS has not passed a Buffer to C++ for SQL_BINARY, means its string passed by JS to C++, so no need to call Nan::NewBuffer((char *)prm->buffer and return Buffer to JS instead of string. I think below code is not right:

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
        prm->isBuffer = true; // Don't free it in FREE_PARAMS
        return scope.Escape(Nan::NewBuffer((char *)prm->buffer, prm->length).ToLocalChecked());        
      }

Now about your first point:

Nan:New doesn't take over ownership of the memory given to it, so the call to free SHOULD happen in this case. The fix for this one is easy-- just remove the prm->isBuffer = true;.

I am not sure about it. I did online search and found below info. Unable to find any other doc that could make it clear. If you are able to find any link, please share here:

image

So, I think str = Nan::New((UNICHAR *) prm->buffer).ToLocalChecked(); will not allocate any new memory but uses prm->buffer and just transfer ownership to str. If it is not correct, please share any doc link that could throw light on this behavior. Best way is to test it. I am going to verify it using test program and see memory leak is happening or not.

Finally, have you tested the latest code of ibm_db and validated memory leak issue? If latest code still shows any issue, then please share. Thanks.

bimalkjha commented 9 months ago

If we check this link: https://github.com/nodejs/nan/blob/main/doc/buffers.md#nannewbuffer and https://stackoverflow.com/questions/65797684/node-add-on-nannewbuffer-causes-memory-leak , found that the gc routine calls free not delete ("...data will be disposed of via a call to free()..."); and I don't think Nan::New() is calling malloc internally to allocate new memory for char. Unable to find any doc yet that says Nan::New() will create a new memory for char. Thanks.

asselin commented 9 months ago

I looked through the C++ code, and admittedly, my C++ is rusty, but I believe the calling sequence is this:

When you call Nan::New(char*), I believe it uses this code: https://github.com/nodejs/nan/blob/e14bdcd1f72d62bca1d541b66da43130384ec213/nan_new.h#L312

CleanShot 2023-09-19 at 20 55 11@2x

That calls https://github.com/nodejs/nan/blob/e14bdcd1f72d62bca1d541b66da43130384ec213/nan_new.h#L206

CleanShot 2023-09-19 at 20 55 44@2x

That then calls https://github.com/nodejs/nan/blob/e14bdcd1f72d62bca1d541b66da43130384ec213/nan_new.h#L161

CleanShot 2023-09-19 at 20 58 48@2x

That calls https://github.com/nodejs/nan/blob/main/nan_implementation_12_inl.h#L281

CleanShot 2023-09-19 at 20 59 50@2x

If it's a UNICHAR instead of char, it's similar, but winds up calling https://github.com/nodejs/nan/blob/main/nan_implementation_12_inl.h#L294

CleanShot 2023-09-19 at 21 01 40@2x

The docs for v8::String::NewFromUtf8 and v8::String::NewFromTwoByte here https://v8.github.io/api/head/classv8_1_1String.html are almost identical: "Allocates a new string from UTF-8 data. Only returns an empty value when length > kMaxLength."

So the way I interpret it, I think memory is allocated in this case. This directly contradicts what you found, and what you found is very clear in what it says, so I think we're going to have to test it. I've had very good results with getting double free bugs to reproduce quickly, so tomorrow, I'll set it up to free the memory and see if I can get it to crash. If it doesn't crash, then we can test without the free to verify if that results in a memory leak.

Does that all make sense? Sound like a plan?

asselin commented 9 months ago

For INPUT type parameter, GetOutputParameter never get called.

That's a good point that I missed. Makes sense now, I agree, isBuffer needs to be set in GetBufferParam.

If NodeJS has not passed a Buffer to C++ for SQL_BINARY, means its string passed by JS to C++, so no need to call Nan::NewBuffer((char *)prm->buffer and return Buffer to JS instead of string.

The existing code will also return a Buffer in that case. If the type is SQL_BINARY, the if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) { will be true, and it will call CopyBuffer.

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
          // Use CopyBuffer to create a new buffer to return to NodeJS,
          // prm->buffer can be freed. Return Buffer for Binary OUTPUT params.
          // Nan::CopyBuffer() is Similar to Nan::NewBuffer() except that an
          // implicit memcpy will occur within Node. Calls node::Buffer::Copy().
          return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
      }
      else {
        // If Buffer was passed by JS to C++ for CHAR type, return Buffer only.
        // If String was passed by JS to C++, return OUTPUT as char*
        // Do not free such char* as ownership is getting transferred to JS.
        if(prm->isBuffer) { // Buffer was passed by NodeJS to C++
          return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
        } else {
          prm->isBuffer = true; // Dont free it in FREE_PARAMS
          str = Nan::New((UNICHAR *) prm->buffer).ToLocalChecked();
        }
      }

It's all based on whether the caller set the type to BINARY or not. If its BINARY, the current code always returns a Buffer. The change I'm proposing is if a string was passed in, then instead of calling CopyBuffer and then freeing the memory, instead call NewBuffer and don't free the memory.

      // If Buffer was passed by JS to C++ for CHAR type, return Buffer only.
      // Do not free such char* as ownership is getting transferred to JS.
      if(prm->isBuffer) { // Buffer was passed by NodeJS to C++
        return scope.Escape(Nan::CopyBuffer((char *)prm->buffer, prm->length).ToLocalChecked());
      }

      if(sqlTypeBinary || prm->c_type == SQL_C_BINARY) {
        prm->isBuffer = true; // Don't free it in FREE_PARAMS
        return scope.Escape(Nan::NewBuffer((char *)prm->buffer, prm->length).ToLocalChecked());        
      }
      else {
        // If String was passed by JS to C++, return OUTPUT as char*
        return scope.Escape(Nan::New((UNICHAR *) prm->buffer).ToLocalChecked());
      }
bimalkjha commented 9 months ago

@asselin I also tested the code using SP with all types of params. Was able to notice the memleak when Nan::New((UNICHAR *) prm->buffer) was not getting freed. So, your explanation looks good. Tested the latest code with same test for 1hr. Saw heapMemory stable after sometime and it stopped growing, confirming there is no memory leak. I added one commit in your branch to clean the code and add few comments. Finally, it looks good, so merged it. Thanks.

asselin commented 9 months ago

@bimalkjha Awesome, thank you! I also tested, to verify that I don't see any double frees or memory corruption, and I don't.

So... million dollar question :) When do you think this will be published to npm?

Thank you!

bimalkjha commented 9 months ago

@asselin We published it to npm today. Thanks.