josephg / node-foundationdb

Modern Node.js FoundationDB bindings
Other
115 stars 17 forks source link

Arguments error #66

Open 3g0r opened 2 years ago

3g0r commented 2 years ago

Hello.

We catch Invalid arguments exception on

Stack trace:

Error: 
      Invalid arguments
      ---------
      Invalid argument
      ---------
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z16create_backtracev+0x27) [0x7f6f787dd6b1]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z14create_tracingP10napi_env__PKc+0x26) [0x7f6f787dd866]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z15throw_if_not_okP10napi_env__11napi_status+0x58) [0x7f6f787dd983]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z10getWrappedP10napi_env__P20napi_callback_info__+0x87) [0x7f6f787ddc1d]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(+0x56c3) [0x7f6f787d86c3]
      node() [0xaabfcd]
      node() [0xd4a35e]
      node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xaf) [0xd4b77f]
      node() [0x15e78f9]
    at Database.rawCreateTransaction (node_modules/foundationdb/dist/lib/database.js:55:51)
    at Database.doTn (node_modules/foundationdb/dist/lib/database.js:39:21)
    at Database.doTransaction (node_modules/foundationdb/dist/lib/database.js:43:21)
    at Object.doTransaction (node_modules/@deeplay/foundationdb-event-store-core/lib/utils/doTransaction.js:13:10)
    at Object.doIdempotentTransaction (node_modules/@deeplay/foundationdb-event-store-core/lib/index.js:68:94)
    at writeTransition (build/table/writeTransition.js:26:37)
    at writeResult.baseMs (build/table/doTask.js:30:119)
    at retry (node_modules/abort-controller-x/lib/retry.js:13:26)
    at doTask (build/table/doTask.js:30:66)
    at build/table/index.js:129:62

When calling napi_get_cb_info we obtain a pointer to thisObj from native code using the callback’s descriptor. After that we try to napi_unwrap the pointer, but the arguments error appear: https://github.com/josephg/node-foundationdb/blob/master/src/utils.cpp#L79

I am not sure how to stably reproduce that behaviour. I was able to check the backtrace when I remove the pod, and the service forcedly stopped. But we catch same error sometimes while service is working as usual, and in this case all of the new transactions after the error will be completed with the same error.

Could you please help me to dig deeper to find the root of the issue? Can we somehow get an enriched logging to understand what is wrong and whether the bug is on our side or the native code one?

josephg commented 2 years ago

Interesting! Does it always happen when the service is forcibly stopped? I’d love to get a repro case if we can figure one out.

I can add some print statements or something, but I’m not sure how much that’d help..

3g0r commented 2 years ago

Does it always happen when the service is forcibly stopped?

Not sure. I try to create a setup for a stable reproduction, although I'm not sure of success - seems we catch that error only in k8s environment.

Maybe you can give me recommendations about nodegyp-build setup to collect more information in backtrace?

josephg commented 2 years ago

seems we catch that error only in k8s environment.

That’s weird!

Also with node-gyp, based on that stack trace it looks like you’re already doing a build for the docker container. That stack trace comes from a local debug build. (Did you opt in to that?). I wonder if the fact you’re using a debug build in k8s has something to do with the error showing up.

When you do an npm install, the fdb npm package first tries to load from prebuilds/ first and if any error happens it runs a build. To build with gyp, you’ll need a local build environment (apt install build-essential) and then run npx node-gyp build —debug.

josephg commented 2 years ago

Hah - I was just looking at where the string "Invalid arguments" is coming from. Its a napi error, and my code in utils.cpp emits that string. Apparently I've had that problem before, because I left some commented out code for printing a full stack trace at the location where the error is coming from.

Restore the code like this and see if that gives you more information:

// ...

#include <execinfo.h>
#include <stdlib.h>

// Useful for figuring out where an error is coming from.
// From GCC docs
void print_trace ()
{
  void *array[10];
  size_t size;
  char **strings;
  size_t i;

  size = backtrace (array, 10);
  strings = backtrace_symbols (array, size);

  printf ("Obtained %zd stack frames.\n", size);

  for (i = 0; i < size; i++)
     printf ("%s\n", strings[i]);

  free (strings);
}

napi_status throw_if_not_ok(napi_env env, napi_status status) {
  switch (status) {
    case napi_ok: case napi_pending_exception:
      return status;
    case napi_invalid_arg:
      print_trace(); // <-- uncomment this line
      napi_throw_error(env, NULL, "Invalid arguments");
      return napi_pending_exception;
    case napi_number_expected:
      napi_throw_type_error(env, NULL, "Expected number");
      return napi_pending_exception;
    case napi_string_expected:
      napi_throw_type_error(env, NULL, "Expected string");
      return napi_pending_exception;
    case napi_generic_failure:
      napi_throw_type_error(env, NULL, "Generic failure");
      return napi_pending_exception;
    default:
      fprintf(stderr, "throw_if_not_ok %d\n", status);
      assert(0);
      return napi_invalid_arg; // Unreachable but silences a vcc warning.
  }
}
3g0r commented 2 years ago

Yes, I already use this snippet :) But I send stack trace with the string Invalid arguments and a result of napi_get_last_error_info into js, in the same way, I rebuild the nodejs-foundationdb library with debug symbols npm run install --build-from-source --debug.

All of that manipulations were be for investigation and getting stack traces from native code.

The Invalid arguments error appeared while we used the original package.

3g0r commented 2 years ago

That’s weird!

Agree.

We didn't find any way for stable reproduction and tried to get more information from the production environment. But I work on this problem. Maybe I can do it.

josephg commented 2 years ago

The sequence of events here is:

  rawCreateTransaction(opts?: TransactionOptions) {
    return new Transaction<KeyIn, KeyOut, ValIn, ValOut>(this._db.createTransaction(), false, this.subspace, opts)
  }
static napi_value createTransaction(napi_env env, napi_callback_info info) {
  FDBDatabase *db = (FDBDatabase *)getWrapped(env, info);
  // ..

And getWrapped (why is the function name spelled inconsistently?) looks like this:

void *getWrapped(napi_env env, napi_callback_info info) {
  void *data;
  napi_value obj;
  NAPI_OK_OR_RETURN_NULL(env, napi_get_cb_info(env, info, 0, NULL, &obj, NULL));
  NAPI_OK_OR_RETURN_NULL(env, napi_unwrap(env, obj, &data));
  return data;
}

Now, what I'm not sure about is which of those calls (napi_get_cb_info or napi_unwrap) is generating this napi_invalid_arg error. Maybe try replacing getWrapped with this:

void *unwrap(napi_env env, napi_value obj) {
  void *data;
  NAPI_OK_OR_RETURN_NULL(env, napi_unwrap(env, obj, &data));
  return data;
}

void *getWrapped(napi_env env, napi_callback_info info) {
  napi_value obj;
  NAPI_OK_OR_RETURN_NULL(env, napi_get_cb_info(env, info, 0, NULL, &obj, NULL));
  return unwrap(env, obj);
}

This should not change the function's behaviour, but we will see if unwrap() shows up in the stack trace to find out which napi_ method is triggering the napi_invalid_arg error.

I'm also very suspicious of a non-deterministic error here. I wouldn't be surprised if its a bug in nodejs. If you're trying to reproduce it outside k8s, check you're using the exact same version of nodejs in both places.

3g0r commented 2 years ago

I followed your recommendations and divide getWrapped into two functions. The stack trace below is taken from a new build:

Error: 
      Invalid arguments
      ---------
      Invalid argument
      ---------
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z16create_backtracev+0x27) [0x7f0d002b46c1]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z14create_tracingP10napi_env__PKc+0x26) [0x7f0d002b4876]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z15throw_if_not_okP10napi_env__11napi_status+0x58) [0x7f0d002b4993]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z6unwrapP10napi_env__P12napi_value__+0x37) [0x7f0d002b4bdd]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z10getWrappedP10napi_env__P20napi_callback_info__+0x73) [0x7f0d002b4c6f]
      /opt/app/node_modules/foundationdb/build/Debug/fdblib.node(+0x56d3) [0x7f0d002af6d3]
      node() [0xaabfcd]
      node() [0xd4a35e]
      node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xaf) [0xd4b77f]
      node() [0x15e78f9]
    at Database.rawCreateTransaction (node_modules/foundationdb/dist/lib/database.js:55:51)
    at Database.doTn (node_modules/foundationdb/dist/lib/database.js:39:21)
    at Database.doTransaction (node_modules/foundationdb/dist/lib/database.js:43:21)
    at Object.doTransaction (node_modules/@deeplay/foundationdb-event-store-core/lib/utils/doTransaction.js:13:10)
    at Object.doIdempotentTransaction (node_modules/@deeplay/foundationdb-event-store-core/lib/index.js:68:94)
    at writeTransition (build/table/writeTransition.js:26:37)
    at writeResult.baseMs (build/table/doTask.js:30:119)
    at retry (node_modules/abort-controller-x/lib/retry.js:13:26)
    at doTask (build/table/doTask.js:30:66)
    at build/table/index.js:129:62

This stack trace is taken in a similar way.

I'm still in the progress of creating a setup for stable reproduction