Open 3g0r opened 2 years ago
Interesting! Does it always happen when the service is forcibly stopped? I’d love to get a repro case if we can figure one out.
I can add some print statements or something, but I’m not sure how much that’d help..
Does it always happen when the service is forcibly stopped?
Not sure. I try to create a setup for a stable reproduction, although I'm not sure of success - seems we catch that error only in k8s environment.
Maybe you can give me recommendations about nodegyp-build setup to collect more information in backtrace?
seems we catch that error only in k8s environment.
That’s weird!
Also with node-gyp, based on that stack trace it looks like you’re already doing a build for the docker container. That stack trace comes from a local debug build. (Did you opt in to that?). I wonder if the fact you’re using a debug build in k8s has something to do with the error showing up.
When you do an npm install, the fdb npm package first tries to load from prebuilds/ first and if any error happens it runs a build. To build with gyp, you’ll need a local build environment (apt install build-essential) and then run npx node-gyp build —debug
.
Hah - I was just looking at where the string "Invalid arguments" is coming from. Its a napi
error, and my code in utils.cpp
emits that string. Apparently I've had that problem before, because I left some commented out code for printing a full stack trace at the location where the error is coming from.
Restore the code like this and see if that gives you more information:
// ...
#include <execinfo.h>
#include <stdlib.h>
// Useful for figuring out where an error is coming from.
// From GCC docs
void print_trace ()
{
void *array[10];
size_t size;
char **strings;
size_t i;
size = backtrace (array, 10);
strings = backtrace_symbols (array, size);
printf ("Obtained %zd stack frames.\n", size);
for (i = 0; i < size; i++)
printf ("%s\n", strings[i]);
free (strings);
}
napi_status throw_if_not_ok(napi_env env, napi_status status) {
switch (status) {
case napi_ok: case napi_pending_exception:
return status;
case napi_invalid_arg:
print_trace(); // <-- uncomment this line
napi_throw_error(env, NULL, "Invalid arguments");
return napi_pending_exception;
case napi_number_expected:
napi_throw_type_error(env, NULL, "Expected number");
return napi_pending_exception;
case napi_string_expected:
napi_throw_type_error(env, NULL, "Expected string");
return napi_pending_exception;
case napi_generic_failure:
napi_throw_type_error(env, NULL, "Generic failure");
return napi_pending_exception;
default:
fprintf(stderr, "throw_if_not_ok %d\n", status);
assert(0);
return napi_invalid_arg; // Unreachable but silences a vcc warning.
}
}
Yes, I already use this snippet :)
But I send stack trace with the string Invalid arguments
and a result of napi_get_last_error_info
into js, in the same way, I rebuild the nodejs-foundationdb library with debug symbols npm run install --build-from-source --debug
.
All of that manipulations were be for investigation and getting stack traces from native code.
The Invalid arguments
error appeared while we used the original package.
That’s weird!
Agree.
We didn't find any way for stable reproduction and tried to get more information from the production environment. But I work on this problem. Maybe I can do it.
The sequence of events here is:
database.ts
, doTn calls rawCreateTransaction: rawCreateTransaction(opts?: TransactionOptions) {
return new Transaction<KeyIn, KeyOut, ValIn, ValOut>(this._db.createTransaction(), false, this.subspace, opts)
}
_db.createTransaction()
, which immediately tries to figure out the _db
argument via this:static napi_value createTransaction(napi_env env, napi_callback_info info) {
FDBDatabase *db = (FDBDatabase *)getWrapped(env, info);
// ..
And getWrapped (why is the function name spelled inconsistently?) looks like this:
void *getWrapped(napi_env env, napi_callback_info info) {
void *data;
napi_value obj;
NAPI_OK_OR_RETURN_NULL(env, napi_get_cb_info(env, info, 0, NULL, &obj, NULL));
NAPI_OK_OR_RETURN_NULL(env, napi_unwrap(env, obj, &data));
return data;
}
Now, what I'm not sure about is which of those calls (napi_get_cb_info
or napi_unwrap
) is generating this napi_invalid_arg
error. Maybe try replacing getWrapped with this:
void *unwrap(napi_env env, napi_value obj) {
void *data;
NAPI_OK_OR_RETURN_NULL(env, napi_unwrap(env, obj, &data));
return data;
}
void *getWrapped(napi_env env, napi_callback_info info) {
napi_value obj;
NAPI_OK_OR_RETURN_NULL(env, napi_get_cb_info(env, info, 0, NULL, &obj, NULL));
return unwrap(env, obj);
}
This should not change the function's behaviour, but we will see if unwrap()
shows up in the stack trace to find out which napi_ method is triggering the napi_invalid_arg
error.
I'm also very suspicious of a non-deterministic error here. I wouldn't be surprised if its a bug in nodejs. If you're trying to reproduce it outside k8s, check you're using the exact same version of nodejs in both places.
I followed your recommendations and divide getWrapped
into two functions.
The stack trace below is taken from a new build:
Error:
Invalid arguments
---------
Invalid argument
---------
/opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z16create_backtracev+0x27) [0x7f0d002b46c1]
/opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z14create_tracingP10napi_env__PKc+0x26) [0x7f0d002b4876]
/opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z15throw_if_not_okP10napi_env__11napi_status+0x58) [0x7f0d002b4993]
/opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z6unwrapP10napi_env__P12napi_value__+0x37) [0x7f0d002b4bdd]
/opt/app/node_modules/foundationdb/build/Debug/fdblib.node(_Z10getWrappedP10napi_env__P20napi_callback_info__+0x73) [0x7f0d002b4c6f]
/opt/app/node_modules/foundationdb/build/Debug/fdblib.node(+0x56d3) [0x7f0d002af6d3]
node() [0xaabfcd]
node() [0xd4a35e]
node(_ZN2v88internal21Builtin_HandleApiCallEiPmPNS0_7IsolateE+0xaf) [0xd4b77f]
node() [0x15e78f9]
at Database.rawCreateTransaction (node_modules/foundationdb/dist/lib/database.js:55:51)
at Database.doTn (node_modules/foundationdb/dist/lib/database.js:39:21)
at Database.doTransaction (node_modules/foundationdb/dist/lib/database.js:43:21)
at Object.doTransaction (node_modules/@deeplay/foundationdb-event-store-core/lib/utils/doTransaction.js:13:10)
at Object.doIdempotentTransaction (node_modules/@deeplay/foundationdb-event-store-core/lib/index.js:68:94)
at writeTransition (build/table/writeTransition.js:26:37)
at writeResult.baseMs (build/table/doTask.js:30:119)
at retry (node_modules/abort-controller-x/lib/retry.js:13:26)
at doTask (build/table/doTask.js:30:66)
at build/table/index.js:129:62
This stack trace is taken in a similar way.
I'm still in the progress of creating a setup for stable reproduction
Hello.
We catch
Invalid arguments
exception onStack trace:
When calling
napi_get_cb_info
we obtain a pointer tothisObj
from native code using the callback’s descriptor. After that we try tonapi_unwrap
the pointer, but the arguments error appear: https://github.com/josephg/node-foundationdb/blob/master/src/utils.cpp#L79I am not sure how to stably reproduce that behaviour. I was able to check the backtrace when I remove the pod, and the service forcedly stopped. But we catch same error sometimes while service is working as usual, and in this case all of the new transactions after the error will be completed with the same error.
Could you please help me to dig deeper to find the root of the issue? Can we somehow get an enriched logging to understand what is wrong and whether the bug is on our side or the native code one?