Closed mhdawson closed 7 years ago
@nodejs/platform-aix
/cc @nodejs/async_hooks
Recreated (intermittent) the crash in the compiler:
#0 0x000000010050e3d4 in v8::internal::AstValueFactory::GetString(unsigned int, bool, v8::internal::Vector<unsigned char const>) ()
(gdb) where
#0 0x000000010050e3d4 in v8::internal::AstValueFactory::GetString(unsigned int, bool, v8::internal::Vector<unsigned char const>) ()
#1 0x000000010050e9e8 in v8::internal::AstValueFactory::GetOneByteStringInternal(v8::internal::Vector<unsigned char const>) ()
#2 0x000000010050e9e8 in v8::internal::AstValueFactory::GetOneByteStringInternal(v8::internal::Vector<unsigned char const>) ()
#3 0x000000010052a7ec in v8::internal::Scanner::IsDuplicateSymbol(v8::internal::DuplicateFinder*, v8::internal::AstValueFactory*) const ()
#4 0x0000000100529b5c in v8::internal::ParserBase<v8::internal::Parser>::ParseAndClassifyIdentifier(bool*)
()
#5 0x000000010052895c in v8::internal::ParserBase<v8::internal::Parser>::ParsePrimaryExpression(bool*, bool*) ()
#6 0x0000000100528040 in v8::internal::ParserBase<v8::internal::Parser>::ParseFormalParameter(v8::internal::ParserFormalParameters*, bool*) ()
#7 0x000000010055fc24 in v8::internal::ParserBase<v8::internal::Parser>::ParseFormalParameterList(v8::internal::ParserFormalParameters*, bool*) ()
#8 0x000000010055f2d8 in v8::internal::Parser::ParseFunction(v8::internal::AstRawString const*, int, v8::internal::FunctionKind, v8::internal::FunctionLiteral::FunctionType, v8::internal::DeclarationScope*, int*, int*, bool*, int*, bool*) ()
#9 0x000000010055e234 in v8::internal::Parser::ParseFunctionLiteral(v8::internal::AstRawString const*, v8::internal::Scanner::Location, v8::internal::FunctionNameValidity, v8::internal::FunctionKind, int, v8::internal::FunctionLiteral::FunctionType, v8::internal::LanguageMode, bool*) ()
#10 0x00000001005271e8 in v8::internal::Parser::DoParseFunction(v8::internal::ParseInfo*, v8::internal::AstRawString const*) ()
#11 0x000000010050c4e8 in v8::internal::Parser::ParseFunction(v8::internal::Isolate*, v8::internal::ParseInfo*) ()
#12 0x000000010050b878 in v8::internal::parsing::ParseFunction(v8::internal::ParseInfo*, v8::internal::Isolate*, bool) ()
#13 0x00000001004bf730 in v8::internal::(anonymous namespace)::GetUnoptimizedCode(v8::internal::CompilationInf---Type <return> to continue, or q <return> to quit---
o*, v8::internal::Compiler::ConcurrencyMode) ()
#14 0x00000001004c1ef4 in v8::internal::(anonymous namespace)::GetLazyCode(v8::internal::Handle<v8::internal::JSFunction>) ()
#15 0x00000001001b5108 in v8::internal::Compiler::Compile(v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Compiler::ClearExceptionFlag) ()
#16 0x0000000100c7f920 in v8::internal::Runtime_CompileLazy(int, v8::internal::Object**, v8::internal::Isolate*) ()
#17 0x07000000001845f4 in ?? ()
Still a thing...
https://ci.nodejs.org/job/node-test-commit-aix/6522/nodes=aix61-ppc64/console
not ok 13 async-hooks/test-fseventwrap
---
duration_ms: 0.173
severity: crashed
stack: |-
oh no!
exit code: CRASHED (Signal: 11)
there are three mutually exclusive issues:
ld: 0711-317 ERROR: Undefined symbol: vtable for v8::base::PosixTimezoneCache
v8::internal::AstValueFactory::GetString
Debugging item 3. at the moment.
@gireeshpunathil
- the test case test-fseventwrap.js employs directory watch, which is not supported in AIX
Why would only this test fail when there are other tests that check directory watch.
@trevnorris - this test did not fail due to (1) above (yet) - the test progress is masked off by the crash
Why would only this test fail when there are other tests that check directory watch.
The other test files that check directory watch should already be skipped for AIX.
The other test files that check directory watch should already be skipped for AIX.
Ah yup. I see the isAix
check in those other tests.
(dbx) (0x10050e5c0)/20i
0x10050e5c0 (GetString+0x118) eb21ffc8 ld r25,-56(r1)
0x10050e5c4 (GetString+0x11c) eb41ffd0 ld r26,-48(r1)
0x10050e5c8 (GetString+0x120) eb61ffd8 ld r27,-40(r1)
0x10050e5cc (GetString+0x124) eb81ffe0 ld r28,-32(r1)
0x10050e5d0 (GetString+0x128) 7c0803a6 mtlr r0
0x10050e5d4 (GetString+0x12c) eba1ffe8 ld r29,-24(r1)
0x10050e5d8 (GetString+0x130) ebc1fff0 ld r30,-16(r1)
0x10050e5dc (GetString+0x134) ebe1fff8 ld r31,-8(r1)
0x10050e5e0 (GetString+0x138) 4e800020 blr
0x10050e5e4 (GetString+0x13c) 60000000 ori r0,r0,0x0
0x10050e5e8 (GetString+0x140) f8410028 std r2,0x28(r1)
0x10050e5ec (GetString+0x144) 7f63db78 mr r3,r27
0x10050e5f0 (GetString+0x148) e9370010 ld r9,0x10(r23)
0x10050e5f4 (GetString+0x14c) e9490000 ld r10,0x0(r9) >>>>> crash
0x10050e5f8 (GetString+0x150) e9690010 ld r11,0x10(r9)
Failing instruction corresponds to stringtable.LookupOrInsert
dbx) x
$r0:0x000000010050ec08 $stkp:0x0fffffffffffc330 $toc:0x00000001101d6fc0
$r3:0x0fffffffffffc3a0 $r4:0x0000000110acf270 $r5:0x0000000000000001
$r6:0x0000000110aa7430 $r7:0x0000000200000000 $r8:0x0000000000000040
$r9:0x5b5b5b5b00000000 $r10:0x0000000110ac73b0 $r11:0x0000000000000000
$r12:0x0000000022222244 $r13:0x00000001102086a0 $r14:0x0fffffffffffcd08
$r15:0x0fffffffffffcd30 $r16:0x0fffffffffffcd28 $r17:0x0fffffffffffcae0
$r18:0x0fffffffffffcd28 $r19:0x0000000000000001 $r20:0x0000000000000081
$r21:0x0000000110ace9f8 $r22:0x0fffffffffffcd30 $r23:0x0000000110ac4c90
$r24:0x0000000200000000 $r25:0x0000000110aa7430 $r26:0x0000000000000001
$r27:0x0fffffffffffc3a0 $r28:0x000000009ba358f2 $r29:0x0000000000000008
$r30:0x00000000000000c0 $r31:0x0fffffffffffc330
$iar:0x000000010050e5f4 $msr:0xa00000000000d032 $cr:0x482222a2
$link:0x000000010050ec08 $ctr:0x0000000000000000 $xer:0x00000001
r23 is string_table_ (base::CustomMatcherHashMap)
, and r9 is supposed to be its vtable from where to look for function LookupOrInsert()
.
I verified that the object was proper after the construction, looks like later its fields were overlaid by this pattern:
(dbx) ($r23)/8X
0x0000000110ac4c90: 00000001 10ac73b0 00000040 0000002c
0x0000000110ac4ca0: 5b5b5b5b 00000000 00000000 00000000
Pattern 0x5b5b5b5b
is typically invalid block eye-catcher.
I ran DEBUGMALLOC
on allocations and frees, but it did not catch anything in this area.
So it looks like an internal memory corruption through a bad pointer.
Debugging further.
Just got the same error - https://ci.nodejs.org/job/node-test-commit-aix/6559/nodes=aix61-ppc64/console
Again today:
https://ci.nodejs.org/job/node-test-commit-aix/6663/nodes=aix61-ppc64/console
not ok 13 async-hooks/test-fseventwrap
---
duration_ms: 0.183
severity: crashed
stack: |-
oh no!
exit code: CRASHED (Signal: 11)
reduced test case that crashes occasionally in AIX:
const fs = require('fs')
const watcher = fs.watch(__dirname, () => {})
watcher.close()
console.log('done')
The watcher.close()
seem to be the one which corrupts the memory (string_table of the parser).
Debugging further.
Even with single-thread, memory corruption seem to be a hard thing to crack. At the end of a number of attempts, DEBUGMALLOC finally reported a bad free on a pointer which it never allocated into, at uv__fs_event_close:
(gdb) where
#0 0x09000000007e7eb4 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5_64.o)
#1 0x09000000007e770c in _p_raise () from /usr/lib/libpthreads.a(shr_xpg5_64.o)
#2 0x090000000002bab0 in raise () from /usr/lib/libc.a(shr_64.o)
#3 0x090000000007c9b0 in abort () from /usr/lib/libc.a(shr_64.o)
#4 0x09000000000a4cfc in free_y_debugging () from /usr/lib/libc.a(shr_64.o)
#5 0x090000000009e994 in free_common_debugging () from /usr/lib/libc.a(shr_64.o)
#6 0x00000001000318f0 in uv__free (ptr=0x110aacb90) at ../deps/uv/src/uv-common.c:85
#7 0x000000010003067c in uv__fs_event_close (handle=0x16) at ../deps/uv/src/unix/aix.c:877
#8 0x000000010002aeec in uv_close (handle=0x16, close_cb=0x110aacb90) at ../deps/uv/src/unix/core.c:155
#9 0x0000000101127da4 in node::HandleWrap::Close(v8::FunctionCallbackInfo<v8::Value> const&) ()
#10 0x000000010144b62c in node::(anonymous namespace)::FSEventWrap::Close(v8::FunctionCallbackInfo<v8::Value> const&) ()
#11 0x0000000100244e68 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()
#12 0x000000010023d068 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#13 0x000000010079afb8 in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#14 0x07000000001845f4 in ?? ()
In uv__path_is_a_directory the variable handle->dir_filename
is freed unconditionally while there exists a control flow wherein this is uninitialized - if the watch event is never fired.
@gireeshpunathil What does f 10
, p *wrap
print?
This one resolves the issue consistently. I ran it a 3K times, and saw no failures.
diff --git a/deps/uv/src/unix/aix.c b/deps/uv/src/unix/aix.c
index 388c9cc..6314096 100644
--- a/deps/uv/src/unix/aix.c
+++ b/deps/uv/src/unix/aix.c
@@ -855,6 +855,7 @@ int uv_fs_event_start(uv_fs_event_t* handle,
uv__io_init(&handle->event_watcher, uv__ahafs_event, fd);
handle->path = uv__strdup(filename);
handle->cb = cb;
+ handle->dir_filename = NULL;
uv__io_start(handle->loop, &handle->event_watcher, POLLIN);
@@ -874,8 +875,10 @@ int uv_fs_event_stop(uv_fs_event_t* handle) {
uv__handle_stop(handle);
if (uv__path_is_a_directory(handle->path) == 0) {
- uv__free(handle->dir_filename);
- handle->dir_filename = NULL;
+ if (handle->dir_filename != NULL) {
+ uv__free(handle->dir_filename);
+ handle->dir_filename = NULL;
+ }
}
uv__free(handle->path);
Leaving it for a while for some review and then come up with a PR.
@bnoordhuis - I overwrote the core file. (will get it once again.)
@gireeshpunathil uv__free(ptr)
is a no-op when ptr == NULL
, you shouldn't need a guard. Looks like it's just a case of improper initialization (and a rather obvious one, in retrospect, but they always are.)
@bnoordhuis - Looks like the values are optimized out in the release build. Does the raw offsets reveal / confirm anything?
(gdb) p * wrap
No symbol "wrap" in current context.
(gdb) p *(FSEventWrap *) 0xfffffffffffdcf8
No symbol "FSEventWrap" in current context.
(gdb) p *(node::FSEventWrap *) 0xfffffffffffdcf8
No symbol "node" in current context.
(gdb) x/10x 0xfffffffffffdcf8
0xfffffffffffdcf8: 0x00000004 0x00000000 0x07000000 0x208d3d01
0xfffffffffffdd08: 0x00000000 0x00000016 0x07000000 0x002bbf00
0xfffffffffffdd18: 0x0fffffff 0xffffdd68
(gdb)
Thanks for the info on the uv__free. So that means the only change is the initializer for the uninitialized field.
item no. 3 is being addressed through libuv PR 1400
Next - item no. 2: AIX fails to build in debug mode: throws linker error:
ld: 0711-317 ERROR: Undefined symbol: vtable for v8::base::PosixTimezoneCache
/cc @jbajwa
-O3
build fine, while -O0
throws up the above error.
Looks like test-fseventwrap fails intermitently on AIX:
https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/6463/console
I believe I've seen it on at least one other PR unrelated to fswatch/asyncwrap.
@gireeshpunathil can you take a look.