nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.15k stars 29.36k forks source link

New Failures in AIX nightly run (possibly v8_inspector change) #7080

Closed mhdawson closed 8 years ago

mhdawson commented 8 years ago

3 new failures in last nights run, opening as single issue until more investigation is completed.

https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/195/console

not ok 6 parallel/test-async-wrap-post-did-throw
# FATAL ERROR: node::RandomBytesRequest() Out of Memory

not ok 8 parallel/test-async-wrap-throw-from-callback
# 
# assert.js:90
#   throw new assert.AssertionError({
#   ^
# AssertionError: pre closed with code null
#     at ChildProcess.child.on (/home/iojs/build/workspace/node-test-commit-aix/nodes/aix61-ppc64/test/parallel/test-async-wrap-throw-from-callback.js:65:14)
#     at emitTwo (events.js:106:13)
#     at ChildProcess.emit (events.js:191:7)
#     at maybeClose (internal/child_process.js:852:16)
#     at Process.ChildProcess._handle.onexit (internal/child_process.js:215:5)
  ---
  duration_ms: 0.978
  ...

not ok 177 parallel/test-crypto-random
# FATAL ERROR: node::RandomBytesRequest() Out of Memory
imran-iq commented 8 years ago

Here is the diff between build 194 (0 failures) and 195 (3 failures) https://github.com/nodejs/node/compare/112b9cdad78da77170436fb68b260f99772793ac...894203dae39c7f1f36fc6ba126bb5d782d79b744

mhdawson commented 8 years ago

re-run to see if they are consistent https://ci.nodejs.org/job/node-test-commit-aix/

mhdawson commented 8 years ago

Seem to fail consistently as failed in re-run

mhdawson commented 8 years ago

Seems to be related to malloc(0) behaviour. By default AIX fails malloc(0) and we've told it to have the linux malloc(0) behaviour. Not sure how but it looks like this change has somehow affected that.

Believe the failure in test-crypto-random is due to case where it does crypto.randomBytes(0) as this seems to fail with an out of memory.

@ofrobots does the inspector do anything to hook or intercept malloc ?

mhdawson commented 8 years ago

simplest test case to recreated:

-bash-4.3$ cat test.js
var crypto = require('crypto');
crypto.randomBytes(0);
imran-iq commented 8 years ago

Compiled with _ALL_SOURCE and _LINUX_SOURCE_COMPAT test still fail. https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/197/

ofrobots commented 8 years ago

@mhdawson I am not aware of any interception of malloc in v8_inspector, so this is curious. Let me know what you find out.

mhdawson commented 8 years ago

iWuzHere is also having some trouble compiling on our local machines (although builds ok at least on community CI) so this should be a fun one.

ofrobots commented 8 years ago

LMK how to help.

mhdawson commented 8 years ago

@ofrobots thanks for the offer, I think we need to gather some additional info about the compile failure (may be the compiler crashing) and I'm also trying a build with --without-inspector to see if that resolves the test failures. Once we have some more info I'll reach out.

mhdawson commented 8 years ago

Run here https://ci.nodejs.org/job/node-test-commit-aix-mdawson/9/ against PR that fixes --without-inspector to see if we still see failures if option is disabled, may help to focus on where to look if we don't see failures in that mode.

mhdawson commented 8 years ago

Tests seem to have passed when compiled with --without-inspector so we need to look at the diffs between that being enabled/not

imran-iq commented 8 years ago

Does something like this:

[ 'OS=="aix"', {
        'include_dirs': [ 'config/aix' ],
        'sources': [ 'config/aix/ares_config.h' ],
        'defines': [
          # Support for malloc(0)
          '_LINUX_SOURCE_COMPAT=1',
          '_ALL_SOURCE=1'],
      }],

Need to be added to inspector's gyp?

Does running with inspector cause issues on any other platform like windows or solaris?

ofrobots commented 8 years ago

@iWuzHere If AIX requires all C/C++ code to be compiled with those flags you mentioned, that maybe that's the problem. It is worth trying it out.

Does running with inspector cause issues on any other platform like windows or solaris?

I haven't seen problems with this on Windows and on SmartOS, and neither has the CI. This does seem like something specific to AIX.

/fyi @eugeneo.

ofrobots commented 8 years ago

@iWuzHere could you try with this: https://github.com/eugeneo/node/commit/bf4cbfd64bde96fdf4ca9f33d65199d72f5b315b on an AIX machine?

mhdawson commented 8 years ago

launched run here:

https://ci.nodejs.org/job/node-test-commit-aix/199/

mhdawson commented 8 years ago

Also trying to pare down where the issue might be I did a run with this change:

diff --git a/node.gyp b/node.gyp
index 05a5530..11f696a 100644
--- a/node.gyp
+++ b/node.gyp
@@ -255,15 +255,6 @@
             'HAVE_INSPECTOR=1',
             'V8_INSPECTOR_USE_STL=1',
           ],
-          'sources': [
-            'src/inspector_agent.cc',
-            'src/inspector_socket.cc',
-            'src/inspector_socket.h',
-            'src/inspector-agent.h',
-          ],
-          'dependencies': [
-            'deps/v8_inspector/v8_inspector.gyp:v8_inspector',
-          ],
           'include_dirs': [
             'deps/v8_inspector',
             'deps/v8_inspector/deps/wtf', # temporary

And the node exe built with that seems to pass the minimal test case without OOM

mhdawson commented 8 years ago

So will be interested in run above, as it would seem to indicate that the other changes in node.gyp are not related.

mhdawson commented 8 years ago

Looked at the differences in the compile line between the good/bad runs for node_crypto.cc but don't see anything obvious. The differences are those expected based on the changes to the gyp file and -D_LINUX_SOURCE_COMPAT is there in both cases.

imran-iq commented 8 years ago

You dont just need -D_LINUX_SOURCE_COMPAT but also -D_ALL_SOURCE to get linux malloc behaviour as described in the stdlib.h file on AIX

#if defined(_ALL_SOURCE) && defined(_LINUX_SOURCE_COMPAT)
#ifdef  _NO_PROTO
extern void *__linux_malloc();
extern void *__linux_realloc();
extern void *__linux_calloc();
extern void *__linux_valloc();
#else
extern void *__linux_malloc(size_t);
extern void *__linux_realloc(void *, size_t);
extern void *__linux_calloc(size_t, size_t);
extern void *__linux_valloc(size_t);
#endif
#define malloc __linux_malloc
#define calloc __linux_calloc
#define realloc __linux_realloc
#define valloc __linux_valloc
#endif  /* _LINUX_SOURCE_COMPAT */

I have already done a run with that change included in node.gyp but the tests still failed, hence the suggestion to try and add it to the inspectors gyp

eugeneo commented 8 years ago

@iWuzHere Can you try this on AIX - https://github.com/eugeneo/node/commit/de97daf98de030fc697acd8bed627293caf32370 ?

imran-iq commented 8 years ago

@eugeneo I have already done it locally and waiting for the compile to finish :smile:

imran-iq commented 8 years ago

Still fail with changes to inspector.gyp

imran-iq commented 8 years ago

Alright so did quite a bit of investigation ( still not sure why this is happening ). Compiled with -D_LINUX_SOURCE_COMPAT and -D_ALL_SOURCE added to node and inspector gyp

Checking symbols

./configure

Test fails

imrani@dal-pacha /tmp/imranbuild/node ./out/Debug/node test
FATAL ERROR: node::RandomBytesRequest() Out of Memory
IOT/Abort trap (core dumped)
imrani@dal-pacha /tmp/imranbuild/node dump -Tv -Xany ./out/Debug/node | grep malloc
[58]    0x00000000    undef      IMP     DS EXTref   libc.a(shr.o) malloc
[92]    0x00000000    undef      IMP     DS EXTref   libc.a(shr.o) __linux_malloc
[260]   0x00000000    undef      IMP     RW EXTref   libc.a(shr.o) __malloc_user_defined_name
[3967]  0x200658f0    .data      EXP     RW   Ldef        [noIMid] ares_malloc
[72217] 0x20132810    .data      EXP     DS   Ldef        [noIMid] uprv_malloc_57
[92580] 0x2016deb0    .data      EXP     DS   Ldef        [noIMid] CRYPTO_malloc_locked
[92582] 0x2016dec8    .data      EXP     DS   Ldef        [noIMid] CRYPTO_malloc
[92587] 0x2016df04    .data      EXP     DS   Ldef        [noIMid] CRYPTO_remalloc
[92597] 0x2016df7c    .data      EXP     DS   Ldef        [noIMid] CRYPTO_dbg_malloc
[94250] 0x20172d70    .data      EXP     DS   Ldef        [noIMid] ares_malloc_data
[94342] 0x201731e4    .data      EXP     DS   Ldef        [noIMid] uv__malloc

./configure --without-inspector

Test passes

imrani@dal-pacha /tmp/imranbuild/node ./out/Debug/node test
imrani@dal-pacha /tmp/imranbuild/node dump -Tv -Xany ./out/Debug/node | grep malloc
[58]    0x00000000    undef      IMP     DS EXTref   libc.a(shr.o) malloc
[91]    0x00000000    undef      IMP     DS EXTref   libc.a(shr.o) __linux_malloc
[259]   0x00000000    undef      IMP     RW EXTref   libc.a(shr.o) __malloc_user_defined_name
[3528]  0x20064080    .data      EXP     RW   Ldef        [noIMid] ares_malloc
[70891] 0x2012e748    .data      EXP     DS   Ldef        [noIMid] uprv_malloc_57
[75819] 0x2013cc5c    .data      EXP     DS   Ldef        [noIMid] CRYPTO_malloc_locked
[75821] 0x2013cc74    .data      EXP     DS   Ldef        [noIMid] CRYPTO_malloc
[75826] 0x2013ccb0    .data      EXP     DS   Ldef        [noIMid] CRYPTO_remalloc
[75836] 0x2013cd28    .data      EXP     DS   Ldef        [noIMid] CRYPTO_dbg_malloc
[77489] 0x20141b1c    .data      EXP     DS   Ldef        [noIMid] ares_malloc_data
[77581] 0x20141f90    .data      EXP     DS   Ldef        [noIMid] uv__malloc

Stepping through code

Decided to step through code to see what was going on

With Inspector

Even though I break on malloc I never hit it RandomBytesRequest and the program fails as data_ is null

(gdb) break malloc
Breakpoint 2 at 0xd012a17c
(gdb) c
Continuing.
FATAL ERROR: node::RandomBytesRequest() Out of Memory

Program received signal SIGABRT, Aborted.
0xd050adf4 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5.o)
(gdb) where
#0  0xd050adf4 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5.o)
#1  0xd050a24c in _p_raise () from /usr/lib/libpthreads.a(shr_xpg5.o)
#2  0xd0137284 in raise () from /usr/lib/libc.a(shr.o)
#3  0xd01c0d68 in abort () from /usr/lib/libc.a(shr.o)
#4  0x10019e68 in node::OnFatalError (location=0x1233d0d8 "node::RandomBytesRequest()", message=0x1233d014 "Out of Memory") at ../src/node.cc:2395
#5  0x10019ecc in node::FatalError (location=0x1233d0d8 "node::RandomBytesRequest()", message=0x1233d014 "Out of Memory") at ../src/node.cc:2400
#6  0x10c09f6c in node::crypto::RandomBytesRequest::RandomBytesRequest (this=0x30a14298, env=0x309f9198, object=..., size=0) at ../src/node_crypto.cc:5380
#7  0x10c0ad24 in node::crypto::RandomBytes (args=0x2ff22044) at ../src/node_crypto.cc:5500
#8  0x115e97b8 in v8::internal::FunctionCallbackArguments::Call (this=0x2ff22178, f=<incomplete type>) at ../deps/v8/src/arguments.cc:33
#9  0x104274f4 in v8::internal::(anonymous namespace)::HandleApiCallHelper<false> (isolate=0x309cf228, args=...) at ../deps/v8/src/builtins.cc:3915
#10 0x10423db8 in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0x309cf228) at ../deps/v8/src/builtins.cc:3939
#11 0x10423cc0 in v8::internal::Builtin_HandleApiCall (args_length=3, args_object=0x2ff22328, isolate=0x309cf228) at ../deps/v8/src/builtins.cc:3936
#12 0xef70c74c in ?? ()
Breakpoint 1, node::crypto::RandomBytesRequest::RandomBytesRequest (this=0x30a14298, env=0x309f9198, object=..., size=0) at ../src/node_crypto.cc:5378
5378    data_ = static_cast<char*>(malloc(size));
(gdb) print malloc
$1 = {<text variable, no debug info>} 0xd012a160 <malloc>
(gdb) s
5379        if (data() == nullptr)
(gdb) print datat_
^CNo symbol "datat_" in current context.
(gdb) print data_
$2 = (<invalid type code 8> *) 0x0
(gdb) print malloc
$3 = {<text variable, no debug info>} 0xd012a160 <malloc>
(gdb) 

Without Inspector

My breakpoint on malloc succeeds and the program functions without a hitch

Breakpoint 2, 0xd012a17c in malloc () from /usr/lib/libc.a(shr.o)
(gdb) where
#0  0xd012a17c in malloc () from /usr/lib/libc.a(shr.o)
#1  0xd031d9dc in __linux_malloc () from /usr/lib/libc.a(shr.o)
#2  0x10e13b6c in node::crypto::RandomBytesRequest::RandomBytesRequest (this=0x309e1c18, env=0x309c6118, object=..., size=0) at ../src/node_crypto.cc:5377
#3  0x10e14968 in node::crypto::RandomBytes (args=0x2ff22064) at ../src/node_crypto.cc:5499
#4  0x115ed608 in v8::internal::FunctionCallbackArguments::Call (this=0x2ff22198, f=<incomplete type>) at ../deps/v8/src/arguments.cc:33
#5  0x10248d6c in v8::internal::(anonymous namespace)::HandleApiCallHelper<false> (isolate=0x3099c118, args=...) at ../deps/v8/src/builtins.cc:3915
#6  0x10245630 in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0x3099c118) at ../deps/v8/src/builtins.cc:3939
#7  0x10245538 in v8::internal::Builtin_HandleApiCall (args_length=3, args_object=0x2ff22348, isolate=0x3099c118) at ../deps/v8/src/builtins.cc:3936
#8  0xef70c74c in ?? ()
Breakpoint 1, node::crypto::RandomBytesRequest::RandomBytesRequest (this=0x309e0d68, env=0x309c60b8, object=..., size=0) at ../src/node_crypto.cc:5378
5378    data_ = static_cast<char*>(malloc(size));
(gdb) s
5379        if (data() == nullptr)
(gdb) print data_
$1 = (<invalid type code 8> *) 0x309db7d8 ""
(gdb) 

Note we have a valid pointer in this case

Disassembly of RandomBytesRequest

I decided to check the assembly to see what was going on in both cases we are branching at +108 offset.

With Inspector

Breakpoint 1, node::crypto::RandomBytesRequest::RandomBytesRequest (this=0x30a14298, env=0x309f9198, object=..., size=0) at ../src/node_crypto.cc:5378
5378    data_ = static_cast<char*>(malloc(size));
(gdb) disass $pc
Dump of assembler code for function _ZN4node6crypto18RandomBytesRequestC1EPNS_11EnvironmentEN2v85LocalINS4_6ObjectEEEm:
   0x10c09eb4 <+0>:     mflr    r0
   0x10c09eb8 <+4>:     stw     r0,8(r1)
   0x10c09ebc <+8>:     stw     r31,-4(r1)
   0x10c09ec0 <+12>:    stwu    r1,-64(r1)
   0x10c09ec4 <+16>:    mr      r31,r1
   0x10c09ec8 <+20>:    stw     r3,88(r31)
   0x10c09ecc <+24>:    stw     r4,92(r31)
   0x10c09ed0 <+28>:    stw     r5,96(r31)
   0x10c09ed4 <+32>:    stw     r6,100(r31)
   0x10c09ed8 <+36>:    lwz     r9,88(r31)
   0x10c09edc <+40>:    mr      r3,r9
   0x10c09ee0 <+44>:    lwz     r4,92(r31)
   0x10c09ee4 <+48>:    lwz     r5,96(r31)
   0x10c09ee8 <+52>:    li      r6,1
   0x10c09eec <+56>:    li      r7,0
   0x10c09ef0 <+60>:    bl      0x10054a48 <_ZN4node9AsyncWrapC2EPNS_11EnvironmentEN2v85LocalINS3_6ObjectEEENS0_12ProviderTypeEPS0_>
   0x10c09ef4 <+64>:    nop
   0x10c09ef8 <+68>:    lwz     r9,88(r31)
   0x10c09efc <+72>:    b       0x10c2810c <@FIX30+5764>
   0x10c09f00 <+76>:    stw     r10,0(r9)
   0x10c09f04 <+80>:    lwz     r9,88(r31)
   0x10c09f08 <+84>:    li      r10,0
   0x10c09f0c <+88>:    stw     r10,88(r9)
   0x10c09f10 <+92>:    lwz     r9,88(r31)
   0x10c09f14 <+96>:    lwz     r10,100(r31)
   0x10c09f18 <+100>:   stw     r10,92(r9)
=> 0x10c09f1c <+104>:   lwz     r3,100(r31)
   0x10c09f20 <+108>:   bl      0x1006fa84
   0x10c09f24 <+112>:   lwz     r2,20(r1)
   0x10c09f28 <+116>:   mr      r9,r3

(gdb) disass 0x1006fa84
Dump of assembler code for function malloc:
   0x1006fa84:  lwz     r12,636(r2)
   0x1006fa88:  stw     r2,20(r1)
   0x1006fa8c:  lwz     r0,0(r12)
   0x1006fa90:  lwz     r2,4(r12)
   0x1006fa94:  mtctr   r0
   0x1006fa98:  bctr
   0x1006fa9c:  .long 0x0
   0x1006faa0:  .long 0xca000
   0x1006faa4:  .long 0x0
   0x1006faa8:  .long 0x18
End of assembler dump.
(gdb) 

Without Inspector

Breakpoint 1, node::crypto::RandomBytesRequest::RandomBytesRequest (this=0x30a14298, env=0x309f9198, object=..., size=0) at ../src/node_crypto.cc:5378
5378    data_ = static_cast<char*>(malloc(size));
(gdb) disass $pc
Dump of assembler code for function _ZN4node6crypto18RandomBytesRequestC1EPNS_11EnvironmentEN2v85LocalINS4_6ObjectEEEm:
   0x10e13afc <+0>:     mflr    r0
   0x10e13b00 <+4>:     stw     r0,8(r1)
   0x10e13b04 <+8>:     stw     r31,-4(r1)
   0x10e13b08 <+12>:    stwu    r1,-64(r1)
   0x10e13b0c <+16>:    mr      r31,r1
   0x10e13b10 <+20>:    stw     r3,88(r31)
   0x10e13b14 <+24>:    stw     r4,92(r31)
   0x10e13b18 <+28>:    stw     r5,96(r31)
   0x10e13b1c <+32>:    stw     r6,100(r31)
   0x10e13b20 <+36>:    lwz     r9,88(r31)
   0x10e13b24 <+40>:    mr      r3,r9
   0x10e13b28 <+44>:    lwz     r4,92(r31)
   0x10e13b2c <+48>:    lwz     r5,96(r31)
   0x10e13b30 <+52>:    li      r6,1
   0x10e13b34 <+56>:    li      r7,0
   0x10e13b38 <+60>:    bl      0x100542fc <_ZN4node9AsyncWrapC2EPNS_11EnvironmentEN2v85LocalINS3_6ObjectEEENS0_12ProviderTypeEPS0_>
   0x10e13b3c <+64>:    nop
   0x10e13b40 <+68>:    lwz     r9,88(r31)
   0x10e13b44 <+72>:    b       0x10e32074 <@FIX31+6580>
   0x10e13b48 <+76>:    stw     r10,0(r9)
   0x10e13b4c <+80>:    lwz     r9,88(r31)
   0x10e13b50 <+84>:    li      r10,0
   0x10e13b54 <+88>:    stw     r10,88(r9)
   0x10e13b58 <+92>:    lwz     r9,88(r31)
   0x10e13b5c <+96>:    lwz     r10,100(r31)
   0x10e13b60 <+100>:   stw     r10,92(r9)
=> 0x10e13b64 <+104>:   lwz     r3,100(r31)
   0x10e13b68 <+108>:   bl      0x107e476c
   0x10e13b6c <+112>:   lwz     r2,20(r1)
   0x10e13b70 <+116>:   mr      r10,r3

(gdb) disass 0x107e476c
Dump of assembler code for function __linux_malloc:
   0x107e476c:  lwz     r12,-9232(r2)
   0x107e4770:  stw     r2,20(r1)
   0x107e4774:  lwz     r0,0(r12)
   0x107e4778:  lwz     r2,4(r12)
   0x107e477c:  mtctr   r0
   0x107e4780:  bctr
   0x107e4784:  .long 0x0
   0x107e4788:  .long 0xca000
   0x107e478c:  .long 0x0
   0x107e4790:  .long 0x18
End of assembler dump.
(gdb) 

Conclusion

So for some reason when we are compiling and linking against the inspector we end calling the "bad" malloc malloc instead of the "good" malloc __linux_malloc. I'm not too sure why this is happening as the only thing I am doing is

./configure
gmake -C out BUILDTYPE=Debug
//gdb stuff here
./configure --without-inspector
gmake -C out BUILDTYPE=Debug
//repeat gdb stuff here

Not even running gmake clean inbetween. In both cases we have a reference to __linux_malloc in our binary, yet when we have the inspector we are not calling it Still need to figure out why this is the case. Though I have definitely proven that with the inspector __linux_malloc is never being called

imran-iq commented 8 years ago

So here's an objdump of node_crypto.o compile with and without the inspector

Without Inspector

imrani@dal-pacha /tmp/imranbuild/node objdump -t ./out/Debug/obj.target/node_base/src/node_crypto.o | grep malloc
[732](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 1) 0x00000000 .__linux_malloc
[2699](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 uv_malloc_func:t1387=1388=*1389=f86
[3617](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 dyn_MEM_malloc_cb:t2519=1388
[3620](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 st_dynamic_MEM_fns:T2522=s12malloc_cb:2519,0,32;realloc_cb:2520,32,32;free_cb:2521,64,32;;
imrani@dal-pacha /tmp/imranbuild/node objdump -r  ./out/Debug/obj.target/node_base/src/node_crypto.o | grep malloc
00015608 R_RBR_26          .__linux_malloc
0001dd90 R_RBR_26          .__linux_malloc
0001e054 R_RBR_26          .__linux_malloc
0001e27c R_RBR_26          .__linux_malloc
0001ed14 R_RBR_26          .__linux_malloc
0001fe88 R_RBR_26          .__linux_malloc
0001ff3c R_RBR_26          .__linux_malloc
00020688 R_RBR_26          .__linux_malloc
00021e00 R_RBR_26          .__linux_malloc
000270fc R_RBR_26          .__linux_malloc
0002d0e8 R_RBR_26          .__linux_malloc
0003aeb0 R_RBR_26          .__linux_malloc

With Inspector

imrani@dal-pacha /tmp/imranbuild/node objdump -t ./out/Debug/obj.target/node_base/src/node_crypto.o | grep malloc
[738](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 1) 0x00000000 .malloc
[950](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 1) 0x00000000 .__linux_malloc
[2707](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 uv_malloc_func:t1387=1388=*1389=f86
[4180](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 dyn_MEM_malloc_cb:t2889=1388
[4183](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 st_dynamic_MEM_fns:T2892=s12malloc_cb:2889,0,32;realloc_cb:2890,32,32;free_cb:2891,64,32;;
imrani@dal-pacha /tmp/imranbuild/node objdump -r  ./out/Debug/obj.target/node_base/src/node_crypto.o | grep malloc
000156f0 R_RBR_26          .malloc
0001de7c R_RBR_26          .malloc
0001e144 R_RBR_26          .malloc
0001e370 R_RBR_26          .malloc
0001ee0c R_RBR_26          .malloc
0001ff84 R_RBR_26          .malloc
0002003c R_RBR_26          .malloc
0002078c R_RBR_26          .malloc
00021f08 R_RBR_26          .malloc
00027208 R_RBR_26          .__linux_malloc
0002d1f4 R_RBR_26          .malloc
0003afc0 R_RBR_26          .malloc
imran-iq commented 8 years ago

So the issue seems to be the -DHAVE_INSPECTOR flag. The only difference between these two compiles is in one we have -DHAVE_INSPECTOR=1 and the other -DHAVE_INSPECTOR=0

Set to 1

imrani@dal-pacha /tmp/imranbuild/node/out /usr/bin/g++ '-DNODE_ARCH="ppc"' '-DNODE_PLATFORM="aix"' '-DNODE_WANT_INTERNALS=1' '-DV8_DEPRECATION_WARNINGS=1' '-DNODE_HAVE_I18N_SUPPORT=1' '-DNODE_HAVE_SMALL_ICU=1' '-DHAVE_INSPECTOR=1' '-DHAVE_OPENSSL=1' '-D__POSIX__' '-D_LINUX_SOURCE_COMPAT=1' '-D_ALL_SOURCE=1' '-DUCONFIG_NO_TRANSLITERATION=1' '-DUCONFIG_NO_SERVICE=1' '-DUCONFIG_NO_REGULAR_EXPRESSIONS=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=0' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DUCONFIG_NO_CONVERSION=1' '-DHTTP_PARSER_STRICT=0' '-D_LARGEFILE_SOURCE' '-D_FILE_OFFSET_BITS=64' '-DDEBUG' '-D_DEBUG' -I../src -I../tools/msvs/genfiles -I../deps/uv/src/ares -I/tmp/imranbuild/node/out/Debug/obj/gen -I../deps/v8 -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -I../deps/openssl/openssl/include -I../deps/zlib -I../deps/http_parser -I../deps/cares/include -I../deps/uv/include  -pthread -Wall -Wextra -Wno-unused-parameter -g -O0 -gxcoff -fno-rtti -fno-exceptions -std=gnu++0x -MMD -MF /tmp/imranbuild/node/out/Debug/.deps//tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o.d.raw   -c -o /tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o ../src/node_crypto.cc
imrani@dal-pacha /tmp/imranbuild/node/out objdump -tr ./Debug/obj.target/node_base/src/node_crypto.o | grep malloc
[738](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 1) 0x00000000 .malloc
[950](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 1) 0x00000000 .__linux_malloc
[2707](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 uv_malloc_func:t1387=1388=*1389=f86
[4180](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 dyn_MEM_malloc_cb:t2889=1388
[4183](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 st_dynamic_MEM_fns:T2892=s12malloc_cb:2889,0,32;realloc_cb:2890,32,32;free_cb:2891,64,32;;
000156f0 R_RBR_26          .malloc
0001de7c R_RBR_26          .malloc
0001e144 R_RBR_26          .malloc
0001e370 R_RBR_26          .malloc
0001ee0c R_RBR_26          .malloc
0001ff84 R_RBR_26          .malloc
0002003c R_RBR_26          .malloc
0002078c R_RBR_26          .malloc
00021f08 R_RBR_26          .malloc
00027208 R_RBR_26          .__linux_malloc
0002d1f4 R_RBR_26          .malloc
0003afc0 R_RBR_26          .malloc

Set to 0

imrani@dal-pacha /tmp/imranbuild/node/out /usr/bin/g++ '-DNODE_ARCH="ppc"' '-DNODE_PLATFORM="aix"' '-DNODE_WANT_INTERNALS=1' '-DV8_DEPRECATION_WARNINGS=1' '-DNODE_HAVE_I18N_SUPPORT=1' '-DNODE_HAVE_SMALL_ICU=1' '-DHAVE_INSPECTOR=0' '-DHAVE_OPENSSL=1' '-D__POSIX__' '-D_LINUX_SOURCE_COMPAT=1' '-D_ALL_SOURCE=1' '-DUCONFIG_NO_TRANSLITERATION=1' '-DUCONFIG_NO_SERVICE=1' '-DUCONFIG_NO_REGULAR_EXPRESSIONS=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=0' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DUCONFIG_NO_CONVERSION=1' '-DHTTP_PARSER_STRICT=0' '-D_LARGEFILE_SOURCE' '-D_FILE_OFFSET_BITS=64' '-DDEBUG' '-D_DEBUG' -I../src -I../tools/msvs/genfiles -I../deps/uv/src/ares -I/tmp/imranbuild/node/out/Debug/obj/gen -I../deps/v8 -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -I../deps/openssl/openssl/include -I../deps/zlib -I../deps/http_parser -I../deps/cares/include -I../deps/uv/include  -pthread -Wall -Wextra -Wno-unused-parameter -g -O0 -gxcoff -fno-rtti -fno-exceptions -std=gnu++0x -MMD -MF /tmp/imranbuild/node/out/Debug/.deps//tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o.d.raw   -c -o /tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o ../src/node_crypto.cc
imrani@dal-pacha /tmp/imranbuild/node/out objdump -tr ./Debug/obj.target/node_base/src/node_crypto.o | grep malloc[732](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 1) 0x00000000 .__linux_malloc
[2699](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 uv_malloc_func:t1387=1388=*1389=f86
[3617](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 dyn_MEM_malloc_cb:t2519=1388
[3620](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 st_dynamic_MEM_fns:T2522=s12malloc_cb:2519,0,32;realloc_cb:2520,32,32;free_cb:2521,64,32;;
00015608 R_RBR_26          .__linux_malloc
0001dd90 R_RBR_26          .__linux_malloc
0001e054 R_RBR_26          .__linux_malloc
0001e27c R_RBR_26          .__linux_malloc
0001ed14 R_RBR_26          .__linux_malloc
0001fe88 R_RBR_26          .__linux_malloc
0001ff3c R_RBR_26          .__linux_malloc
00020688 R_RBR_26          .__linux_malloc
00021e00 R_RBR_26          .__linux_malloc
000270fc R_RBR_26          .__linux_malloc
0002d0e8 R_RBR_26          .__linux_malloc
0003aeb0 R_RBR_26          .__linux_malloc
imran-iq commented 8 years ago

Removing the following lines and compiling again with -DHAVE_INSPECTOR=1 causes the problem to disappear

Changes

diff --git a/src/env-inl.h b/src/env-inl.h
index 97e1ba8..34f9bf7 100644
--- a/src/env-inl.h
+++ b/src/env-inl.h
@@ -225,9 +225,6 @@ inline Environment::Environment(v8::Local<v8::Context> context,
       makecallback_cntr_(0),
       async_wrap_uid_(0),
       debugger_agent_(this),
-#if HAVE_INSPECTOR
-      inspector_agent_(this),
-#endif
       http_parser_buffer_(nullptr),
       context_(context->GetIsolate(), context) {
   // We'll be creating new objects so make sure we've entered the context.
diff --git a/src/env.h b/src/env.h
index 4c310c8..b2f0b27 100644
--- a/src/env.h
+++ b/src/env.h
@@ -5,9 +5,6 @@

 #include "ares.h"
 #include "debug-agent.h"
-#if HAVE_INSPECTOR
-#include "inspector_agent.h"
-#endif
 #include "handle_wrap.h"
 #include "req-wrap.h"
 #include "tree.h"
@@ -552,11 +549,6 @@ class Environment {
     return &debugger_agent_;
   }

-#if HAVE_INSPECTOR
-  inline inspector::Agent* inspector_agent() {
-    return &inspector_agent_;
-  }
-#endif

   typedef ListHead<HandleWrap, &HandleWrap::handle_wrap_queue_> HandleWrapQueue;
   typedef ListHead<ReqWrap<uv_req_t>, &ReqWrap<uv_req_t>::req_wrap_queue_>
@@ -595,9 +587,6 @@ class Environment {
   size_t makecallback_cntr_;
   int64_t async_wrap_uid_;
   debugger::Agent debugger_agent_;
-#if HAVE_INSPECTOR
-  inspector::Agent inspector_agent_;
-#endif

   HandleWrapQueue handle_wrap_queue_;
   ReqWrapQueue req_wrap_queue_;

Result

imrani@dal-pacha /tmp/imranbuild/node/out vi ../src/env.h 
imrani@dal-pacha /tmp/imranbuild/node/out /usr/bin/g++ '-DNODE_ARCH="ppc"' '-DNODE_PLATFORM="aix"' '-DNODE_WANT_INTERNALS=1' '-DV8_DEPRECATION_WARNINGS=1' '-DNODE_HAVE_I18N_SUPPORT=1' '-DNODE_HAVE_SMALL_ICU=1' '-DHAVE_INSPECTOR=1' '-DHAVE_OPENSSL=1' '-D__POSIX__' '-D_LINUX_SOURCE_COMPAT=1' '-D_ALL_SOURCE=1' '-DUCONFIG_NO_TRANSLITERATION=1' '-DUCONFIG_NO_SERVICE=1' '-DUCONFIG_NO_REGULAR_EXPRESSIONS=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=0' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DUCONFIG_NO_CONVERSION=1' '-DHTTP_PARSER_STRICT=0' '-D_LARGEFILE_SOURCE' '-D_FILE_OFFSET_BITS=64' '-DDEBUG' '-D_DEBUG' -I../src -I../tools/msvs/genfiles -I../deps/uv/src/ares -I/tmp/imranbuild/node/out/Debug/obj/gen -I../deps/v8 -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -I../deps/openssl/openssl/include -I../deps/zlib -I../deps/http_parser -I../deps/cares/include -I../deps/uv/include  -pthread -Wall -Wextra -Wno-unused-parameter -g -O0 -gxcoff -fno-rtti -fno-exceptions -std=gnu++0x -MMD -MF /tmp/imranbuild/node/out/Debug/.deps//tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o.d.raw   -c -o /tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o ../src/node_crypto.cc
In file included from ../src/base-object-inl.h:8:0,
                 from ../src/async-wrap-inl.h:8,
                 from ../src/node_crypto.h:14,
                 from ../src/node_crypto.cc:3:
../src/env-inl.h: In constructor 'node::Environment::Environment(v8::Local<v8::Context>, uv_loop_t*)':
../src/env-inl.h:229:7: error: class 'node::Environment' does not have any field named 'inspector_agent_'
       inspector_agent_(this),
       ^
imrani@dal-pacha /tmp/imranbuild/node/out vi ../src/env-inl.h
imrani@dal-pacha /tmp/imranbuild/node/out /usr/bin/g++ '-DNODE_ARCH="ppc"' '-DNODE_PLATFORM="aix"' '-DNODE_WANT_INTERNALS=1' '-DV8_DEPRECATION_WARNINGS=1' '-DNODE_HAVE_I18N_SUPPORT=1' '-DNODE_HAVE_SMALL_ICU=1' '-DHAVE_INSPECTOR=1' '-DHAVE_OPENSSL=1' '-D__POSIX__' '-D_LINUX_SOURCE_COMPAT=1' '-D_ALL_SOURCE=1' '-DUCONFIG_NO_TRANSLITERATION=1' '-DUCONFIG_NO_SERVICE=1' '-DUCONFIG_NO_REGULAR_EXPRESSIONS=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=0' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DUCONFIG_NO_CONVERSION=1' '-DHTTP_PARSER_STRICT=0' '-D_LARGEFILE_SOURCE' '-D_FILE_OFFSET_BITS=64' '-DDEBUG' '-D_DEBUG' -I../src -I../tools/msvs/genfiles -I../deps/uv/src/ares -I/tmp/imranbuild/node/out/Debug/obj/gen -I../deps/v8 -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -I../deps/openssl/openssl/include -I../deps/zlib -I../deps/http_parser -I../deps/cares/include -I../deps/uv/include  -pthread -Wall -Wextra -Wno-unused-parameter -g -O0 -gxcoff -fno-rtti -fno-exceptions -std=gnu++0x -MMD -MF /tmp/imranbuild/node/out/Debug/.deps//tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o.d.raw   -c -o /tmp/imranbuild/node/out/Debug/obj.target/node_base/src/node_crypto.o ../src/node_crypto.cc
imrani@dal-pacha /tmp/imranbuild/node/out objdump -tr ./Debug/obj.target/node_base/src/node_crypto.o | grep malloc
[732](sec  0)(fl 0x00)(ty   0)(scl   2) (nx 1) 0x00000000 .__linux_malloc
[2699](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 uv_malloc_func:t1387=1388=*1389=f86
[3617](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 dyn_MEM_malloc_cb:t2519=1388
[3620](sec -2)(fl 0x00)(ty   0)(scl 140) (nx 0) 0x00000000 st_dynamic_MEM_fns:T2522=s12malloc_cb:2519,0,32;realloc_cb:2520,32,32;free_cb:2521,64,32;;
00015608 R_RBR_26          .__linux_malloc
0001dd90 R_RBR_26          .__linux_malloc
0001e054 R_RBR_26          .__linux_malloc
0001e27c R_RBR_26          .__linux_malloc
0001ed14 R_RBR_26          .__linux_malloc
0001fe88 R_RBR_26          .__linux_malloc
0001ff3c R_RBR_26          .__linux_malloc
00020688 R_RBR_26          .__linux_malloc
00021e00 R_RBR_26          .__linux_malloc
000270fc R_RBR_26          .__linux_malloc
0002d0e8 R_RBR_26          .__linux_malloc
0003aeb0 R_RBR_26          .__linux_malloc

So having an inspector agent causes us to use the wrong malloc

eugeneo commented 8 years ago

I think this may have to do with the inspector_agent.h.

That header uses std::vector. Could it be that it is STL that pulls in the wrong malloc?

imran-iq commented 8 years ago

Looks like you're right Taken from /opt/freeware/lib/gcc/powerpc-ibm-aix6.1.0.0/4.8.4/include/c++/cstlib

include <stdlib.h>

// Get rid of those macros defined in <stdlib.h> in lieu of real functions.
#undef abort
#undef abs
#undef atexit
#if __cplusplus >= 201103L
# ifdef _GLIBCXX_HAVE_AT_QUICK_EXIT
#  undef at_quick_exit
# endif
#endif
#undef atof
#undef atoi
#undef atol
#undef bsearch
#undef calloc
#undef div
#undef exit
#undef free
#undef getenv
#undef labs
#undef ldiv
#undef malloc
#undef mblen
#undef mbstowcs
#undef mbtowc
#undef qsort
eugeneo commented 8 years ago

Ok, I'll do a change tomorrow that removes the vector from the header.

On Thu, Jun 2, 2016 at 1:29 PM Imran Iqbal notifications@github.com wrote:

Looks like you're right Taken from /opt/freeware/lib/gcc/powerpc-ibm-aix6.1.0.0/4.8.4/include/c++/cstlib

include

// Get rid of those macros defined in in lieu of real functions.

undef abort

undef abs

undef atexit

if __cplusplus >= 201103L

ifdef _GLIBCXX_HAVE_AT_QUICK_EXIT

undef at_quick_exit

endif

endif

undef atof

undef atoi

undef atol

undef bsearch

undef calloc

undef div

undef exit

undef free

undef getenv

undef labs

undef ldiv

undef malloc

undef mblen

undef mbstowcs

undef mbtowc

undef qsort

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nodejs/node/issues/7080#issuecomment-223412135, or mute the thread https://github.com/notifications/unsubscribe/AARkrZqMRtqbWBT1lUYSxQefp5aDVfrwks5qHz0_gaJpZM4IqzAe .

ofrobots commented 8 years ago

I am a little bit confused, there are a lot of other files that also #include <vector>, e.g. node.cc, node_file.cc, and code in V8. Why is this a problem only for inspector_agent.h?

eugeneo commented 8 years ago

I think it might have to do with the inspector_socket.h being a header included elsewhere.

On Thu, Jun 2, 2016 at 1:42 PM Ali Ijaz Sheikh notifications@github.com wrote:

I am a little bit confused, there are a lot of other files that also #include

, e.g. node.cc, node_file.cc, and code in V8. Why is this a problem only for inspector_agent.h? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nodejs/node/issues/7080#issuecomment-223415622, or mute the thread https://github.com/notifications/unsubscribe/AARkrTg_HYJmDwB4cl-H1kdwGFmSzJjoks5qH0A7gaJpZM4IqzAe .
imran-iq commented 8 years ago

So this raises a few questions:

eugeneo commented 8 years ago

Can you try this - https://github.com/eugeneo/node/commit/c367115f4b0a3b42e233a899c4f0469feebf908b

This just a proof of concept, actual implementation will be cleaner.

On Thu, Jun 2, 2016 at 3:08 PM Imran Iqbal notifications@github.com wrote:

So this raises a few questions:

  • Uses of different mallocs have to have different free calls, in the failing case we see that we have malloc interspersed with __linux_malloc. This could be problematic in the future?
  • The tests themselves use crypto.randomBytes because it has an async callback, but they pass 0 as the number of bytes they want. Is this a valid use of the api?
  • There exists a notion of glibc compatible malloc for example autoconf includes a test for this compatibililty (source) https://github.com/coreutils/gnulib/blob/master/m4/malloc.m4. The test checks exactly the behaviour we see in this case, ie. checking for null on malloc(0). Currently we assume that all mallocs behave the same and do not guard against this condition. Is this something we should check for in the configure step or guard against in code?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nodejs/node/issues/7080#issuecomment-223437180, or mute the thread https://github.com/notifications/unsubscribe/AARkrbxkAAsKI27eTmUU9XqQAVxRkOeFks5qH1RJgaJpZM4IqzAe .

imran-iq commented 8 years ago

https://ci.nodejs.org/job/node-test-commit-aix/205/ Hopefully i set the parameters properly there

imran-iq commented 8 years ago

@eugeneo that changeset appears to have fixed the issue

imran-iq commented 8 years ago

Testing out the following changeset to see if it sovles the issue: https://github.com/iWuzHere/node/commit/5966f16057bd27d381cc1681dc963033d57dc0c5

https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/207/

imran-iq commented 8 years ago

All green from https://github.com/iWuzHere/node/commit/a35a8320491bf1fe60dc932546127c526cd55aa8 and https://github.com/iWuzHere/node/commit/42e12bfcc9b7bd23f37fe37e8fe8d0d29e63e00e https://ci.nodejs.org/job/node-test-commit-aix/nodes=aix61-ppc64/210/

mhdawson commented 8 years ago

Based on the investigation (myself and iWuzHere) our understanding is that the core issue is a problem with how _LINUX_SOURCE_COMPAT was implemented for AIX. We've talked to the team that handles gcc for AIX and it sounds like they will update the toolchain so that it defines malloc (and variants) in a better way that won't conflict with what the STL header was doing. This is however, a longer term solution as it will take a while for this to make it into the toolchain and us to upgrade to later levels of the compiler.

In the shorter term it looks like we have 2 options. One is to refactor the inspector code to avoid the problem as @eugeneo has been working on. The second is to update the use of malloc (and variants) in Node as @iWuzHere has done in https://github.com/iWuzHere/node/commit/42e12bfcc9b7bd23f37fe37e8fe8d0d29e63e00e. I think I'd favor the second option if we can get it accepted since it would cover other similar issues if they arise before the toolchain is updated. @eugeneo and @ofrobots whats your take ?

eugeneo commented 8 years ago

I think I'd like to refactor the inspector in either case, even if not for AIX issue.

mhdawson commented 8 years ago

Ok so sounds like both would likely make sense. @iWuzHere can you submit the PR referencing this issue. I'd include the information in the pull request that the longer term change will be in the toolchain but this will make sure we don't see other similar issues up until the toolchain is updated.

imran-iq commented 8 years ago

Created pull request #7229