Closed siddarthkay closed 5 months ago
To investigate I tried adding more debug flags to the node process that calls contract tests like this :
diff --git a/Makefile b/Makefile
index b2ad755f6..6b5b62d32 100644
--- a/Makefile
+++ b/Makefile
@@ -345,7 +345,7 @@ else
yarn node-pre-gyp rebuild && \
yarn shadow-cljs compile mocks && \
yarn shadow-cljs compile test && \
- node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
+ node --trace-warnings --trace-uncaught --inspect-brk --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
endif
test: export SHADOW_OUTPUT_TO := target/test/test.js
and then I open chrome://inspect
and set check these 2 values :
pause on uncaught exceptions
pause on caught exceptions
just to get a feel of whats going on.
right away I see something suspicious :
"Cannot find module 'bufferutil'
Require stack:
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/buffer-util.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/permessage-deflate.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/websocket.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/index.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/jsonrpc-ws-connection/dist/index.cjs.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/core/dist/index.cjs.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/target/contract_test/test.js"
and
"Cannot find module 'utf-8-validate'
Require stack:
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/validation.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/receiver.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/websocket.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/index.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/jsonrpc-ws-connection/dist/index.cjs.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/core/dist/index.cjs.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/target/contract_test/test.js"
To validate whether this error is related to our missing symbols problem I nuked all of @walletconnect dependencies, usage and its reference in tests and this error message disappeared but the make test-contract
was still failing the same way.
This proves that the error messages related to 'bufferutil' and 'utf-8-validate' are not causing the failures we are interested in. It would also make sense for this not to be the culprit since what we are looking for has to be something specific to MacOS
.
Next up I decided to try and find the backtrace of the abort signal. I added a signal tracing function like this in modules/react-native-status/nodejs/status.cpp
diff --git a/modules/react-native-status/nodejs/status.cpp b/modules/react-native-status/nodejs/status.cpp
index 620d77032..6ae37f637 100644
--- a/modules/react-native-status/nodejs/status.cpp
+++ b/modules/react-native-status/nodejs/status.cpp
@@ -8,6 +8,7 @@
#include <node.h>
#include <string>
#include <queue>
+#include <execinfo.h>
#include "../../../result/libstatus.h"
@@ -27,6 +28,25 @@ using v8::Number;
using v8::Value;
+void SignalHandler(int signal) {
+ printf("Caught signal: %d\n", signal);
+
+ // Print the backtrace
+ void *backtraceBuffer[8];
+ int numFrames = backtrace(backtraceBuffer, 8);
+ char **symbolList = backtrace_symbols(backtraceBuffer, numFrames);
+
+ printf("Backtrace:\n");
+ for (int i = 0; i < numFrames; ++i) {
+ printf("%s\n", symbolList[i]);
+ }
+
+ free(symbolList);
+
+ std::abort(); // Terminate the program
+}
+
+
void _MultiAccountGenerateAndDeriveAddresses(const FunctionCallbackInfo<Value>& args) {
Isolate* isolate = args.GetIsolate();
Local<Context> context = isolate->GetCurrentContext();
@@ -1925,6 +1945,8 @@ void _RestoreAccountAndLogin(const FunctionCallbackInfo<Value>& args) {
void init(Local<Object> exports) {
+ signal(SIGABRT, SignalHandler);
+ signal(SIGSEGV, SignalHandler);
NODE_SET_METHOD(exports, "multiAccountGenerateAndDeriveAddresses", _MultiAccountGenerateAndDeriveAddresses);
NODE_SET_METHOD(exports, "multiAccountImportPrivateKey", _MultiAccountImportPrivateKey);
NODE_SET_METHOD(exports, "multiAccountLoadAccount", _MultiAccountLoadAccount);
And then when the abort trap is triggered by node process we are able to latch on to that signal in the cpp code and print the backtrace. Here is what I see :
dyld[51565]: missing symbol called
Caught signal: 6
Backtrace:
0 status_nodejs_addon.node 0x0000000160001dd0 _ZN6status13SignalHandlerEi + 48
1 libsystem_platform.dylib 0x00000001828a7584 _sigtramp + 56
2 dyld 0x0000000182560628 abort_with_payload_wrapper_internal + 104
3 dyld 0x000000018256065c dyld + 493148
4 dyld 0x00000001824f26b0 _ZN5dyld411CacheFinderD2Ev + 0
5 dyld 0x0000000182526b98 _ZN5dyld44APIs11_tlv_atexitEPFvPvES1_ + 0
6 status_nodejs_addon.node 0x000000016007853c runtime.syscall.abi0 + 44
7 status_nodejs_addon.node 0x0000000160076c7c runtime.asmcgocall.abi0 + 124
Caught signal: 6
Backtrace:
0 status_nodejs_addon.node 0x0000000160001dd0 _ZN6status13SignalHandlerEi + 48
1 libsystem_platform.dylib 0x00000001828a7584 _sigtramp + 56
2 libsystem_pthread.dylib 0x0000000182876c20 pthread_kill + 288
3 libsystem_c.dylib 0x0000000182783a20 abort + 180
4 status_nodejs_addon.node 0x0000000160001e1c _ZN6status39_MultiAccountGenerateAndDeriveAddressesERKN2v820FunctionCallbackInfoINS0_5ValueEEE + 0
5 libsystem_platform.dylib 0x00000001828a7584 _sigtramp + 56
6 dyld 0x0000000182560628 abort_with_payload_wrapper_internal + 104
7 dyld 0x000000018256065c dyld + 493148
.
.
.
these env vars help in adding more metadata to dyld
calls.
git diff Makefile
diff --git a/Makefile b/Makefile
index b2ad755f6..0eee25cf2 100644
--- a/Makefile
+++ b/Makefile
@@ -345,7 +345,7 @@ else
yarn node-pre-gyp rebuild && \
yarn shadow-cljs compile mocks && \
yarn shadow-cljs compile test && \
- node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
+ DYLD_PRINT_APIS=1 DYLD_PRINT_BINDINGS=1 node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
endif
test: export SHADOW_OUTPUT_TO := target/test/test.js
I guess one viable route here would be to build status-go-library with go 1.19 and compare the binary file with building status-go-library from 1.20
I wouldnt know what to compare between the two libraries but I can try comparing results of
nm -u libstatus.a
since that should give me an idea of the -u flag is supposed to display undefined symbols.
I first found a tag in status-go
where the go-waku
upgrade had not happened and I picked https://github.com/status-im/status-go/releases/tag/v0.179.12
just to be safe.
I then built status-go library like this :
make status-go-library
I stored the output from reading symbols of the compiled libstatus.a
like this :
nm -u result/libstatus.a > go-119-symbols.log
I then removed the go 1.19 hack fix which would ensure we now use go 1.20
diff --git a/nix/status-go/library/default.nix b/nix/status-go/library/default.nix
index 3a49a98a78d..831fbfd837a 100644
--- a/nix/status-go/library/default.nix
+++ b/nix/status-go/library/default.nix
@@ -1,9 +1,6 @@
-{ stdenv, meta, source, buildGo119Package, buildGo120Package }:
-let
- # https://github.com/status-im/status-mobile/issues/19802
- # only for Darwin to fix Integration Tests failing with missing symbols on go 1.20
- buildGoPackageIntegrationTest = if stdenv.isDarwin then buildGo119Package else buildGo120Package;
-in buildGoPackageIntegrationTest {
+{ stdenv, meta, source, buildGoPackage }:
+
+buildGoPackage {
pname = source.repo;
version = "${source.cleanVersion}-${source.shortRev}";
and then I then cleaned up the status-go library with make clean
and built status-go library again with make status-go-library
. This would ensure that result folder would now contain binary built with go 1.20.
I then create a new log file to compare symbols like this :
nm -u result/libstatus.a > go-120-symbols.log
And indeed there were differences which is what I would expect since the obscure error message is of a missing symbol.
However I found a familiar symbol which was missing.
And indeed this problem was introduced in go 1.20 and happens only on Darwin related issues in go repo
The suggestion was to add -lresolv
flag along with netgo
which indeed made the crash go away.
Here is how I added those flags
diff --git a/nix/status-go/library/default.nix b/nix/status-go/library/default.nix
index 831fbfd837a..da3bec1e11f 100644
--- a/nix/status-go/library/default.nix
+++ b/nix/status-go/library/default.nix
@@ -22,11 +22,14 @@ buildGoPackage {
'';
# Build the Go library
+ # ld flags and netgo tag are necessary for integration tests to work on MacOS
+ # https://github.com/status-im/status-mobile/issues/20135
buildPhase = ''
runHook preBuild
go build \
-buildmode='c-archive' \
- -tags='gowaku_skip_migrations gowaku_no_rln' \
+ -ldflags '-w -s -extldflags "-lresolv"' \
+ -tags='gowaku_skip_migrations gowaku_no_rln netgo' \
-o "$out/libstatus.a" \
$NIX_BUILD_TOP/main.go
runHook postBuild
Problem
make test-contract
on MacOS fails with :The problem initially showcased itself after the
golang
1.20
upgrade was merged. ref -> https://github.com/status-im/status-mobile/issues/19802To fix this problem quickly I had modified the
status-go-library
derivation to build withgoPackage
1.19
only forDarwin
ref -> https://github.com/status-im/status-mobile/pull/19965However a PR last week at
status-go
bumpedgo-waku
version and along with it a bunch of libraries that are incompatible withgolang
1.19, specificallyquic-go
ref -> https://github.com/status-im/status-go/pull/5150Here are possible solutions to the current problem:
note : further upgrading go to 1.21 or 1.22 may or may not fix this issue.