status-im / status-mobile

a free (libre) open source, mobile OS for Ethereum
https://status.app
Mozilla Public License 2.0
3.91k stars 987 forks source link

`golang` version `1.20` on `MacOS` ARM processors generates `status-go-library` with missing symbols #20135

Closed siddarthkay closed 5 months ago

siddarthkay commented 5 months ago

Problem

make test-contract on MacOS fails with :

dyld[30877]: missing symbol called

The problem initially showcased itself after the golang 1.20 upgrade was merged. ref -> https://github.com/status-im/status-mobile/issues/19802

To fix this problem quickly I had modified the status-go-library derivation to build with goPackage 1.19 only for Darwin ref -> https://github.com/status-im/status-mobile/pull/19965

However a PR last week at status-go bumped go-waku version and along with it a bunch of libraries that are incompatible with golang 1.19, specifically quic-go ref -> https://github.com/status-im/status-go/pull/5150

Here are possible solutions to the current problem:

note : further upgrading go to 1.21 or 1.22 may or may not fix this issue.

siddarthkay commented 5 months ago

To investigate I tried adding more debug flags to the node process that calls contract tests like this :

diff --git a/Makefile b/Makefile
index b2ad755f6..6b5b62d32 100644
--- a/Makefile
+++ b/Makefile
@@ -345,7 +345,7 @@ else
        yarn node-pre-gyp rebuild && \
        yarn shadow-cljs compile mocks && \
        yarn shadow-cljs compile test && \
-       node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
+       node --trace-warnings --trace-uncaught --inspect-brk --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
 endif

 test: export SHADOW_OUTPUT_TO := target/test/test.js

and then I open chrome://inspect and set check these 2 values :

just to get a feel of whats going on.

siddarthkay commented 5 months ago

right away I see something suspicious :

"Cannot find module 'bufferutil'
Require stack:
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/buffer-util.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/permessage-deflate.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/websocket.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/index.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/jsonrpc-ws-connection/dist/index.cjs.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/core/dist/index.cjs.js
/Users/siddarthkumar/code/status-im/PR/status-mobile/target/contract_test/test.js"

and

"Cannot find module 'utf-8-validate'
Require stack:
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/validation.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/receiver.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/lib/websocket.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/ws/index.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/jsonrpc-ws-connection/dist/index.cjs.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/node_modules/@walletconnect/core/dist/index.cjs.js
- /Users/siddarthkumar/code/status-im/PR/status-mobile/target/contract_test/test.js"
siddarthkay commented 5 months ago

To validate whether this error is related to our missing symbols problem I nuked all of @walletconnect dependencies, usage and its reference in tests and this error message disappeared but the make test-contract was still failing the same way.

This proves that the error messages related to 'bufferutil' and 'utf-8-validate' are not causing the failures we are interested in. It would also make sense for this not to be the culprit since what we are looking for has to be something specific to MacOS.

siddarthkay commented 5 months ago

Next up I decided to try and find the backtrace of the abort signal. I added a signal tracing function like this in modules/react-native-status/nodejs/status.cpp

diff --git a/modules/react-native-status/nodejs/status.cpp b/modules/react-native-status/nodejs/status.cpp
index 620d77032..6ae37f637 100644
--- a/modules/react-native-status/nodejs/status.cpp
+++ b/modules/react-native-status/nodejs/status.cpp
@@ -8,6 +8,7 @@
 #include <node.h>
 #include <string>
 #include <queue>
+#include <execinfo.h>

 #include "../../../result/libstatus.h"

@@ -27,6 +28,25 @@ using v8::Number;
 using v8::Value;

+void SignalHandler(int signal) {
+    printf("Caught signal: %d\n", signal);
+
+    // Print the backtrace
+    void *backtraceBuffer[8];
+    int numFrames = backtrace(backtraceBuffer, 8);
+    char **symbolList = backtrace_symbols(backtraceBuffer, numFrames);
+
+    printf("Backtrace:\n");
+    for (int i = 0; i < numFrames; ++i) {
+        printf("%s\n", symbolList[i]);
+    }
+
+    free(symbolList);
+
+    std::abort();  // Terminate the program
+}
+
+
 void _MultiAccountGenerateAndDeriveAddresses(const FunctionCallbackInfo<Value>& args) {
        Isolate* isolate = args.GetIsolate();
         Local<Context> context = isolate->GetCurrentContext();
@@ -1925,6 +1945,8 @@ void _RestoreAccountAndLogin(const FunctionCallbackInfo<Value>& args) {

 void init(Local<Object> exports) {
+       signal(SIGABRT, SignalHandler);
+       signal(SIGSEGV, SignalHandler);
        NODE_SET_METHOD(exports, "multiAccountGenerateAndDeriveAddresses", _MultiAccountGenerateAndDeriveAddresses);
        NODE_SET_METHOD(exports, "multiAccountImportPrivateKey", _MultiAccountImportPrivateKey);
        NODE_SET_METHOD(exports, "multiAccountLoadAccount", _MultiAccountLoadAccount);
siddarthkay commented 5 months ago

And then when the abort trap is triggered by node process we are able to latch on to that signal in the cpp code and print the backtrace. Here is what I see :

dyld[51565]: missing symbol called
Caught signal: 6
Backtrace:
0   status_nodejs_addon.node            0x0000000160001dd0 _ZN6status13SignalHandlerEi + 48
1   libsystem_platform.dylib            0x00000001828a7584 _sigtramp + 56
2   dyld                                0x0000000182560628 abort_with_payload_wrapper_internal + 104
3   dyld                                0x000000018256065c dyld + 493148
4   dyld                                0x00000001824f26b0 _ZN5dyld411CacheFinderD2Ev + 0
5   dyld                                0x0000000182526b98 _ZN5dyld44APIs11_tlv_atexitEPFvPvES1_ + 0
6   status_nodejs_addon.node            0x000000016007853c runtime.syscall.abi0 + 44
7   status_nodejs_addon.node            0x0000000160076c7c runtime.asmcgocall.abi0 + 124
Caught signal: 6
Backtrace:
0   status_nodejs_addon.node            0x0000000160001dd0 _ZN6status13SignalHandlerEi + 48
1   libsystem_platform.dylib            0x00000001828a7584 _sigtramp + 56
2   libsystem_pthread.dylib             0x0000000182876c20 pthread_kill + 288
3   libsystem_c.dylib                   0x0000000182783a20 abort + 180
4   status_nodejs_addon.node            0x0000000160001e1c _ZN6status39_MultiAccountGenerateAndDeriveAddressesERKN2v820FunctionCallbackInfoINS0_5ValueEEE + 0
5   libsystem_platform.dylib            0x00000001828a7584 _sigtramp + 56
6   dyld                                0x0000000182560628 abort_with_payload_wrapper_internal + 104
7   dyld                                0x000000018256065c dyld + 493148
.
.
.
siddarthkay commented 5 months ago

these env vars help in adding more metadata to dyld calls.

git diff Makefile
diff --git a/Makefile b/Makefile
index b2ad755f6..0eee25cf2 100644
--- a/Makefile
+++ b/Makefile
@@ -345,7 +345,7 @@ else
        yarn node-pre-gyp rebuild && \
        yarn shadow-cljs compile mocks && \
        yarn shadow-cljs compile test && \
-       node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
+       DYLD_PRINT_APIS=1 DYLD_PRINT_BINDINGS=1 node --require ./test-resources/override.js "$$SHADOW_OUTPUT_TO"
 endif

 test: export SHADOW_OUTPUT_TO := target/test/test.js
siddarthkay commented 5 months ago

I guess one viable route here would be to build status-go-library with go 1.19 and compare the binary file with building status-go-library from 1.20 I wouldnt know what to compare between the two libraries but I can try comparing results of nm -u libstatus.a since that should give me an idea of the -u flag is supposed to display undefined symbols.

siddarthkay commented 5 months ago

I first found a tag in status-go where the go-waku upgrade had not happened and I picked https://github.com/status-im/status-go/releases/tag/v0.179.12 just to be safe.

I then built status-go library like this :

make status-go-library

I stored the output from reading symbols of the compiled libstatus.a like this :

nm -u result/libstatus.a > go-119-symbols.log

I then removed the go 1.19 hack fix which would ensure we now use go 1.20

diff --git a/nix/status-go/library/default.nix b/nix/status-go/library/default.nix
index 3a49a98a78d..831fbfd837a 100644
--- a/nix/status-go/library/default.nix
+++ b/nix/status-go/library/default.nix
@@ -1,9 +1,6 @@
-{ stdenv, meta, source, buildGo119Package, buildGo120Package }:
-let
-  # https://github.com/status-im/status-mobile/issues/19802
-  # only for Darwin to fix Integration Tests failing with missing symbols on go 1.20
-  buildGoPackageIntegrationTest = if stdenv.isDarwin then buildGo119Package else buildGo120Package;
-in buildGoPackageIntegrationTest {
+{ stdenv, meta, source, buildGoPackage }:
+
+buildGoPackage {
   pname = source.repo;
   version = "${source.cleanVersion}-${source.shortRev}";

and then I then cleaned up the status-go library with make clean and built status-go library again with make status-go-library. This would ensure that result folder would now contain binary built with go 1.20.

I then create a new log file to compare symbols like this :

nm -u result/libstatus.a > go-120-symbols.log
siddarthkay commented 5 months ago

And indeed there were differences which is what I would expect since the obscure error message is of a missing symbol.

Screenshot 2024-05-29 at 6 45 43 PM

However I found a familiar symbol which was missing.

Screenshot 2024-05-29 at 6 45 57 PM

And indeed this problem was introduced in go 1.20 and happens only on Darwin related issues in go repo

The suggestion was to add -lresolv flag along with netgo which indeed made the crash go away.

siddarthkay commented 5 months ago

Here is how I added those flags


diff --git a/nix/status-go/library/default.nix b/nix/status-go/library/default.nix
index 831fbfd837a..da3bec1e11f 100644
--- a/nix/status-go/library/default.nix
+++ b/nix/status-go/library/default.nix
@@ -22,11 +22,14 @@ buildGoPackage {
   '';

   # Build the Go library
+  # ld flags and netgo tag are necessary for integration tests to work on MacOS
+  # https://github.com/status-im/status-mobile/issues/20135
   buildPhase = ''
     runHook preBuild
     go build \
       -buildmode='c-archive' \
-      -tags='gowaku_skip_migrations gowaku_no_rln' \
+      -ldflags '-w -s -extldflags "-lresolv"' \
+      -tags='gowaku_skip_migrations gowaku_no_rln netgo' \
       -o "$out/libstatus.a" \
       $NIX_BUILD_TOP/main.go
     runHook postBuild