Closed: galaxyskyknight closed this issue 2 years ago
Also, I see the following log:
Sun May 8 02:07:13 2022 daemon.info n2n[546]: WARNING: sendto(x.x.x.x:7681) failed (1) Operation not permitted
It seems the firewall is trying to deny n2n access to the peer. But anyway, n2n should not run into a core dump; I guess there must be some unhandled exception in the socket or I/O handling in this special case.
There are 3 additional core dump traps from the kernel:
[ 1229.539617] traps: edge[327] trap stack segment ip:41a686 sp:7ffe09873030 error:0 in edge[403000+80000]
[ 1733.745667] traps: edge[546] trap stack segment ip:41a686 sp:7ffdffec47b0 error:0 in edge[403000+80000]
[ 1884.042965] traps: edge[13350] trap stack segment ip:41a686 sp:7ffc20df6d10 error:0 in edge[403000+80000]
You can see from the trap errors that the same IP is causing the trap each time. Without a symbol table from your compiled binary, we cannot tell where that IP is in the code. Are you able to run the edge with gdb and capture a backtrace? Is your build process automated and documented anywhere? Do you have the core dump file that was generated?
In case there was any doubt, we do not intend for the code to fail and trap and we want to fix that, but it is going to be practically impossible without some assistance from you.
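For readers following along, the arithmetic behind the kernel trap lines quoted earlier in this thread can be sketched in shell. This is only an illustrative helper (trap_offset is a made-up name, not part of n2n): it pulls the ip: value and the mapping base out of one trap line and prints the offset of the fault inside that mapping. Whether a symbol lookup wants the raw ip or an offset depends on how the binary was linked, so treat that as something to check against your own build.

```shell
# Hypothetical helper: parse one kernel "traps:" line and print the faulting
# instruction pointer plus its offset from the start of the mapping.
trap_offset() {
    line=$1
    # ip:<hex> is the faulting instruction pointer.
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip:\([0-9a-f]*\) .*/\1/p')
    # "in edge[<base>+<size>]" names the mapping that contains it.
    base=$(printf '%s\n' "$line" | sed -n 's/.* in [^[]*\[\([0-9a-f]*\)+[0-9a-f]*].*/\1/p')
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf 'ip=0x%s offset=0x%x\n' "$ip" "$(( 0x$ip - 0x$base ))"
}

# One of the trap lines from this report.
trap_offset 'traps: edge[327] trap stack segment ip:41a686 sp:7ffe09873030 error:0 in edge[403000+80000]'
```

All three traps in the report share the same ip, so they resolve to the same offset, which is why a single symbolized address would be enough to locate the crash.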
Please refer to issue #980, reported 11 days ago; it is the same problem. As I said, this issue was obviously introduced by the code between https://github.com/ntop/n2n/commit/f3e305b254fc88ce829ddb4a63b11e083a65c3ab and the latest commit (my guess is that it is very likely related to the change in https://github.com/ntop/n2n/commit/009311d016bf27f40259e6bb992ce4a78af24424). I have confirmed that if I use commit https://github.com/ntop/n2n/commit/f3e305b254fc88ce829ddb4a63b11e083a65c3ab, the core dump does not occur at all, whether I run make clean or not, but once I use the latest commit or any later one, the issue happens. I suggest that there actually is nothing that can be done, but you can help to inspect the code and, if possible, try more exceptional test cases like this one. For example, you could use this scenario: set up a real edge environment and use an iptables restart/reload to simulate blocking and unblocking the edge's access, and see whether that triggers the issue. I guess this is the tricky part, and you can use gdb and core dumps in your setup.
As for what you asked, I really cannot help unless you can tell me how to set up gdb in an OpenWrt environment or how to collect a core dump.
I hope this can be addressed and resolved by you guys; otherwise I will be blocked here and cannot upgrade to later feature releases, with regard to deployment stability.
Thanks for your great help.
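On the question of how to collect a core dump, a minimal sketch for a generic Linux shell follows. It assumes the standard core size limit and the kernel's /proc/sys/kernel/core_pattern setting; enable_core_dumps, the /tmp pattern and the optional second argument are illustrative choices, not something n2n or OpenWrt provides. When the edge is started by an init script rather than from this shell, the limit would have to be raised in that script instead.

```shell
# Hypothetical helper: enable core dumps for processes started from this shell.
# The second argument exists only so the function can be tried against a
# scratch file; on a real system it defaults to the kernel's core_pattern.
enable_core_dumps() {
    pattern=${1:-/tmp/%e.%p.core}
    pattern_file=${2:-/proc/sys/kernel/core_pattern}
    # Lift the core file size limit for this shell and its children.
    ulimit -c unlimited 2>/dev/null || echo "warning: could not raise core size limit" >&2
    # %e = executable name, %p = PID (see core(5)); writing here needs root.
    printf '%s\n' "$pattern" > "$pattern_file" || return 1
}
```

After calling enable_core_dumps as root, starting the edge from the same shell and reproducing the crash should leave a core file matching the pattern, provided the target directory has enough free space.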
I suggest that there actually is nothing that can be done, but you can help to inspect the code
I wonder if maybe a ./configure CFLAGS="-fsanitize=address -g", make clean and make would already generate some more helpful output.
Also, what are the details of your build process?
try more exceptional test cases like this one. For example, you could use this scenario
We definitely are not able to test all possible hardware and software scenarios. That's why we need help from you!
otherwise I will be blocked here and cannot upgrade to later feature releases, with regard to deployment stability
And please keep in mind that "dev" is not "latest stable".
OK, I will try this. How does this help?
It should add debug symbols and produce more meaningful output when crashing.
It is the general OpenWrt x86 build process, automated, nothing special. If you need the Makefile, I can attach it.
Yes, it would be interesting to see.
BTW: I tried again today; once I go back to the previous commit, the core dump is gone.
But I have to ask again, have you ever tried it with make clean before running make again? I have to repeat because I only see these strange and unexplainable crashes and seg faults and so on when I forget to run make clean before running make again.
Sorry, I did not intend to offend your professionalism...
🧡
Oh, don't worry, we are way too professional to feel offended by anything 😉
Sure, I ran make clean first; I wrote it into the Makefile.
:P
There are a lot of link errors when I add the CFLAGS. Here are only the last lines, as there are too many to paste:
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:68: undefined reference to `__asan_report_store4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:71: undefined reference to `__asan_report_load8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:71: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `mgmt_verbose':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:76: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:77: undefined reference to `__asan_report_load8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `mgmt_event_post2':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:90: undefined reference to `__asan_option_detect_stack_use_after_return'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:90: undefined reference to `__asan_stack_malloc_2'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:93: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:93: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:101: undefined reference to `__asan_report_store8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:104: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:112: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:115: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `mgmt_help_row':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:129: undefined reference to `__asan_report_load8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `mgmt_help_events_row':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:142: undefined reference to `__asan_option_detect_stack_use_after_return'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:142: undefined reference to `__asan_stack_malloc_2'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:147: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:152: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:153: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:154: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:155: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:160: undefined reference to `__asan_report_load8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `mgmt_auth':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:191: undefined reference to `__asan_report_load8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:197: undefined reference to `__asan_report_load4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `mgmt_req_init2':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:215: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:216: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:217: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:225: undefined reference to `__asan_report_load1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:226: undefined reference to `__asan_report_store4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:227: undefined reference to `__asan_report_load1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:228: undefined reference to `__asan_report_store4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:229: undefined reference to `__asan_report_load1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:230: undefined reference to `__asan_report_store4'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:243: undefined reference to `__asan_report_store8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:253: undefined reference to `__asan_report_store8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:260: undefined reference to `__asan_report_store1'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `_GLOBAL__sub_D_00099_0_send_reply':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:280: undefined reference to `__asan_unregister_globals'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: libn2n.a(management.o): in function `_GLOBAL__sub_I_00099_1_send_reply':
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:280: undefined reference to `__asan_init'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:280: undefined reference to `__asan_version_mismatch_check_v8'
/home/builder/lede_x86/staging_dir/toolchain-x86_64_gcc-8.4.0_musl/lib/gcc/x86_64-openwrt-linux-musl/8.4.0/../../../../x86_64-openwrt-linux-musl/bin/ld: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/src/management.c:280: undefined reference to `__asan_register_globals'
collect2: error: ld returned 1 exit status
make[4]: *** [<builtin>: src/edge] Error 1
make[4]: Leaving directory '/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964'
make[3]: *** [Makefile:91: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/.built] Error 2
make[3]: Leaving directory '/home/builder/lede_x86/package/lean/n2n_v2'
time: package/lean/n2n_v2/compile#17.66#5.38#27.12
ERROR: package/lean/n2n_v2 failed to build.
make[2]: *** [package/Makefile:116: package/lean/n2n_v2/compile] Error 1
make[2]: Leaving directory '/home/builder/lede_x86'
make[1]: *** [package/Makefile:110: /home/builder/lede_x86/staging_dir/target-x86_64_musl/stamp/.package_compile] Error 2
make[1]: Leaving directory '/home/builder/lede_x86'
make: *** [/home/builder/lede_x86/include/toplevel.mk:230:world] Error 2
Sorry, we left off a config step - you also need to configure with LDFLAGS="-fsanitize=undefined -static-libubsan"
transform_zstd.o src/transform_aes.o src/pearson.o src/supernode.o src/example_edge_embed_quick_edge_init.o src/cc20.o src/tuntap_netbsd.o src/edge_management.o src/edge_utils.o src/n2n.o src/tuntap_freebsd.o src/transform_lzo.o src/n2n_port_mapping.o src/random_numbers.o src/speck.o src/sn_management.o src/minilzo.o src/transform_tf.o src/example_sn_embed.o src/edge_utils_win32.o src/transform_cc20.o src/hexdump.o src/tuntap_linux.o src/n2n_regex.o src/transform_null.o src/curve25519.o src/aes.o src/sn_utils.o src/header_encryption.o src/transform_speck.o src/edge.o
x86_64-openwrt-linux-musl-gcc -fsanitize=undefined -static-libubsan -pthread -L. src/edge.o libn2n.a -ln2n -lpcap -lnatpmp -lminiupnpc -lcrypto -lcap -o src/edge
x86_64-openwrt-linux-musl-gcc -fsanitize=undefined -static-libubsan -pthread -L. src/supernode.o libn2n.a -ln2n -lpcap -lnatpmp -lminiupnpc -lcrypto -lcap -o src/supernode
x86_64-openwrt-linux-musl-gcc: error: libsanitizer.spec: No such file or directory
make[4]: *** [<builtin>: src/edge] Error 1
make[4]: Leaving directory '/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964'
make[3]: *** [Makefile:91: /home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-23e168b9551258983a4187357a4fcb57d060f964/.built] Error 2
make[3]: Leaving directory '/home/builder/lede_x86/package/lean/n2n_v2'
time: package/lean/n2n_v2/compile#22.03#7.93#23.43
ERROR: package/lean/n2n_v2 failed to build.
make[2]: *** [package/Makefile:116: package/lean/n2n_v2/compile] Error 1
Unfortunately, it is clear that the build environment provided by lede does not support the sanitizer options.
In order to narrow down anything about the error you are experiencing, there are a couple of options.
Since you have posted the full error message from the sendto log output, it is unlikely that the core of the patch you pointed at is at fault, as most of the new code runs before it outputs that log line.
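Whether a given toolchain can build with the sanitizer flags at all can be probed before touching the package build. The sketch below is an assumption-laden illustration: cc_supports_flags is a made-up helper, and the compiler name is simply the one that appears in the build log in this thread. It compiles and links a trivial program with the requested flags and reports success or failure.

```shell
# Hypothetical helper: return 0 if the given compiler can build and link a
# trivial program with the given flags, non-zero otherwise.
cc_supports_flags() {
    cc=$1
    shift
    tmpdir=$(mktemp -d) || return 2
    printf 'int main(void) { return 0; }\n' > "$tmpdir/probe.c"
    # Capture the compiler's exit status without tripping "set -e".
    "$cc" "$@" -o "$tmpdir/probe" "$tmpdir/probe.c" >/dev/null 2>&1 && rc=0 || rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}

# Example with the cross compiler named in the build log.
if cc_supports_flags x86_64-openwrt-linux-musl-gcc -fsanitize=address -g; then
    echo "sanitizer build looks possible"
else
    echo "cannot build with -fsanitize=address (or compiler not on PATH)"
fi
```

A failing probe here would be consistent with the undefined __asan_* references and the missing libsanitizer.spec seen in the logs.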
It still happens with your latest code, https://github.com/ntop/n2n/commit/a274818854c01008a4106a44fd5ecd33d14091a4. I cannot do more debugging, but the following log shows a clear scenario of it happening. Based on the log, I guess it may be related to the n2n interface flapping between active and inactive in the kernel, which the program does not handle well, and that probably causes the core dump.
Wed May 18 10:50:30 2022 daemon.notice netifd: Interface 'n2n' is enabled
Wed May 18 10:50:30 2022 daemon.notice netifd: Network device 'n2n' link is up
Wed May 18 10:50:30 2022 daemon.notice netifd: Interface 'n2n' has link connectivity
Wed May 18 10:50:30 2022 daemon.notice netifd: Interface 'n2n' is setting up now
Wed May 18 10:50:30 2022 daemon.notice procd: /etc/rc.d/S99n2n_v2: 18/May/2022 10:50:30 [edge.c:1261] created local tap device IP: 10.0.0.2, Mask: 255.255.255.0, MAC: 00:90:10:00:00:02
Wed May 18 10:50:30 2022 daemon.info n2n[12803]: parent process is exiting (this is normal)
Wed May 18 10:50:30 2022 daemon.info n2n[12813]: WARNING: running as root is discouraged, check out the -u/-g options
Wed May 18 10:50:30 2022 daemon.info : 12[KNL] interface n2n activated
Wed May 18 10:50:30 2022 daemon.notice netifd: Interface 'n2n' is now up
Wed May 18 10:50:30 2022 daemon.info n2n[12813]: edge started
Wed May 18 10:50:30 2022 daemon.info n2n[12813]: WARNING: failed to bind to local multicast group 224.0.0.68:1968 [errno 19]
Wed May 18 10:50:30 2022 daemon.info n2n[12813]: WARNING: sendto(*.*.*.*:10086) failed (101) Network unreachable
Wed May 18 10:50:30 2022 daemon.info : 09[KNL] 10.0.0.2 appeared on n2n
Wed May 18 10:50:30 2022 daemon.info : 12[KNL] 10.0.0.2 disappeared from n2n
Wed May 18 10:50:30 2022 daemon.info : 09[KNL] 10.0.0.2 appeared on n2n
Wed May 18 10:50:30 2022 kern.info kernel: [ 49.145834] traps: edge[12813] trap stack segment ip:466f14 sp:7ffe4a1af390 error:0 in edge[403000+80000]
Wed May 18 10:50:30 2022 daemon.info : 14[KNL] interface n2n deactivated
Wed May 18 10:50:30 2022 daemon.notice netifd: Network device 'n2n' link is down
Wed May 18 10:50:30 2022 daemon.notice netifd: Interface 'n2n' has link connectivity loss
Wed May 18 10:50:30 2022 daemon.notice netifd: Interface 'n2n' is now down
Wed May 18 10:50:30 2022 daemon.info : 10[KNL] 10.0.0.2 disappeared from n2n
Wed May 18 10:50:30 2022 daemon.info : 13[KNL] interface n2n deleted
Wed May 18 10:50:30 2022 daemon.notice netifd: Interface 'n2n' is disabled
Could you please run the edge with -vvvvvv to provide more detailed log output, the DEBUG class of messages, as well?
Also, the IP-address you x-ed out, is it a local internal address, or an external public one?
External, of course; it is my supernode's IP, and the port is 10086. :) At that time the PPPoE session was not complete yet, so the supernode being unreachable is reasonable.
OK, I will add -vvvvv to the init scripts next time for debugging. Please note: it seems the core dump only happens when the OpenWrt router reboots, during the system init stage. After the reboot, I manually bring up the edge and no core dump is seen for a long time, as long as I do not touch the network configuration or change anything.
I think you said that you compiled your binary with debugging (the -g option), can you try addr2line -e <your binary> 0x466f14 and send us the result?
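The addr2line request can be wrapped in a small shell function so that the result comes back as text that is easy to paste. This is a sketch under assumptions: resolve_ip is a made-up name, the binary path in the usage comment is hypothetical, and the binary must be the unstripped one built with -g (package builds often strip the installed copy).

```shell
# Hypothetical wrapper around addr2line from binutils.
# -e selects the binary, -f also prints the enclosing function name.
resolve_ip() {
    binary=$1
    addr=$2
    [ -r "$binary" ] || { echo "cannot read $binary" >&2; return 1; }
    command -v addr2line >/dev/null 2>&1 || { echo "addr2line not found" >&2; return 1; }
    addr2line -f -e "$binary" "$addr"
}

# Usage (path is an assumption; point it at your own debug build):
#   resolve_ip ./src/edge 0x466f14
```

The output is a function name followed by a file:line pair, which can be copied into the issue as plain text.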
https://github.com/ntop/n2n/blob/a274818854c01008a4106a44fd5ecd33d14091a4/src/edge_utils.c#L1040 However, when the WAN is down or the network is abnormal (Network unreachable), the code does goto err_out and returns -1, and a coredump occurs.
I hope this helps you guys. Do you have any assert for the eee pointer? I am just guessing that eee is a NULL pointer at this point. Is there any multi-threaded invocation?
On Wed, May 18, 2022 at 01:07:06AM -0700, Maha-5 wrote:
https://github.com/ntop/n2n/blob/a274818854c01008a4106a44fd5ecd33d14091a4/src/edge_utils.c#L1040 However, when wan is closed or the network is abnormal, goto err_out and return -1, coredump will occur.
Can you isolate where the coredump happens? I have reviewed the code path that you are describing and can find no unchecked memory accesses that could cause a coredump in the err_out path. It must be happening somewhere further up the call chain, which encompasses quite a large amount of code.
@hamishcoleman Sorry, I don't have the environment to analyze the core file. I can only be sure that on OpenWrt, returning -1 leads to a coredump. Before 009311d016bf27f40259e6bb992ce4a78af24424, supernode_disconnect(eee) was called only when rc <= 0. In that case, even if the WAN is down, there is no coredump.
On Wed, May 18, 2022 at 02:06:29AM -0700, Maha-5 wrote:
Sorry, I don't have the environment to analyze the core file. I can only be sure that return -1 on openwrt, it will coredump. Before 009311d, XXX will be called only when rc <= 0. In this case, even if wan is down, coredump will not be called.
The same code can actually be called in both paths, even in the earlier commit you refer to.
Can you compile your binary with debugging? (this is configure CFLAGS=-g)
If you can get a coredump from a binary that has debugging enabled
then we may be able to take the kernel error message (which has an
"ip:" value) and determine where in the source the error is occurring.
If you post your debug enabled binary and the resulting coredump then we
might be able to extract that information without using your environment.
https://github.com/ntop/n2n/pull/999 I have tested line 1047: with either if(sent != -1) or if(sent > 0) there is no coredump, and > 0 is two characters shorter. The main reason is that line 1076 needs to return sent.
@hamishcoleman Have you found any clue for this? Thanks.
@galaxyskyknight no, I have no clues, I am searching in the dark, I really need help from you to track this down - since you can replicate the issue, I need you to help with some actual debugging. You should try and compile with debug symbols and get a coredump file or a backtrace or an addr2line result. No amount of looking at screenshots of sections of code will find the actual location that is causing the coredump.
I apologise, I see that one of the tiny screenshots you attached above was an addr2line output. Can you please paste the text so that I can cut and paste the details without any chance of errors? Screenshots are not useful debug tools.
So does that line of code give you any hint? Is it possible that, as I guess, the eee pointer somehow becomes NULL or invalid in the interface link up/down flapping scenario (probably this pointer is recycled somewhere in another routine because of the interface going down or the sendto failure handler)?
https://github.com/ntop/n2n/pull/1001 This PR fixed it.
This PR fixed it.
It is still a diagnostic PR to help us understand the issues related to it.
Is it possible that, as I guess, the eee pointer somehow becomes NULL or invalid in the interface link up/down flapping scenario (probably this pointer is recycled somewhere in another routine because of the interface going down or the sendto failure handler)?
I have not seen any chance for this to happen so far but I like surprises. Hard to say. @galaxyskyknight could you provide the requested text information to help us fully understand the problem?
I don't understand what text information you require. I have put the addr2line info there; it clearly shows that the coredump happened at edge_utils.c line 2894, and the code snippet for that line is also attached. What else are you expecting?
What else are you expecting?
Can you please paste the text so that I can cut and paste the details without any chance of errors. Screenshots are not useful debug tools
What difference does it make to your understanding of the information that I pasted?
Not compiled with https://github.com/ntop/n2n/pull/1001 , compiled and tested with a274818854c01008a4106a44fd5ecd33d14091a4
root@Router:~# edge -f -d n2n -l www.test.com:10254 -c test -A4 -k asdfasdfasf -a 10.0.0.100/24 -r -H -vvvvvv
20/May/2022 23:58:05 [n2n.c:288] WARNING: supernode2sock fails to resolve supernode host www.test.com, -3: Try again
20/May/2022 23:58:05 [edge_utils.c:3590] adding supernode = www.test.com:10254
20/May/2022 23:58:05 [edge.c:1112] starting n2n edge 3.1.1 May 19 2022 17:33:00
20/May/2022 23:58:05 [edge.c:1118] using compression: none.
20/May/2022 23:58:05 [edge.c:1119] using ChaCha20 cipher.
20/May/2022 23:58:05 [edge_utils.c:402] number of supernodes in the list: 1
20/May/2022 23:58:05 [edge_utils.c:404] supernode 0 => www.test.com:10254
20/May/2022 23:58:05 [transform_cc20.c:134] setup_cc20_key completed
20/May/2022 23:58:05 [edge_utils.c:437] Header encryption is enabled.
20/May/2022 23:58:05 [edge.c:1143] use manually set IP address
20/May/2022 23:58:05 [edge.c:1161] skip PING to supernode
20/May/2022 23:58:05 [edge_utils.c:314] PMTU discovery disabled
20/May/2022 23:58:05 [edge.c:1225] skip auto IP address asignment
20/May/2022 23:58:05 [tuntap_linux.c:203] Waiting for TAP interface to be up and running...
20/May/2022 23:58:05 [tuntap_linux.c:224] Interface is up and running
20/May/2022 23:58:05 [edge.c:1258] created local tap device IP: 10.0.0.100, Mask: 255.255.255.0, MAC: EA:26:6D:40:E2:70
20/May/2022 23:58:05 [edge.c:1325] WARNING: n2n has not been compiled with libcap-dev; some commands may fail
20/May/2022 23:58:05 [edge.c:1330] dropping privileges to uid=65534, gid=65534
20/May/2022 23:58:05 [edge.c:1356] edge started
20/May/2022 23:58:05 [edge_utils.c:1564] update_supernode_reg: doing fast retry.
20/May/2022 23:58:05 [edge_utils.c:1167] WARNING: failed to bind to local multicast group 224.0.0.68:1968 [errno 19]
20/May/2022 23:58:10 [n2n.c:288] WARNING: supernode2sock fails to resolve supernode host www.test.com, -3: Try again
20/May/2022 23:58:10 [edge_utils.c:2137] Rx TAP packet ( 110) for 33:33:00:00:00:16
20/May/2022 23:58:10 [edge_utils.c:2143] dropping Tx multicast
20/May/2022 23:58:10 [edge_utils.c:1564] update_supernode_reg: doing fast retry.
20/May/2022 23:58:10 [edge_utils.c:1167] WARNING: failed to bind to local multicast group 224.0.0.68:1968 [errno 19]
20/May/2022 23:58:15 [n2n.c:288] WARNING: supernode2sock fails to resolve supernode host www.test.com, -3: Try again
20/May/2022 23:58:15 [n2n.c:604] Purging old registrations
20/May/2022 23:58:15 [n2n.c:609] Remove 0 registrations
20/May/2022 23:58:15 [edge_utils.c:2137] Rx TAP packet ( 110) for 33:33:00:00:00:16
20/May/2022 23:58:15 [edge_utils.c:2143] dropping Tx multicast
20/May/2022 23:58:15 [edge_utils.c:1564] update_supernode_reg: doing fast retry.
20/May/2022 23:58:15 [edge_utils.c:1167] WARNING: failed to bind to local multicast group 224.0.0.68:1968 [errno 19]
20/May/2022 23:58:20 [n2n.c:288] WARNING: supernode2sock fails to resolve supernode host www.test.com, -3: Try again
20/May/2022 23:58:20 [edge_utils.c:1226] send PING to supernodes
20/May/2022 23:58:20 [edge_utils.c:1069] sendto(0.0.0.0:10254) failed (97) Address family not supported by protocol
20/May/2022 23:58:20 [edge_utils.c:352] closed
20/May/2022 23:58:20 [edge_utils.c:1083] error in sendto_fd Bus error
You know, we want to debug the problem you encountered, and maybe even try to reproduce and check it ourselves, because we assume there is some underlying issue here. It would be kind if you could support us with more easily accessible information (text, not screenshots) so we can debug more easily and perhaps even a bit faster, without the hassle of retyping long strings from screenshots, which would add another source of typos and slow things down. Thank you!
The core dump occurs when binding to the local multicast group fails. In addition, running the process directly does not generate a core file; if it is started through OpenWrt's netifd, a core file is generated.
That is something for next time; we are not talking about the same thing. Even if I type or copy it again, how much do you think it will really help in digging further into the issue? You have already read and understood it even though it is a screenshot, right? You can ask me to do it the proper way, or the way you like, next time, and I will be glad to do that, but once is enough. I don't think the formality would help you find the root cause much faster even if I pasted it here right now.
Every time, it occurs in the same scenario in my OpenWrt log: the supernode (175.*.*.158) changes its hole-punching port, but the local OpenWrt edge still calls sendto() and somehow treats the sendto() failure as a fatal error, and then the edge core dumps. It does not happen only on interface status flaps.
FYI, I don't know what "sendto failed (1) Operation not permitted" means. I googled it and it seems to be related to IPv6?
Can you initialize the socket so that it binds to IPv4 only? I have disabled IPv6/ip6tables on my OpenWrt, and I am not sure whether this causes the "failed (1) Operation not permitted" problem.
@Logan007 @hamishcoleman
Mon May 23 11:08:01 2022 daemon.info n2n[17453]: peer 00:90:10:00:00:01 changed [175.*.*.158:36679] -> [175.*.*.158:59739]
Mon May 23 11:18:11 2022 daemon.info n2n[17453]: WARNING: sendto(175.*.*.158:36679) failed (1) Operation not permitted
Mon May 23 11:18:11 2022 kern.info kernel: [48818.807630] traps: edge[17453] trap stack segment ip:466f14 sp:7fff9bdb6070 error:0 in edge[403000+80000]
Mon May 23 11:18:11 2022 daemon.notice netifd: Network device 'n2n' link is down
Mon May 23 11:18:11 2022 daemon.notice netifd: Interface 'n2n' has link connectivity loss
Mon May 23 11:18:11 2022 daemon.notice netifd: Interface 'n2n' is now down
Mon May 23 11:18:11 2022 daemon.notice netifd: Interface 'n2n' is disabled
Mon May 23 11:18:12 2022 daemon.info vnstatd[11876]: Info: Interface "n2n" disabled.
The core dump occurs when binding to the local multicast group fails. In addition, running the process directly does not generate a core file; if it is started through OpenWrt's netifd, a core file is generated.
seems it is the same to me.
Hello, is there anyone still working on this issue? thanks.
I think the status is still "open".
After many days of observation, I am pretty sure there was no core dump issue before commit f3e305b254fc88ce829ddb4a63b11e083a65c3ab (April 11th).
It still core dumps on the latest code, in the same place:
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Interface 'n2n' is enabled
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Interface 'n2n' is setting up now
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Interface 'n2n' is now up
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Network device 'n2n' link is up
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Interface 'n2n' has link connectivity
Mon Jun 27 18:34:25 2022 daemon.info n2n[12531]: WARNING: running as root is discouraged, check out the -u/-g options
Mon Jun 27 18:34:25 2022 daemon.info n2n[12531]: edge started
Mon Jun 27 18:34:25 2022 daemon.info n2n[12531]: WARNING: failed to bind to local multicast group 224.0.0.68:1968 [errno 19]
Mon Jun 27 18:34:25 2022 daemon.info n2n[12531]: WARNING: sendto(175...*:10086) failed (101) Network unreachable
Mon Jun 27 18:34:25 2022 kern.info kernel: [ 47.222356] traps: edge[12531] trap stack segment ip:419dea sp:7ffefd75e860 error:0 in edge[403000+7e000]
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Network device 'n2n' link is down
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Interface 'n2n' has link connectivity loss
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Interface 'n2n' is now down
Mon Jun 27 18:34:25 2022 daemon.notice netifd: Interface 'n2n' is disabled
Mon Jun 27 18:34:27 2022 daemon.notice procd: /etc/rc.d/S99n2n_v2: SIOCADDRT: Network unreachable
Mon Jun 27 18:34:27 2022 daemon.notice procd: /etc/rc.d/S99n2n_v2: SIOCADDRT: Network unreachable
Mon Jun 27 18:34:27 2022 daemon.notice procd: /etc/rc.d/S99n2n_v2: SIOCADDRT: Network unreachable
builder@Build-Server:~/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-439dfc68865a286c48c79672a350cd467da38799$ addr2line -e edge 419dea
/home/builder/lede_x86/build_dir/target-x86_64_musl/n2n-3.1.1_dev_git-439dfc68865a286c48c79672a350cd467da38799/src/edge_utils.c:2872
I am tired of this issue. Is there any workaround or fix? It seems there is an invalid memory access on the 'eee' structure in some corner case at this line of code.
Looks like this commit fixes the core dump for now. I will keep an eye on it for a couple of days to see whether it happens again. Thanks for the effort. @hamishcoleman
https://github.com/ntop/n2n/commit/06c489fd8ad42d6c025beaea0fd62d7d4d948c31
It happened after a mwan3 re-apply and a firewall reload. Somehow the edge exited and dumped core. It should not behave this way.