lmntal / slim

slim LMNtal implementation
Other
18 stars 5 forks source link

Bus Error on M1 mac #281

Open sano-jin opened 2 years ago

sano-jin commented 2 years ago

slim runs successfully but outputs

[1]    45373 bus error  /Users/sano/work/slim/bin/slim

in the last line on Sano's M1 mac.

need investigation on other M1 macs: tbd.

sano-jin commented 2 years ago

実行終了後にメモリを解放する段階で bus error が起きているようだ.

とりあえず,ここはうまくいっていない: https://github.com/lmntal/slim/blob/1052925f8d3f1e8eace14cd42eaf82fb9b469df1/src/vm/ccallback.cpp#L61

が,この(これとこの次の)2行をコメントアウトしても,

slim(28531,0x104700580) malloc: *** error for object 0x1045ec000: pointer being freed was not allocated
slim(28531,0x104700580) malloc: *** set a breakpoint in malloc_error_break to debug
>>>> lmn_profiler_finalize();>>>> destroy_rules();>>>> delete lmn_functor_table;[1]    28531 abort      /Users/sano/work/lmntal/slim/bin/slim test5.il

となったので,まだ調査が必要.

sano-jin commented 1 year ago

暫定的な解決法

main.cpp の slim_finalize(); をコメントアウトすれば(メモリリークを気にしなければ)問題なく動く. https://github.com/lmntal/slim/blob/c3c95f9ae2c0de5e9e9eb119caafc53ee559f716/src/main.cpp#L820

nikosai commented 1 year ago

これは関係するだろうか? https://github.com/ffi/ffi/issues/864

kyawaway commented 9 months ago

使用: Apple clang version 14.0.0 (clang-1400.0.29.202)

デバッガ(lldb)によると,

Process 77010 stopped                                                                                                  
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 3.1                                             
    frame #0: 0x00000001000d6498 slim`st_foreach(table=0x000000010100a960, func=(slim`CCallback::free_v(unsigned long, 
unsigned long, unsigned long) at ccallback.cpp:64:3), arg=0)(...), unsigned long) at st.cpp:517:26 [opt]               
   514    enum st_retval retval;                                                                                       
   515    int i;
   516 
-> 517    for (i = 0; i < table->num_bins; i++) {
   518      last = 0;
   519      for (ptr = table->bins[i];
   520           ptr != 0;) { /* エントリーの存在するチェインでのみfuncが呼ばれる */
Target 0: (slim) stopped.
(lldb) s
Process 77010 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000d64cc slim`st_foreach(table=0x000000010100a960, func=(slim`CCallback::free_v(unsigned long, 
unsigned long, unsigned long) at ccallback.cpp:64:3), arg=0)(...), unsigned long) at st.cpp:519:23 [opt]
   516 
   517    for (i = 0; i < table->num_bins; i++) {
   518      last = 0;
-> 519      for (ptr = table->bins[i];
   520           ptr != 0;) { /* エントリーの存在するチェインでのみfuncが呼ばれる */
   521        retval = (enum st_retval)(*func)(ptr->key, ptr->record, arg);
   522        switch (retval) {
Target 0: (slim) stopped.
(lldb) 
Process 77010 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000d64f8 slim`st_foreach(table=0x000000010100a960, func=(slim`CCallback::free_v(unsigned long, 
unsigned long, unsigned long) at ccallback.cpp:64:3), arg=0)(...), unsigned long) at st.cpp:521:45 [opt]
   518      last = 0;
   519      for (ptr = table->bins[i];
   520           ptr != 0;) { /* エントリーの存在するチェインでのみfuncが呼ばれる */
-> 521        retval = (enum st_retval)(*func)(ptr->key, ptr->record, arg);
   522        switch (retval) {
   523        case ST_CHECK: /* check if hash is modified during iteration */
   524          tmp = 0;
Target 0: (slim) stopped.
(lldb) 
Process 77010 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000771f4 slim`CCallback::free_v(key=<unavailable>, v=4295455220, _t=<unavailable>) at ccallback
.cpp:64:3 [opt]
   61   }
   62  
   63   int CCallback::free_v(st_data_t key, st_data_t v, st_data_t _t) {
-> 64     delete (CCallback *)v;
   65     return ST_CONTINUE;
   66   }
   67   /* コールバックを名前nameで登録する。arityはコールバックに引数として
Target 0: (slim) stopped.
(lldb) 
Process 77010 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in                                                    
    frame #0: 0x00000001000cb050 slim`operator delete(p=0x00000001000771f4) at alloc.cpp:169:3 [opt]                   
   166  }
   167 
   168  void operator delete(void* p) noexcept {
-> 169    lmn_free(p);
   170  }
   171 
   172  void operator delete[](void* p) noexcept {
Target 0: (slim) stopped.
(lldb)  
Process 77010 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000cb014 slim`lmn_free(p=0x00000001000771f4) at alloc.cpp:158:26 [opt]
   155    return result;
   156  }
   157 
-> 158  void lmn_free(void *p) { free(p); }
   159 
   160  void* operator new(std::size_t num) {
   161    return lmn_malloc(num);
Target 0: (slim) stopped.
(lldb) 
slim(77010,0x1f9833a80) malloc: *** error for object 0x1000771f4: pointer being freed was not allocated
slim(77010,0x1f9833a80) malloc: *** set a breakpoint in malloc_error_break to debug
Process 77010 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x000000019e4961b0 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`:
->  0x19e4961b0 <+8>:  b.lo   0x19e4961d0               ; <+40>
    0x19e4961b4 <+12>: pacibsp 
    0x19e4961b8 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x19e4961bc <+20>: mov    x29, sp
Target 0: (slim) stopped.
(lldb) 
Process 77010 exited with status = 6 (0x00000006) Terminated due to signal 6
(lldb) 

ccallback_tbl 中のなんらかの要素に対してfreeしようとして,これはallocしてないぞと怒られている.

kyawaway commented 9 months ago

使用: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 intelマシンでvalgrindを回してみる.

command: valgrind --tool=memcheck --leak-check=yes --show-reachable=yes result:

==954898== Memcheck, a memory error detector                                                                           
==954898== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.                                             
==954898== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info                                          
==954898== Command: ../lmntal/installed/bin/slim demo.il                                                               
==954898==                                                                                                             
a(b).                                                                                                                  
==954898==                                                                                                             
==954898== HEAP SUMMARY:                                                                                               
==954898==     in use at exit: 224 bytes in 2 blocks                                                                   
==954898==   total heap usage: 1,495 allocs, 1,493 frees, 252,594 bytes allocated                                      
==954898==                                                                                                             
==954898== 96 bytes in 1 blocks are still reachable in loss record 1 of 2                                              
==954898==    at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)                        
==954898==    by 0x4A0C990: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)                                        
==954898==    by 0x4A0B212: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)                                        
==954898==    by 0x4A0B2E4: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)                                        
==954898==    by 0x400647D: call_init.part.0 (dl-init.c:70) 
==954898==    by 0x4006567: call_init (dl-init.c:33)
==954898==    by 0x4006567: _dl_init (dl-init.c:117)
==954898==    by 0x40202C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
==954898==    by 0x1: ???
==954898==    by 0x1FFF0004D6: ???
==954898==    by 0x1FFF0004F3: ???
==954898== 
==954898== 128 bytes in 1 blocks are still reachable in loss record 2 of 2
==954898==    at 0x484DCD3: realloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==954898==    by 0x4A0C9B8: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==954898==    by 0x4A21C54: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==954898==    by 0x4A0B2C8: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==954898==    by 0x400647D: call_init.part.0 (dl-init.c:70) 
==954898==    by 0x4006567: call_init (dl-init.c:33)
==954898==    by 0x4006567: _dl_init (dl-init.c:117)
==954898==    by 0x40202C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
==954898==    by 0x1: ???
==954898==    by 0x1FFF0004D6: ???
==954898==    by 0x1FFF0004F3: ???
==954898== 
==954898== LEAK SUMMARY:
==954898==    definitely lost: 0 bytes in 0 blocks
==954898==    indirectly lost: 0 bytes in 0 blocks
==954898==      possibly lost: 0 bytes in 0 blocks
==954898==    still reachable: 224 bytes in 2 blocks
==954898==         suppressed: 0 bytes in 0 blocks
==954898== 
==954898== For lists of detected and suppressed errors, rerun with: -s
==954898== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

still reachableではあるが,2箇所でメモリリークが起きているのがわかる. (追記)これはそんなに関係ない気がしてきた.

kyawaway commented 9 months ago

alloc,freeしているアドレスを全部printしてみたところ,最後にfreeに失敗しているアドレスは,

kyawaway commented 9 months ago

使用: M1: Apple clang version 14.0.0 (clang-1400.0.29.202) intel: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

デバッガ M1:

Process 37362 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000d64d0 slim`st_foreach(table=0x0000000101206220, func=(slim`CCallback::free_v(unsigned long, unsigned long, unsigned long) at ccallback.cpp:64:3), arg=0)(...), unsigned long) at st.cpp:521:45 [opt]
   518      last = 0;
   519      for (ptr = table->bins[i];
   520           ptr != 0;) { /* エントリーの存在するチェインでのみfuncが呼ばれる */
-> 521        retval = (enum st_retval)(*func)(ptr->key, ptr->record, arg);
   522        switch (retval) {
   523        case ST_CHECK: /* check if hash is modified during iteration */
   524          tmp = 0;
Target 0: (slim) stopped.
(lldb) v
(st_table_t) table = 0x0000000101206220
(int (*)(...)) func = 0x0000000100077158 (slim`CCallback::free_v(unsigned long, unsigned long, unsigned long) at ccallback.cpp:64:3)
(st_data_t) arg = 0
(int) i = 0
(st_table_entry *) ptr = 0x0000000100707e20
(st_table_entry *) last = nullptr
(st_table_entry *) tmp = <variable not available>

(st_retval) retval = <variable not available>
.
.
.

* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x0000000100077158 slim`CCallback::free_v(key=<unavailable>, v=4295455064, _t=<unavailable>) at ccallback.cpp:64:3 [opt]
   61   }
   62  
   63   int CCallback::free_v(st_data_t key, st_data_t v, st_data_t _t) {
-> 64     delete (CCallback *)v;
   65     return ST_CONTINUE;
   66   }
   67   /* コールバックを名前nameで登録する。arityはコールバックに引数として
Target 0: (slim) stopped.
(lldb) v
(st_data_t) key = <no location, value may have been optimized out>

(st_data_t) v = 4295455064
(st_data_t) _t = <no location, value may have been optimized out>
Process 37340 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000cb028 slim`operator delete(p=0x0000000100077158) at alloc.cpp:176:3 [opt]
   173  }
   174 
   175  void operator delete(void* p) noexcept {
-> 176    lmn_free(p);
   177  }
   178 
   179  void operator delete[](void* p) noexcept {
Target 0: (slim) stopped.
(lldb) v
(void *) p = 0x0000000100077158

intel:

(gdb) s521           retval = (enum st_retval)(*func)(ptr->key, ptr->record, arg);
(gdb) p func
$6 = (int (*)(void)) 0x5555555afab8 <CCallback::free_v(unsigned long, unsigned long, unsigned long)>
(gdb) p ptr->key
$7 = 88
(gdb) p ptr->record
$8 = 93824993358816
(gdb) p arg
$9 = 0
(gdb) p ptr
$10 = (st_table_entry *) 0x5555556709c0
(gdb)
.
.
.
CCallback::free_v (key=88, v=93824993358816, _t=0) at ccallback.cpp:65
65      int CCallback::free_v(st_data_t key, st_data_t v, st_data_t _t) {
(gdb) p v
$4 = 93824993358816
(gdb) s
66        delete (CCallback *)v;
(gdb) p v
$5 = 93824993358816
operator delete (p=0x5555556673e0) at alloc.cpp:172
172       lmn_free(p);

のように,delete (CCallback *)v;の直前までは挙動が同じで,(CCallback *)vの結果で差が出ているように思われる.

deleteしたpに関して,M1では未allocのアドレスが出てきたが,intelではalloc済みのpが出てきた.

(CCallback *)vの値に関して,M1ではfuncのアドレスが出てきて,intelでは前にallocしたどこかのアドレスが出てきた.

vの値がおかしい?

kyawaway commented 9 months ago

vの値を追う.

st_foreach時,

intel:

st_foreach中:

Breakpoint 4, st_foreach(st_table*, int (*)(...), unsigned long) (table=0x55555566de50, func=func@entry=0x5555555afab8 <CCallback::free_v(unsigned long, unsigned long, unsigned long)>, arg=arg@entry=0) at st.cpp:512
512     int st_foreach(st_table_t table, int (*func)(ANYARGS), st_data_t arg) {
(gdb) s
517       for (i = 0; i < table->num_bins; i++) {
(gdb) s
519         for (ptr = table->bins[i];
(gdb) s
520              ptr != 0;) { /* エントリーの存在するチェインでのみfuncが呼ばれる */
(gdb) s
521           retval = (enum st_retval)(*func)(ptr->key, ptr->record, arg);
(gdb) p ptr->key
$26 = 88
(gdb) p ptr->record
$27 = 93824993358816
(gdb) p ptr
$28 = (st_table_entry *) 0x5555556709c0

(v=93824993358816) key=88の要素のinsert時:

Breakpoint 3, CCallback::lmn_register_c_fun (name=name@entry=0x55555560feba “cb_make_input_string”, f=f@entry=0x5555555f5cee <cb_make_input_string(LmnReactCxt*, LmnMembrane*, void*, unsigned char, void*, unsigned char)>, arity=arity@entry=2) at ccallback.cpp:71
71      void CCallback::lmn_register_c_fun(const char *name, void *f, int arity) {
(gdb) 
Continuing.
malloc: 0x5555556673e0
malloc: 0x555555670960
malloc: 0x555555670990

Breakpoint 1, st_insert (table=0x55555566de50, key=88, value=value@entry=93824993358816) at st.cpp:305
305     int st_insert(st_table_t table, st_data_t key, st_data_t value) {
(gdb) 
Continuing.

valueは間違いない.直後のfree_v呼び出しでも,

CCallback::free_v (key=88, v=93824993358816, _t=0) at ccallback.cpp:65
65      int CCallback::free_v(st_data_t key, st_data_t v, st_data_t _t) {
(gdb) s
66        delete (CCallback *)v;
(gdb) p v
$29 = 93824993358816

当然元のvalue=93824993358816が渡っている.

一方,M1では,

st_foreach中:

Process 10945 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000d64d8 slim`st_foreach(table=0x0000000101005140, func=(slim`CCallback::free_v(unsigned long, unsigned long, unsigned long) at ccallback.cpp:65), arg=0)(...), unsigned long) at st.cpp:522:45 [opt]
   519 
   520    for (i = 0; i < table->num_bins; i++) {
   521      last = 0;
-> 522      for (ptr = table->bins[i];
   523           ptr != 0;) { /* エントリーの存在するチェインでのみfuncが呼ばれる */
   524        retval = (enum st_retval)(*func)(ptr->key, ptr->record, arg);
   525        switch (retval) {
Target 0: (slim) stopped.
(lldb) v
(st_table_t) table = 0x0000000101005140
(int (*)(...)) func = 0x0000000100077188 (slim`CCallback::free_v(unsigned long, unsigned long, unsigned long) at ccallback.cpp:65)
(st_data_t) arg = 0
(int) i = 0
(st_table_entry *) ptr = 0x0000000101404660
(st_table_entry *) last = nullptr
(st_table_entry *) tmp = <variable not available>

(st_retval) retval = <variable not available>

(lldb) v ptr->record
(st_data_t) ptr->record = 4302336288
(lldb) v ptr->key
(st_data_t) ptr->key = 88

(v=4302336288) key=88要素のinsertは,

* thread #1, queue = ‘com.apple.main-thread’, stop reason = breakpoint 2.1
    frame #0: 0x00000001000771e8 slim`CCallback::lmn_register_c_fun(name=“cb_make_input_string”, f=0x00000001000d4820, arity=2) at ccallback.cpp:76:18 [opt]
   73   /* コールバックを名前nameで登録する。arityはコールバックに引数として
   74      渡されるアトムのリンク数 */
   75   void CCallback::lmn_register_c_fun(const char *name, void *f, int arity) {
-> 76     CCallback *c = new CCallback;
   77     c->f = f;
   78     c->arity = arity;
   79     st_insert(ccallback_tbl, (st_data_t)lmn_intern(name), (st_data_t)c);
Target 0: (slim) stopped.
(lldb) 
Process 10945 resuming
Process 10945 stopped
* thread #1, queue = ‘com.apple.main-thread’, stop reason = breakpoint 3.1
    frame #0: 0x00000001000d5d74 slim`st_insert(table=0x0000000101005140, key=88, value=4302336288) at st.cpp:306:13 [opt]
   303  /* ハッシュ表に新たなエントリーを追加する.
   304   * エントリが存在した場合は, エントリの値のみを更新し,
   305   * キーは元々のものを更新しない. エントリが存在しなかった場合に0,
-> 306   * エントリが存在した場合には1以上の整数を返す. */
   307  int st_insert(st_table_t table, st_data_t key, st_data_t value) {
   308    int ret = st_insert_inner(table, key, value);
   309    if (!ret && (table->num_entries / table->num_bins) > ST_DEFAULT_MAX_DENSITY) {
Target 0: (slim) stopped.

ここまでは問題ない.しかし,次のfree_v呼び出し時では,

Process 10945 stopped* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001000771ac slim`CCallback::free_v(key=<unavailable>, v=4295455112, _t=<unavailable>) at ccallback.cpp:70:3 [opt]
   67     //printf("v: %p\n", (CCallback *)v);
   68     //(CCallback *)vをprint
   69     printf("v(cast): %p\n", (CCallback *)v);
-> 70     delete (CCallback *)v;
   71     return ST_CONTINUE;
   72   }
   73   /* コールバックを名前nameで登録する。arityはコールバックに引数として
Target 0: (slim) stopped.
(lldb) v
(st_data_t) key = <no location, value may have been optimized out>

(st_data_t) v = 4295455112
(st_data_t) _t = <no location, value may have been optimized out>

vが違う値(v=4295455112)になっている.vに関しては見えなくなっている.

kyawaway commented 9 months ago

<no location, value may have been optimized out>: https://github.com/bytecodealliance/wasmtime/issues/3884

k-ueda commented 9 months ago

【clang】 MacBook Pro (M1, 2020, Sonoma 14.0, clang 15.0) でも bus error 現象を確認しました.(bison, ruby, autoconf@2.69 は brew で別途導入) また,test/system_check/testsuite/hyperlink の case5 は,bus error の前に segmentation fault します. (参考)Ubuntu22.04 PCで clang++-15 (+ libstdc++-12) でビルドしたら make check 全部通りました.

【gnu gcc】 M1 Mac でも,gcc-13 ならば make check 全部通りました.ただ,できた src/slim バイナリで .il ファイルは実行できるけど,.lmn を与えてコンパイル+実行しようとすると segmentation fault します.要調査. gcc-12 では,make check が,上記の case5 だけ FAIL して,調べてみるとこの例題だけ segmentation fault しています. bus error とは別に,case5.lmn の内部挙動を調査する必要があるかもしれません.

kyawaway commented 9 months ago

281 に関しては,(Ubuntu22.04, clang++-15)ではbus errorは出ないということならば,clang(llvm)は無関係で,M1固有の問題の可能性が高いということですね,ありがとうございます.

294 に関しては,#281 とは独立しているということで,別途調査します.