rui314 / mold

Mold: A Modern Linker 🦠
MIT License

Linking with mold -run make resulted in illegal instructions in executable #154

Open ichn-hu opened 2 years ago

ichn-hu commented 2 years ago

I built mold 1.0.0 following the instructions in the README.

I was trying to replace ld with mold, and was very excited to see the linking time drop from minutes to seconds.

However, when I run the linked executable, it crashes with a core dump, reporting an illegal instruction:

 1293123 illegal hardware instruction (core dumped)

I am running Arch Linux on an AMD CPU:

lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         43 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 3900X 12-Core Processor
    CPU family:          23
    Model:               113
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           1
    Stepping:            0
    Frequency boost:     enabled
    CPU max MHz:         4672.0698
    CPU min MHz:         2200.0000
    BogoMIPS:            7602.52
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    6 MiB (12 instances)
  L3:                    64 MiB (4 instances)
NUMA:                    
  NUMA node(s):          1                                                                                                              
  NUMA node0 CPU(s):     0-23                                                                                                           
Vulnerabilities:         
  Itlb multihit:         Not affected                                                                                                   
  L1tf:                  Not affected                                                                                                   
  Mds:                   Not affected                                                                                                   
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp                                            
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization                                           
  Spectre v2:            Mitigation; Full AMD retpoline, IBPB conditional, STIBP conditional, RSB filling                               
  Srbds:                 Not affected                                                                                                   
  Tsx async abort:       Not affected                               

Running gdb on the dumped core:

image

rui314 commented 2 years ago

Ah, interesting. Are you building an open-source program? I'd like to try that out myself.

ichn-hu commented 2 years ago

> Ah, interesting. Are you building an open-source program? I'd like to try that out myself.

Sorry, the project I am working on is not open source yet (it should be open-sourced soon, perhaps in several months). For now there is some clean-up work to be done, and the current compilation steps are rather complicated (they still need to be simplified for the open-source release).

ichn-hu commented 2 years ago

The project is based on ClickHouse (which is open source at https://github.com/ClickHouse/ClickHouse). I will try compiling and linking ClickHouse to see if the same problem exists, thanks!

rui314 commented 2 years ago

OK, thanks. Can you show me the stack trace? Just typing bt in gdb should show you the stack trace at the point where the program crashed.

rui314 commented 2 years ago

So it is very likely that mold applies relocations in a wrong way and overwrites some data to a wrong place, corrupting machine instructions. If you know which object file contains __cxa_get_globals, you can display relocations of that object file by readelf -r path/to/that/object/file.o.
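
If it is unclear which object file or archive member defines the symbol, one option is to scan the output of `nm -A` over the linker inputs. The following is a minimal sketch, not part of mold: the helper name `find_definers` and the sample file names are hypothetical, and it assumes the usual GNU `nm -A` layout of `<file>:<address> <type> <name>`.

```python
def find_definers(nm_output, symbol):
    """Return the files (or archive members) that define `symbol` as code.

    `nm_output` is the text printed by `nm -A` on the linker inputs.
    """
    definers = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and anything not in the expected layout
        location, sym_type, name = parts
        # T/t mark symbols in a text (code) section, W/w mark weak definitions.
        # U marks an undefined reference, which is not what we want here.
        if name == symbol and sym_type in ("T", "t", "W", "w"):
            # Drop the trailing ":<address>" to keep only the file or member name.
            definers.append(location.rsplit(":", 1)[0])
    return definers


# Hypothetical sample text, shaped like `nm -A` output.
sample = """\
ConsoleChannel.cpp.o:                 U __cxa_get_globals
libexample.a:member.o:0000000000000000 T __cxa_get_globals
libexample.a:member.o:0000000000000020 T some_other_symbol
"""
print(find_definers(sample, "__cxa_get_globals"))
```

The file reported this way is the one to pass to `readelf -r`.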

ichn-hu commented 2 years ago

> OK, thanks, can you show me stacktrace? Just typing bt in gdb should show you the stacktrace when the program crashes.

sure

(gdb) bt
#0  0x00000000108bf5e7 in __cxa_get_globals ()
#1  0x00000000108bf44e in std::uncaught_exception() ()
#2  0x0000000010939189 in std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long) ()
#3  0x00000000107fd92a in Poco::ConsoleChannel::log (this=this@entry=0x7fe120ad24c0, msg=...)
    at /home/ichn/Projects/pingcap/tics/contrib/poco/Foundation/src/ConsoleChannel.cpp:47
#4  0x0000000010819fd0 in Poco::FormattingChannel::log (this=this@entry=0x7fe120a846c0, msg=...)
    at /home/ichn/Projects/pingcap/tics/contrib/poco/Foundation/src/FormattingChannel.cpp:91
#5  0x000000000b70bc22 in Poco::Logger::log (this=<optimized out>, 
    text="Init capacity [path=/home/ichn/Projects/pingcap/tics/build/tmp/] [capacity=0.00 B]", 
    prio=prio@entry=Poco::Message::PRIO_INFORMATION)
    at /home/ichn/Projects/pingcap/tics/contrib/poco/Foundation/include/Poco/Logger.h:619
#6  0x000000000b70bc76 in Poco::Logger::information (this=<optimized out>, msg=...)                                                     
    at /home/ichn/Projects/pingcap/tics/contrib/poco/Foundation/include/Poco/Logger.h:696                                               
#7  0x000000000ce8a181 in DB::PathCapacityMetrics::PathCapacityMetrics (this=0x7fe120fef650, capacity_quota_=capacity_quota_@entry=0,   
    main_paths_=std::vector of length 1, capacity 1 = {...}, main_capacity_quota_=std::vector of length 0, capacity 0,                  
    latest_paths_=std::vector of length 0, capacity 0, latest_capacity_quota_=std::vector of length 0, capacity 0)                      
    at /home/ichn/Projects/pingcap/tics/dbms/src/Storages/PathCapacityMetrics.cpp:70                                                    
#8  0x000000000cbf8487 in __gnu_cxx::new_allocator<DB::PathCapacityMetrics>::construct<DB::PathCapacityMetrics, unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> (                 
    this=this@entry=0x7fff8b8e991f, __p=__p@entry=0x7fe120fef650) at /usr/include/c++/11.1.0/ext/new_allocator.h:156                    
#9  0x000000000cbf8557 in std::allocator_traits<std::allocator<DB::PathCapacityMetrics> >::construct<DB::PathCapacityMetrics, unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> (__a=..., __p=0x7fe120fef650) at /usr/include/c++/11.1.0/bits/alloc_traits.h:512
#10 0x000000000cbf8603 in std::_Sp_counted_ptr_inplace<DB::PathCapacityMetrics, std::allocator<DB::PathCapacityMetrics>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> (this=0x7fe120fef640, __a=...) at /usr/include/c++/11.1.0/bits/shared_ptr_base.h:519
#11 0x000000000cbf86ed in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<DB::PathCapacityMetrics, std::allocator<DB::PathCapacityMetrics>, unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> (this=0x7fff8b8e9b38, __p=@0x7fff8b8e9b30: 0x0, __a=...) at /usr/include/c++/11.1.0/bits/shared_ptr_base.h:650
#12 0x000000000cbf87cf in std::__shared_ptr<DB::PathCapacityMetrics, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<DB::PathCapacityMetrics>, unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> (this=this@entry=0x7fff8b8e9b30, __tag=..., __tag@entry=...) at /usr/include/c++/11.1.0/bits/shared_ptr_base.h:1337
#13 0x000000000cbf885b in std::shared_ptr<DB::PathCapacityMetrics>::shared_ptr<std::allocator<DB::PathCapacityMetrics>, unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> (this=0x7fff8b8e9b30, __tag=...) at /usr/include/c++/11.1.0/bits/shared_ptr.h:409
#14 0x000000000cbf88e7 in std::allocate_shared<DB::PathCapacityMetrics, std::allocator<DB::PathCapacityMetrics>, unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> (__a=...) at /usr/include/c++/11.1.0/bits/shared_ptr.h:861
#15 0x000000000cbf897c in std::make_shared<DB::PathCapacityMetrics, unsigned long&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&> () at /usr/include/c++/11.1.0/bits/shared_ptr.h:877
#16 0x000000000cbd99c0 in DB::Context::initializePathCapacityMetric (this=this@entry=0x7fe120b2e000, global_capacity_quota=<optimized out>, global_capacity_quota@entry=0, main_data_paths=std::vector of length 1, capacity 1 = {...}, main_capacity_quota=std::vector of length 0, capacity 0, latest_data_paths=std::vector of length 0, capacity 0, latest_capacity_quota=std::vector of length 0, capacity 0) at /home/ichn/Projects/pingcap/tics/dbms/src/Interpreters/Context.cpp:1494
#17 0x000000000c1784e0 in DB::tests::TiFlashTestEnv::initializeGlobalContext () at /home/ichn/Projects/pingcap/tics/dbms/src/TestUtils/TiFlashTestBasic.cpp:28
#18 0x000000000c15b31b in main (argc=<optimized out>, argv=0x7fff8b8ead48) at /home/ichn/Projects/pingcap/tics/dbms/src/TestUtils/gtests_dbms_main.cpp:13
ichn-hu commented 2 years ago

I am not very familiar with the linking process. Could you point out which object file you want me to inspect, based on the stack trace?

rui314 commented 2 years ago

That stack trace is interesting... It calls std::uncaught_exception, so it implies that your code has a throw statement which doesn't have a corresponding catch statement. Is this actually the case? The last source file in the stack trace is ConsoleChannel.cpp. Can you share ConsoleChannel.o with me?

ichn-hu commented 2 years ago

Oh yeah, I am writing a test for that project, and it is not finished yet (that's why I need to compile and link many times, and I found that linking takes 80% of the time).

Sure.

https://transfer.sh/5iIqV8/ConsoleChannel.cpp.o

rui314 commented 2 years ago

Are you linking a static executable?

__cxa_get_globals is usually defined in libstdc++.so, and its code is not writable in memory, so it is odd that the instruction is corrupted in memory.

ichn-hu commented 2 years ago

Yes, packages are statically linked.

rui314 commented 2 years ago

Can I see a disassembly of that particular function? Please share the output of the following command:

objdump -d your-executable-file | grep -A10 '<__cxa_get_globals>:'
ichn-hu commented 2 years ago

@rui314 Sorry for the late reply. See the result below:

objdump -d dbms/gtests_dbms | grep -A10 '<__cxa_get_globals>:'
0000000010b39db0 <__cxa_get_globals>:
    10b39db0:   f3 0f 1e fa             endbr64 
    10b39db4:   48 83 ec 08             sub    $0x8,%rsp
    10b39db8:   66 66 66 64 48 8b 04    data16 data16 data16 mov %fs:0x0,%rax
    10b39dbf:   25 00 00 00 00 
    10b39dc4:   00 48 83                add    %cl,-0x7d(%rax)
    10b39dc7:   c4                      (bad)  
    10b39dc8:   08 48 05                or     %cl,0x5(%rax)
    10b39dcb:   d8 ff                   fdivr  %st(7),%st
    10b39dcd:   ff                      (bad)  
    10b39dce:   ff c3                   inc    %ebx
rui314 commented 2 years ago

It's usually something like this

00000000007c8eb0 <__cxa_get_globals>:
  7c8eb0:       f3 0f 1e fa             endbr64
  7c8eb4:       48 83 ec 08             sub    $0x8,%rsp
  7c8eb8:       66 66 66 64 48 8b 04    data16 data16 data16 mov %fs:0x0,%rax
  7c8ebf:       25 00 00 00 00
  7c8ec4:       48 83 c4 08             add    $0x8,%rsp
  7c8ec8:       48 05 b0 ff ff ff       add    $0xffffffffffffffb0,%rax
  7c8ece:       c3                      retq
  7c8ecf:       cc                      int3

, so mold writes an extra zero at the address 10b39dc4 in your case.

Does adding --no-relax (or -Wl,--no-relax) to the linker command line fix the issue for you?
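
The comparison of the two listings above can also be checked mechanically. The sketch below copies the instruction bytes from both objdump outputs in this thread and looks for the first byte that differs. The variable and function names are illustrative only.

```python
# Instruction bytes copied from the two objdump listings in this thread.
BROKEN_BASE = 0x10B39DB0  # address of __cxa_get_globals in the crashing binary

broken = bytes.fromhex(
    "f3 0f 1e fa 48 83 ec 08 66 66 66 64 48 8b 04 "
    "25 00 00 00 00 00 48 83 c4 08 48 05 d8 ff ff ff c3"
)
expected = bytes.fromhex(
    "f3 0f 1e fa 48 83 ec 08 66 66 66 64 48 8b 04 "
    "25 00 00 00 00 48 83 c4 08 48 05 b0 ff ff ff c3 cc"
)


def first_difference(a, b):
    """Return the index of the first differing byte, or None if none differ."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


offset = first_difference(broken, expected)
address = BROKEN_BASE + offset

# Drop the suspicious byte and compare again to see what else differs.
repaired = broken[:offset] + broken[offset + 1:]
remaining = [i for i, (x, y) in enumerate(zip(repaired, expected)) if x != y]

print(hex(address))  # address of the first mismatching byte
print(remaining)     # offsets that still differ after removing that byte
```

With the stray byte removed, the two sequences agree everywhere except in the immediate operand of the final add, which presumably is a thread-local offset that can legitimately differ between two binaries.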