google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer

ASAN seems to be running into non-deterministic SEGFAULTs #1780

Closed abhisen7 closed 2 weeks ago

abhisen7 commented 3 weeks ago

I am debugging build failures with llvm-tblgen (compiled with ASAN), and it is segfaulting at random:

$ ./llvm-tblgen --help
Segmentation fault (core dumped)
abhishek@abhishek-ThinkCentre-M90t:~$ ./llvm-tblgen -d
llvm-tblgen: for the -d option: requires a value!
abhishek@abhishek-ThinkCentre-M90t:~$ ./llvm-tblgen -d
llvm-tblgen: for the -d option: requires a value!
abhishek@abhishek-ThinkCentre-M90t:~$ ./llvm-tblgen -d
llvm-tblgen: for the -d option: requires a value!
abhishek@abhishek-ThinkCentre-M90t:~$ ./llvm-tblgen -d
Segmentation fault (core dumped)
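
To put a number on how often this happens, the repeated runs above can be automated. This is a minimal sketch, not part of the original report: the helper name `count_segfaults` is hypothetical, and it assumes a POSIX system where a crash before ASAN finishes initializing kills the process with SIGSEGV (as the "core dumped" lines above suggest).

```python
import signal
import subprocess


def count_segfaults(cmd, runs=20):
    """Run cmd `runs` times and return how many runs were killed by SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a negative return code -N means the child died from signal N.
        if proc.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For example, `count_segfaults(["./llvm-tblgen", "--help"], runs=50)` would report how many of 50 invocations crashed, which makes it easier to compare the host against another environment.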

So I ran it under gdb (with address-space randomization disabled). The trace points to a problem during the __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) call, but the worst part is that the crash is entirely non-deterministic.

However, my project compiles just fine in CI. I would appreciate any guidance on what could be going on here (besides a suspected RAM/CPU glitch).

System specifics: Ubuntu 22.04.3, LLVM/Clang-15 (same in CI and locally)

When the SEGFAULT is raised:

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x621d1ae4334b <__sanitizer::internal_mmap(void*,+0> add    BYTE PTR [rax], al
   0x621d1ae4334d <__sanitizer::internal_mmap(void*,+0> add    BYTE PTR [rax], al
   0x621d1ae4334f <__sanitizer::internal_mmap(void*,+0> add    BYTE PTR [rax], al
 → 0x621d1ae43351 <__sanitizer::internal_mmap(void*,+0> add    BYTE PTR [rax], al
   0x621d1ae43353 <__sanitizer::internal_mmap(void*,+0> add    BYTE PTR [rax], al
   0x621d1ae43355 <__sanitizer::internal_mmap(void*,+0> add    BYTE PTR [rax], al
   0x621d1ae43357                  add    BYTE PTR [rax], al
   0x621d1ae43359                  add    BYTE PTR [rax], al
   0x621d1ae4335b                  add    BYTE PTR [rax], al
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "llvm-tblgen", stopped 0x621d1ae43351 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) (), reason: SIGSEGV
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x621d1ae43351 → __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long)()
[#1] 0x621d1ae447fa → __sanitizer::MmapNamed(void*, unsigned long, int, int, char const*)()
[#2] 0x621d1ae4b7c8 → __sanitizer::ReservedAddressRange::Init(unsigned long, char const*, unsigned long)()
[#3] 0x621d1adadcba → __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::Init(int, unsigned long)()
[#4] 0x621d1adab2ad → __asan::Allocator::InitLinkerInitialized(__asan::AllocatorOptions const&)()
[#5] 0x621d1ae37caf → __asan::AsanInitInternal()()
[#6] 0x773a398225be → _dl_init(main_map=0x773a398572e0, argc=0x2, argv=0x7ffc6b4b6b98, env=0x7ffc6b4b6bb0)
[#7] 0x773a3983c2ca → _dl_start_user()
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  bt
#0  0x0000621d1ae43351 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) ()
#1  0x0000621d1ae447fa in __sanitizer::MmapNamed(void*, unsigned long, int, int, char const*) ()
#2  0x0000621d1ae4b7c8 in __sanitizer::ReservedAddressRange::Init(unsigned long, char const*, unsigned long) ()
#3  0x0000621d1adadcba in __sanitizer::SizeClassAllocator64<__asan::AP64<__sanitizer::LocalAddressSpaceView> >::Init(int, unsigned long) ()
#4  0x0000621d1adab2ad in __asan::Allocator::InitLinkerInitialized(__asan::AllocatorOptions const&) ()
#5  0x0000621d1ae37caf in __asan::AsanInitInternal() ()
#6  0x0000773a398225be in _dl_init (main_map=0x773a398572e0, argc=0x2, argv=0x7ffc6b4b6b98, env=0x7ffc6b4b6bb0) at ./elf/dl-init.c:102
#7  0x0000773a3983c2ca in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#8  0x0000000000000002 in ?? ()
#9  0x00007ffc6b4b70b2 in ?? ()
#10 0x00007ffc6b4b710a in ?? ()
#11 0x0000000000000000 in ?? ()

vitalybuka commented 3 weeks ago

Clang-15 is old. Sanitizers constantly need updates for new OSes. Can you please try 18 or, even better, a build from HEAD?

abhisen7 commented 2 weeks ago

Hi @vitalybuka, thanks for the feedback, and sorry for the late reply. I tried Clang-18 and the segfaults seemed to go away, but I then started running into compiler warnings/errors that are difficult to take care of at the moment. So I tried Clang-15 inside a multipass environment (running on the same machine), and the fuzz target compiled just fine! The only difference is that my multipass VM runs Ubuntu 22.04.4 (same as CI), while the host runs 22.04.3. I'm not really sure what the exact issue is, but I guess we can move on for now.
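
Since the crash is in an mmap call during sanitizer startup and the host and VM behave differently, one thing that could be compared between the two is the kernel's address-space randomization configuration. The sketch below only reads two standard Linux procfs entries. Nothing in this thread confirms that these settings are the cause, and the helper name `read_aslr_settings` is hypothetical.

```python
from pathlib import Path

# Standard Linux procfs entries that control address-space randomization.
ASLR_KNOBS = {
    "randomize_va_space": "/proc/sys/kernel/randomize_va_space",
    "mmap_rnd_bits": "/proc/sys/vm/mmap_rnd_bits",
}


def read_aslr_settings(knobs=ASLR_KNOBS):
    """Return {name: int or None}; None means the entry could not be read."""
    settings = {}
    for name, path in knobs.items():
        try:
            settings[name] = int(Path(path).read_text().strip())
        except (OSError, ValueError):
            # Entry missing (non-Linux, restricted container) or not an integer.
            settings[name] = None
    return settings


print(read_aslr_settings())
```

Running this on both the host and inside the VM and diffing the output would show whether the two environments differ in these settings at all.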