llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.06k stars 11.59k forks source link

std::multimap::erase has strange/undefined behavior when passed an iterator from a different map #59525

Open DCtheTall opened 1 year ago

DCtheTall commented 1 year ago

Calling std::multimap::erase using an iterator from a different multimap, in some cases, does not cause a crash and can cause some strange/undefined behavior (e.g. it can cause the size method to return an incorrect value).

I have a toy Chromium CL reproducing the behavior.

DimitryAndric commented 1 year ago

Weird, indeed:

$ cat pr59525.cpp
#include <cassert>
#include <map>
#include <string>

int main()
{
  std::multimap<std::string, int> M1;
  std::multimap<std::string, int> M2;
  // For some reason it only doesn't crash if M1 has 2 elements,
  // if M1 has only one it does crash.
  M1.insert({"example.com", 1});
#if 0
  M1.insert({"abc.com", 0});
#endif
  M2.insert({"example.com", 0});
  auto iter = M1.find("example.com");
  auto end = M1.end();

  assert(iter != end);
  assert(iter->second == 1);
  M2.erase(iter);
  assert(M2.find("example.com") != M2.end());

  return 0;
}

$ clang++ -fsanitize=undefined pr59525.cpp -o pr59525

$ ./pr59525
/usr/include/c++/v1/__tree:87:24: runtime error: member access within misaligned address 0x000000000001 for type 'std::__tree_node_base<void *>', which requires 8 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/c++/v1/__tree:87:24 in
/usr/include/c++/v1/__tree:87:24: runtime error: load of misaligned address 0x000000000011 for type 'std::__tree_node_base<void *>::__parent_pointer' (aka 'std::__tree_end_node<std::__tree_node_base<void *> *> *'), which requires 8 byte alignment
0x000000000011: note: pointer points here
<memory cannot be printed>
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/c++/v1/__tree:87:24 in
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==31600==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000011 (pc 0x00000024e522 bp 0x000820811410 sp 0x0008208113e0 T100228)
==31600==The signal is caused by a READ memory access.
==31600==Hint: address points to the zero page.
    #0 0x24e522 in bool std::__1::__tree_is_left_child[abi:v15006]<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) pr59525.cpp
    #1 0x25239c in void std::__1::__tree_remove<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*, std::__1::__tree_node_base<void*>*) (/share/dim/src/llvm/bugs/pr59525/pr59525+0x25239c)
    #2 0x250e37 in std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>>>::__remove_node_pointer(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, void*>*) pr59525.cpp
    #3 0x250a20 in std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>>>::erase(std::__1::__tree_const_iterator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, void*>*, long>) (/share/dim/src/llvm/bugs/pr59525/pr59525+0x250a20)
    #4 0x2483db in std::__1::multimap<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>>, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const, int>>>::erase[abi:v15006](std::__1::__map_iterator<std::__1::__tree_iterator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, int>, void*>*, long>>) pr59525.cpp
    #5 0x247bba in main (/share/dim/src/llvm/bugs/pr59525/pr59525+0x247bba)

UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV pr59525.cpp in bool std::__1::__tree_is_left_child[abi:v15006]<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*)
==31600==ABORTING
danakj commented 1 year ago

We (Chromium) are interested in a libc++ hardening mode check for this to avoid shipping bugs.

DimitryAndric commented 1 year ago

Yeah it is definitely undefined to use M1's iterators on M2, but there seems to be no way to catch that easily. Not with UndefinedBehaviorSanitizer or AddressSanitizer, at least. Valgrind also doesn't help much:

==31640== Memcheck, a memory error detector
==31640== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==31640== Using Valgrind-3.21.0.GIT and LibVEX; rerun with -h for copyright info
==31640== Command: ./pr59525
==31640==
==31640== Invalid read of size 8
==31640==    at 0x205560: bool std::__1::__tree_is_left_child[abi:v15006]<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) (__tree:87)
==31640==    by 0x206038: void std::__1::__tree_remove<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*, std::__1::__tree_node_base<void*>*) (__tree:435)
==31640==    by 0x205D8C: std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int> > >::__remove_node_pointer(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*) (__tree:2250)
==31640==    by 0x205C61: std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int> > >::erase(std::__1::__tree_const_iterator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*, long>) (__tree:2419)
==31640==    by 0x203DE9: std::__1::multimap<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, int> > >::erase[abi:v15006](std::__1::__map_iterator<std::__1::__tree_iterator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*, long> >) (map:2045)
==31640==    by 0x203A6A: main (pr59525.cpp:21)
==31640==  Address 0x11 is not stack'd, malloc'd or (recently) free'd
==31640==
==31640==
==31640== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==31640==  Access not within mapped region at address 0x11
==31640==    at 0x205560: bool std::__1::__tree_is_left_child[abi:v15006]<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*) (__tree:87)
==31640==    by 0x206038: void std::__1::__tree_remove<std::__1::__tree_node_base<void*>*>(std::__1::__tree_node_base<void*>*, std::__1::__tree_node_base<void*>*) (__tree:435)
==31640==    by 0x205D8C: std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int> > >::__remove_node_pointer(std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*) (__tree:2250)
==31640==    by 0x205C61: std::__1::__tree<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__map_value_compare<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int> > >::erase(std::__1::__tree_const_iterator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*, long>) (__tree:2419)
==31640==    by 0x203DE9: std::__1::multimap<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int, std::__1::less<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, int> > >::erase[abi:v15006](std::__1::__map_iterator<std::__1::__tree_iterator<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, int>, void*>*, long> >) (map:2045)
==31640==    by 0x203A6A: main (pr59525.cpp:21)
==31640==  If you believe this happened as a result of a stack
==31640==  overflow in your program's main thread (unlikely but
==31640==  possible), you can try to increase the size of the
==31640==  main thread stack using the --main-stacksize= flag.
==31640==  The main thread stack size used in this run was 16777216.
==31640==
==31640== HEAP SUMMARY:
==31640==     in use at exit: 128 bytes in 2 blocks
==31640==   total heap usage: 2 allocs, 0 frees, 128 bytes allocated
==31640==
==31640== LEAK SUMMARY:
==31640==    definitely lost: 0 bytes in 0 blocks
==31640==    indirectly lost: 0 bytes in 0 blocks
==31640==      possibly lost: 0 bytes in 0 blocks
==31640==    still reachable: 128 bytes in 2 blocks
==31640==         suppressed: 0 bytes in 0 blocks
==31640== Rerun with --leak-check=full to see details of leaked memory
==31640==
==31640== For lists of detected and suppressed errors, rerun with: -s
==31640== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault
philnik777 commented 1 year ago

I'm not sure what this bug report is trying to report? I'm quite certain it's UB to pass an iterator from a different map. Re hardening, there is ongoing work to refactor the debug mode. @ldionne should have some more details, but that shouldn't be discussed in a bug report. I think part of that effort is to add more containers to the list of debug-enabled containers, but I'm not sure.

DCtheTall commented 1 year ago

Re hardening, there is ongoing work to refactor the debug mode.

Something that might be helpful is some kind of check enabled in debug mode where we add some sort of check that the iterator belongs to the multimap whose erase method is being called.

std::unordered_map (Chromium) has something like this.

I am not sure how we would replicate this check for std::multimap though. Another option for debug mode could be to include some uint32 field in the multimap/iterator implementation and check for equality when methods like erase are called.