llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Stack overflow / segfault in `getASTRecordLayout` inspecting variables for rust enum values with matching field names #114068

Open LloydW93 opened 1 week ago

LloydW93 commented 1 week ago

Potentially similar issues: https://github.com/llvm/llvm-project/issues/64628 https://github.com/llvm/llvm-project/issues/43604 https://github.com/llvm/llvm-project/issues/63667 https://github.com/llvm/llvm-project/issues/66335 https://github.com/llvm/llvm-project/issues/53490

Probably related issue/fix: https://github.com/llvm/llvm-project/pull/68300 https://github.com/llvm/llvm-project/issues/68135

When debugging the following Rust code using rust-lldb (a script that loads some extra formatters):

use std::collections::HashMap;

struct ValueType {}

enum OuterEnum {
    ValueType(ValueType),
}

fn main() {
    let mut map = HashMap::<String, OuterEnum>::new();
    map.insert("Host".into(), OuterEnum::ValueType(ValueType{}));
}

Running `frame variable` after `map` has been declared results in a stack overflow, followed by a segfault. The diagnostic.log is empty.

The top/tail of the stack:

* thread #1, name = 'lldb-20', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x7fffff7fee78)
  * frame #0: 0x00007ffff3d5c2f9 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 89
    frame #1: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
    frame #2: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #3: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
    frame #4: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #5: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
    frame #6: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #7: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
    frame #8: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #9: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
...
    frame #10568: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #10569: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
    frame #10570: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #10571: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
    frame #10572: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #10573: 0x00007ffff3d61c75 libclang-cpp.so.20.0`___lldb_unnamed_symbol54200 + 261
    frame #10574: 0x00007ffff3d5caa2 libclang-cpp.so.20.0`clang::ASTContext::getASTRecordLayout(clang::RecordDecl const*) const + 2050
    frame #10575: 0x00007ffff381b511 libclang-cpp.so.20.0`clang::ASTContext::getTypeInfoImpl(clang::Type const*) const + 2017
    frame #10576: 0x00007ffff381ce83 libclang-cpp.so.20.0`clang::ASTContext::getTypeInfo(clang::Type const*) const + 179
    frame #10577: 0x00007ffff794f2d7 liblldb-20.so.1`___lldb_unnamed_symbol28450 + 151
    frame #10578: 0x00007ffff73f69ea liblldb-20.so.1`___lldb_unnamed_symbol14339 + 410
    frame #10579: 0x00007ffff73f6a7d liblldb-20.so.1`___lldb_unnamed_symbol14340 + 13
    frame #10580: 0x00007ffff71785c8 liblldb-20.so.1`lldb::SBType::GetByteSize() + 232
    frame #10581: 0x00007ffff727caad liblldb-20.so.1`___lldb_unnamed_symbol9509 + 93
    frame #10582: 0x00007fffe98d8fc5 libpython3.12.so.1.0`cfunction_vectorcall_O(func=0x00007fffe5219ad0, args=0x00007fffe5995268, nargsf=<unavailable>, kwnames=0x0000000000000000) at methodobject.c:509:24
    frame #10583: 0x00007fffe987fafc libpython3.12.so.1.0`PyObject_Vectorcall [inlined] _PyObject_VectorcallTstate(kwnames=0x0000000000000000, nargsf=9223372036854775809, args=0x00007fffe5995268, callable=0x00007fffe5219ad0, tstate=0x00007fffe9f9cf68) at pycore_call.h:92:11

The loop starts when the synthetic lookup function calls `GetByteSize()` on the type of `map.base.table` (`<String, OuterEnum>`). Inspecting the debug symbols shows what appears to be recursion:

(lldb) image lookup -A -t ValueType
1 match found in /path/to/test-c9fdb3ba00c12481:
id = {0x7fffff0000001912}, name = "ValueType", qualified = "test::OuterEnum::ValueType", byte-size = 0, compiler_type = "struct ValueType {
private:
    test::OuterEnum::ValueType __0;
}"

This is obviously circular and incorrect. I'm not sure whether the fault lies in the symbol file or in its interpretation, but I'm hoping something can be done to handle it better.
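To make the failure mode concrete, here is a minimal sketch of why a record that embeds itself by value recurses forever during size computation, and how tracking the records currently being laid out turns that into an error. This is a toy model, not LLDB's or Clang's actual code: `Record`, `LayoutError`, `byte_size`, and `issue_types` are hypothetical names invented for illustration, and only the type names are taken from the `image lookup` output above.

```rust
use std::collections::{HashMap, HashSet};

/// Toy record type (hypothetical): its own bytes plus the names of the
/// record types it embeds by value.
struct Record {
    own_bytes: u64,
    fields: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum LayoutError {
    UnknownType(String),
    Cycle(String),
}

/// Sums the sizes of a record and everything it embeds by value.
/// `in_progress` holds the records currently being laid out; seeing one of
/// them again means the type contains itself, so we stop instead of
/// recursing until the stack overflows.
fn byte_size(
    types: &HashMap<String, Record>,
    name: &str,
    in_progress: &mut HashSet<String>,
) -> Result<u64, LayoutError> {
    let record = types
        .get(name)
        .ok_or_else(|| LayoutError::UnknownType(name.to_string()))?;
    if !in_progress.insert(name.to_string()) {
        return Err(LayoutError::Cycle(name.to_string()));
    }
    let mut total = record.own_bytes;
    for field in &record.fields {
        total += byte_size(types, field, in_progress)?;
    }
    // Finished with this record, so it may legitimately appear again elsewhere.
    in_progress.remove(name);
    Ok(total)
}

/// Builds the two records from the report. `field_type` is the type of the
/// `__0` member inside `test::OuterEnum::ValueType`.
fn issue_types(field_type: &str) -> HashMap<String, Record> {
    let mut types = HashMap::new();
    types.insert(
        "test::OuterEnum::ValueType".to_string(),
        Record { own_bytes: 0, fields: vec![field_type.to_string()] },
    );
    types.insert(
        "test::ValueType1".to_string(),
        Record { own_bytes: 0, fields: vec![] },
    );
    types
}
```

Calling `byte_size` on the types built by `issue_types("test::OuterEnum::ValueType")` reports a cycle, while `issue_types("test::ValueType1")` yields a size of 0, matching the `byte-size = 0` in the lookup output. Whether a guard like this belongs in the symbol file parsing or in the layout code is left open here.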

If we rename the type of the enum variant's field to `ValueType1` (so it no longer shares a name with the variant), then everything works:

(lldb) image lookup -A -t ValueType
1 match found in /path/to/test-c9fdb3ba00c12481:
id = {0x7fffff0000001912}, name = "ValueType", qualified = "test::OuterEnum::ValueType", byte-size = 0, compiler_type = "struct ValueType {
private:
    test::ValueType1 __0;
}"

I've reproduced this with lldb-18, lldb-19.1, and nightly. The binary was built using Rust 1.82.
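For reference, the rename described above can be sketched as follows. This is my reading of the second `image lookup` output, where the variant is still called `ValueType` but its field has type `test::ValueType1`. The `build_map` helper is a hypothetical wrapper added so the snippet can be checked on its own; the original reproducer builds the map directly in `main`.

```rust
use std::collections::HashMap;

// Renamed from `ValueType` so it no longer shares a name with the variant below.
struct ValueType1 {}

enum OuterEnum {
    // The variant keeps its original name; only the wrapped struct changed.
    ValueType(ValueType1),
}

// Hypothetical helper: same map contents as the original reproducer.
fn build_map() -> HashMap<String, OuterEnum> {
    let mut map = HashMap::<String, OuterEnum>::new();
    map.insert("Host".into(), OuterEnum::ValueType(ValueType1 {}));
    map
}
```

To try it under rust-lldb, call `build_map()` from `main`, stop after it returns, and run `frame variable`.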

llvmbot commented 1 week ago

@llvm/issue-subscribers-clang-frontend

Author: Lloyd Wallis (LloydW93)

Michael137 commented 1 week ago

Ah, this reminds me of the issue I saw with static union members, where we thought the static data member was not actually static and went down the same recursion trying to lay out the member (https://github.com/llvm/llvm-project/pull/68300).

Could you attach the DWARF that gets produced for this (e.g., using dwarfdump on the binary)?

llvmbot commented 1 week ago

@llvm/issue-subscribers-lldb

LloydW93 commented 1 week ago

Sure - attaching it here (the base dir was rewritten with sed, using a same-length replacement string): dwardump.txt