llvm / llvm-project


LLVM-produced debug info crashes LLDB #57910

Open topolarity opened 2 years ago

topolarity commented 2 years ago
; reduced.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @_start() {
Entry:
  unreachable
}

define internal fastcc void @printTargetArch(ptr %0) !dbg !2251 {
Entry:
  %1 = load i6, ptr null, align 1, !dbg !2258
  store i6 %1, ptr %0, align 1
  ret void
}

!llvm.module.flags = !{!0, !1}
!llvm.dbg.cu = !{!2}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 2, !"Dwarf Version", i32 4}
!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "zig 0.10.0", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !1306)
!3 = !DIFile(filename: "testme", directory: ".")
!4 = !{!96}
!21 = !DIBasicType(name: "u2", size: 2, encoding: DW_ATE_unsigned)
!27 = !DIFile(filename: "target.zig", directory: "/home/topolarity/repos/zig/build/stage3/lib/zig/std")

!96 = !DICompositeType(tag: DW_TAG_enumeration_type, name: "@typeInfo(std.target.Target.Os.VersionRange).Union.tag_type.?", scope: !27, file: !27, line: 225, baseType: !21, size: 8, align: 8, elements: !97)
!97 = !{!98, !99, !100, !101}
!98 = !DIEnumerator(name: "none", value: 0)
!99 = !DIEnumerator(name: "semver", value: 1)
!100 = !DIEnumerator(name: "linux", value: 2)
!101 = !DIEnumerator(name: "windows", value: 3)

!1306 = !{}

!1320 = !DICompositeType(tag: DW_TAG_structure_type, name: "std.target.Target.Os.VersionRange", scope: !27, file: !27, line: 225, size: 320, align: 32, elements: !1321)
!1321 = !{!1322}
!1322 = !DIDerivedType(tag: DW_TAG_member, name: "tag", scope: !1320, file: !27, line: 225, baseType: !96, size: 2, align: 8, offset: 288)

!2251 = distinct !DISubprogram(name: "printTargetArch", scope: !2252, file: !2252, line: 4, type: !2253, scopeLine: 4, flags: DIFlagStaticMember, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !2, retainedNodes: !2256)
!2252 = !DIFile(filename: "testme.zig", directory: "/home/topolarity/repos/zig")
!2253 = !DISubroutineType(types: !2254)
!2254 = !{null, !2255}
!2255 = !DIDerivedType(tag: DW_TAG_pointer_type, name: "*std.target.Target", baseType: !1320, size: 64, align: 64)
!2256 = !{!2257}
!2257 = !DILocalVariable(name: "t", arg: 1, scope: !2251, file: !2252, line: 4, type: !1320)
!2258 = !DILocation(line: 5, column: 39, scope: !2259)
!2259 = distinct !DILexicalBlock(scope: !2251, file: !2252, line: 4, column: 40)

To reproduce:

$ llc-15 ./reduced.ll -o ./test.o -filetype=obj
$ ld.lld-15 -o test ./test.o
$ lldb test -o "br set -f testme.zig -l 5" -o "r"

Result:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'lldb.embedded_interpreter'
(lldb) target create "testme"
Current executable set to '/home/topolarity/repos/zig/testme' (x86_64).
(lldb) br set -f testme.zig -l 5
Breakpoint 1: where = testme`printTargetArch at testme.zig:5:39, address = 0x0000000000201170
(lldb) r
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.  Program arguments: lldb testme -o "br set -f testme.zig -l 5" -o r -o exit
1.  HandleCommand(command = "r")
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamEi+0x31)[0x7fa28f0a5a71]
/lib/x86_64-linux-gnu/libLLVM-14.so.1(_ZN4llvm3sys17RunSignalHandlersEv+0xee)[0x7fa28f0a37be]
/lib/x86_64-linux-gnu/libLLVM-14.so.1(+0xea5fa6)[0x7fa28f0a5fa6]
/lib/x86_64-linux-gnu/libc.so.6(+0x3daf0)[0x7fa28da3daf0]
/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZNK5clang10ASTContext19getTypeAlignInCharsENS_8QualTypeE+0xb)[0x7fa29571755b]
/lib/x86_64-linux-gnu/libclang-cpp.so.14(+0xe62808)[0x7fa295a62808]
/lib/x86_64-linux-gnu/libclang-cpp.so.14(+0xe5eea6)[0x7fa295a5eea6]
/lib/x86_64-linux-gnu/libclang-cpp.so.14(_ZNK5clang10ASTContext18getASTRecordLayoutEPKNS_10RecordDeclE+0x346d)[0x7fa295a5a34d]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x959210)[0x7fa298f59210]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x51ba2b)[0x7fa298b1ba2b]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x4598d1)[0x7fa298a598d1]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x458e4b)[0x7fa298a58e4b]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x47a7cf)[0x7fa298a7a7cf]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x425567)[0x7fa298a25567]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x423fb3)[0x7fa298a23fb3]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x423fb3)[0x7fa298a23fb3]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x423d30)[0x7fa298a23d30]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x5a6e75)[0x7fa298ba6e75]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x5a7316)[0x7fa298ba7316]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x5ac72a)[0x7fa298bac72a]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x5ebea3)[0x7fa298bebea3]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x57d4f3)[0x7fa298b7d4f3]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x57c611)[0x7fa298b7c611]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x57b0c2)[0x7fa298b7b0c2]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x57f592)[0x7fa298b7f592]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x5d272d)[0x7fa298bd272d]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x9e5784)[0x7fa298fe5784]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x4edb67)[0x7fa298aedb67]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x4e2524)[0x7fa298ae2524]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x4e651a)[0x7fa298ae651a]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x42c4ef)[0x7fa298a2c4ef]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x40ccf4)[0x7fa298a0ccf4]
/lib/x86_64-linux-gnu/liblldb-14.so.1(+0x4e7fad)[0x7fa298ae7fad]
/lib/x86_64-linux-gnu/liblldb-14.so.1(_ZN4lldb10SBDebugger21RunCommandInterpreterERKNS_30SBCommandInterpreterRunOptionsE+0x85)[0x7fa298814165]
lldb[0x408107]
lldb[0x4094b5]
/lib/x86_64-linux-gnu/libc.so.6(+0x2920a)[0x7fa28da2920a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x7c)[0x7fa28da292bc]
lldb[0x4045ca]

Also reproduces with "llc-14 -opaque-pointers"

Downstream issue: https://github.com/ziglang/zig/issues/12929

llvmbot commented 2 years ago

@llvm/issue-subscribers-debuginfo

llvmbot commented 2 years ago

@llvm/issue-subscribers-lldb

DavidSpickett commented 2 years ago

I see on the linked zig issue something about it not reproducing with zig. Can you tell us what version of LLVM that was based on?

What version of lldb are you using, and did you keep the same version when testing the different versions of zig?

(I'm thinking about bisecting one or both hence the question)

topolarity commented 2 years ago

I'm afraid the comment on the Zig side is about reproduction on different platforms, not different LLVM versions. The issue reproduces on x86-64, but not AArch64.

I'm able to reproduce with both LLDB 14.0.6 and LLDB 15.0.1. A non-opaque-pointer version also reproduces with LLVM 14 for me:

define internal fastcc void @printTargetArch(i6* %0) !dbg !2251 {
Entry:
  %1 = load i6, i6* null, align 1, !dbg !2258
  store i6 %1, i6* %0, align 1
  ret void
}

Unfortunately, I don't have LLVM <= 13 on my machine to see if this ever worked.

DavidSpickett commented 2 years ago

Ok, well at least I can concentrate on llvm only. Thanks for the isolated reproducer.

I'll see if I can reproduce then check if it worked prior.

DavidSpickett commented 2 years ago

This reproduces on AArch64 if I remove the data layout (not sure if that matters) and change the triple line. It happens all the way back to LLVM 12, so I think it's always been this way.

The assert is:

lldb: /home/david.spickett/llvm-project/clang/lib/AST/RecordLayoutBuilder.cpp:1494: void (anonymous namespace)::ItaniumRecordLayoutBuilder::LayoutWideBitField(uint64_t, uint64_t, bool, const clang::FieldDecl *): Assertion `!Type.isNull() && "Did not find a type!"' failed.

This hack gets us past the problem:

diff --git a/clang/lib/AST/RecordLayoutBuilder.cpp b/clang/lib/AST/RecordLayoutBuilder.cpp
index 6f3ede2ce42a..9a6aa43a56c9 100644
--- a/clang/lib/AST/RecordLayoutBuilder.cpp
+++ b/clang/lib/AST/RecordLayoutBuilder.cpp
@@ -1482,12 +1482,10 @@ void ItaniumRecordLayoutBuilder::LayoutWideBitField(uint64_t FieldSize,

   QualType Type;
   for (const QualType &QT : IntegralPODTypes) {
+    Type = QT;
     uint64_t Size = Context.getTypeSize(QT);
-
     if (Size > FieldSize)
       break;
-
-    Type = QT;
   }
   assert(!Type.isNull() && "Did not find a type!");

What is happening is that we have a 2-bit-wide field; we find char at 8 bits, break out of the for loop, and never set the type. Whether this hack is correct I need to check against the standards.
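To make that control flow concrete, here is a self-contained sketch of the loop (stand-in bit sizes instead of the real QualType machinery, so this is an illustration rather than the actual clang code):

#include <cassert>
#include <cstdint>
#include <vector>

int main() {
  // Stand-in bit sizes for IntegralPODTypes (char, short, int, long).
  const std::vector<std::uint64_t> IntegralPODTypeSizes = {8, 16, 32, 64};
  const std::uint64_t FieldSize = 2; // the 2-bit enum from the reproducer

  std::uint64_t Type = 0; // 0 stands in for a null QualType
  for (std::uint64_t Size : IntegralPODTypeSizes) {
    if (Size > FieldSize)
      break; // char is 8 bits > 2 bits, so we break before Type is ever set
    Type = Size;
  }
  assert(Type != 0 && "Did not find a type!"); // fires, mirroring the crash
  return 0;
}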

Just from skimming the comments, this is something to do with a "wide bitfield", which is one that is larger than its underlying type. There is also this:

  assert(Context.getLangOpts().CPlusPlus &&
         "Can only have wide bit-fields in C++!");

Which makes me wonder whether zig should have got this far at all. But we are calling this from lldb, so that assertion probably just reflects what you'd expect within clang itself. It might be irrelevant if the code in LayoutWideBitField turns out not to be correct anyway.

I will find the set of rules for these wide bitfields.

DavidSpickett commented 2 years ago

The rule clang is applying is from https://itanium-cxx-abi.github.io/cxx-abi/abi.html#layout "Allocation of Members Other Than Virtual Bases".

In particular: "If sizeof(T)*8 < n, let T' be the largest integral POD type with sizeof(T')*8 <= n."

ItaniumRecordLayoutBuilder::LayoutBitField gets StorageUnitSize from the type's TypeInfo. To get that we end up down in ASTContext::getTypeInfoImpl. This sees that the type of the field is Type::Enum. Then it gets the integer type of that enum, which returns void (meaning the underlying enum type is void). Void has a storage size of 0 bytes; presumably void is the fallback because we don't have any 2-bit type to use.

That means sizeof(T) is 0, so sizeof(T)*8 is 0, and 0 < 2 is true. We go looking for a different type to represent it, but since 2 is less than the 8 bits of char, we break before setting any type.

I suspect we've never hit this before because this path has only been used on C++ code, or on code generated with types that are >= 8 bits. For example, this change to the IR makes it work:

- !21 = !DIBasicType(name: "u2", size: 2, encoding: DW_ATE_unsigned)
+ !21 = !DIBasicType(name: "u2", size: 8, encoding: DW_ATE_unsigned)

In that case the underlying type for the enum becomes char.

For the C++ case it's an error to assign a field bitsize greater than the underlying type's bitsize. So I'm not sure how one would hit this rule, but I didn't try particularly hard to find a way.

If you choose something like size 10 you also get char, which follows the rule but seems a bit odd: 16 is not <= 10, so it goes with the 8-bit type. But that's probably beside the point here.

So one might try to argue that zig shouldn't set a size < 8, but that means any other arbitrarily sized type is suspect also, which doesn't seem right.

Seems like clang should handle the case where n < 8. Except that you can't make sizeof(T')*8 <= n when n < 8. So I don't see how we can stick to the rule there.

We could just have an escape hatch: if the underlying type is void, assume char? Or maybe lldb could synthesize these types before we do all this work, if we know we're debugging zig.

Overall it's tricky because this path is intended for code coming from C++ where you can't trigger this afaict.

DavidSpickett commented 2 years ago

Going by https://ziglang.org/documentation/0.4.0/#Primitive-Types, zig allows arbitrarily sized integers:

In addition to the integer types above, arbitrary bit-width integers can be referenced by using an identifier of i or u followed by digits. For example, the identifier i7 refers to a signed 7-bit integer. The maximum allowed bit-width of an integer type is 65535.

Well, or it's packing a 4-choice enum into 2 bits automatically. Either way, these are a valid concept in the language, unlike C++, where sizes are multiples of bytes.

So telling zig not to emit some of those seems like a bad path.

dwblaikie commented 2 years ago

It may be a valid concept in zig, but why is lldb trying to parse that DWARF using clang at all, then? I'd have thought/my understanding was that lldb only uses the clang parsing for C/C++/ObjC/ObjC++, no? What DW_AT_language is zig using?

topolarity commented 2 years ago

Thanks for investigating @DavidSpickett! Some thoughts:

It looks like T' is only used for the alignment of wide bitfield types (which afaik are legal C++, e.g. uint8_t field: 256 has 8 bits for its value and 248 bits for padding). It might be coherent to declare that a theoretical void field: 16 has alignment 1 (effectively no alignment), no bits for a value, and 16 bits for padding.

On the other hand, I don't think void field: 2 is the right way to lower this Zig type, since the 2 bits would end up as padding bits. Instead, we need the equivalent of _BitInt(2) field: 2 (new in C23).

AFAIK the ABI for _BitInt is not totally determined yet, so this may still be murky. Experimenting with this in clang shows that _BitInt(2) reports a size corresponding to its byte-aligned sizeof, not its bit-precise size. Maybe that's what Zig should do too?
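A quick way to see this (a minimal sketch using clang, where _BitInt is also usable from C++ as an extension; the printed value reflects the byte-aligned-sizeof behavior I observed above):

#include <cstdio>

int main() {
  _BitInt(2) x = 1; // clang accepts _BitInt in C++ as an extension
  // Reports the byte-aligned sizeof (expected: 1), not a bit-precise size:
  std::printf("sizeof(_BitInt(2)) = %zu\n", sizeof x);
  return 0;
}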

It may be a valid concept in zig, but why is lldb trying to parse that DWARF using clang at all, then? I'd have thought/my understanding was that lldb only uses the clang parsing for C/C++/ObjC/ObjC++, no? What DW_AT_language is zig using?

Zig uses DW_LANG_C99. DW_LANG_ZIG is coming in a future DWARF revision, iirc.

jryans commented 2 years ago

Zig uses DW_LANG_C99. DW_LANG_ZIG is coming in a future DWARF revision, iirc.

It's true it isn't in a published version of the DWARF standard yet, but the language code has at least been allocated, so presumably it would be safe enough to start using.

dwblaikie commented 2 years ago

It may be a valid concept in zig, but why is lldb trying to parse that DWARF using clang at all, then? I'd have thought/my understanding was that lldb only uses the clang parsing for C/C++/ObjC/ObjC++, no? What DW_AT_language is zig using?

Zig uses DW_LANG_C99. DW_LANG_ZIG is coming in a future DWARF revision, iirc.

That seems like the bug to me. If that weren't being done, lldb wouldn't be trying to parse this using a C parser with those expectations (admittedly, lldb still shouldn't crash - but that's difficult to achieve while providing the high fidelity C++ parsing - the C++ parser is robust to C++ source code, but not to ASTs generated from even valid C++ DWARF, let alone to having those rules imposed on other languages).

I'd suggest using an extension language code, or the allocated code despite it not being published yet (that's the reason for the allocation system - so you don't have to wait for a published standard to use the code).

topolarity commented 2 years ago

Unfortunately that allocation is for the new DWARFv6 DW_AT_language_name (https://dwarfstd.org/ShowIssue.php?issue=210419.1). If we use the allocated Zig code (0x1c) for DW_AT_language in DWARFv5, it ends up referring to Rust.

FWIW, the crash in lldb reproduces even when changing the DW_LANG to a non-C language, like Rust.

dwblaikie commented 2 years ago

FWIW, the crash in lldb reproduces even when changing the DW_LANG to a non-C language, like Rust.

That surprises me - my understanding was that lldb only tried to use the Clang AST builder if the language is C++ or similar. Might be worth looking into what's happening there/how that happens?

Ah, here: https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp#L107

But, yeah, I think all of these "use Clang for XXX until there is a proper language plugin for it" fallbacks are likely to produce a lot of problems, because the Clang AST building code is unlikely to ever be robust enough not to crash, let alone to acceptably parse/understand these non-C-family languages.
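To spell out why even a Rust language code still lands in Clang's type system, the check at that line amounts to something like this (a paraphrased, self-contained sketch, not the verbatim lldb source):

// Paraphrase of the idea behind TypeSystemClang.cpp:107: Clang's type system
// claims C-family languages and also some languages with no dedicated plugin
// yet, such as Rust.
enum LanguageType {
  eLanguageTypeUnknown,
  eLanguageTypeC99,
  eLanguageTypeC_plus_plus,
  eLanguageTypeRust,
};

static bool IsCFamily(LanguageType lang) {
  return lang == eLanguageTypeC99 || lang == eLanguageTypeC_plus_plus;
}

static bool TypeSystemClangSupportsLanguage(LanguageType lang) {
  // Unknown also defaults to Clang, so mislabeled DWARF ends up here too.
  return lang == eLanguageTypeUnknown || IsCFamily(lang) ||
         lang == eLanguageTypeRust;
}

int main() {
  // A Rust (or Zig-masquerading-as-C99) compile unit still gets Clang ASTs:
  return TypeSystemClangSupportsLanguage(eLanguageTypeRust) ? 0 : 1;
}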

DavidSpickett commented 2 years ago

which afaik are legal C++, e.g. uint8_t field: 256 has 8 bits for its value and 248 bits for padding

You're right; somehow I got an error when I tried this myself, but checking again it is a warning.
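For the record, a minimal C++ version of such a wide bit-field (clang accepts it with a warning, and the bits beyond the underlying type become padding):

#include <cstdint>

struct S {
  std::uint8_t field : 16; // wider than the 8-bit underlying type: clang
                           // accepts this in C++ with a warning, and the
                           // extra 8 bits are padding
};

int main() {
  static_assert(sizeof(S) >= 2, "storage covers the full declared width");
  return 0;
}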

Thanks all for explaining the DWARF side of things. I understand from that that Zig will never have a specific DW_AT_language value, which leaves us waiting for DWARFv6 for an official way to identify it.

We could make it manually selected, or find something else in the binary. How about the producer tag?

0x0000000b: DW_TAG_compile_unit
              DW_AT_producer    ("zig 0.10.0")
              DW_AT_language    (DW_LANG_C99)
              DW_AT_name        ("testme")

It's probably stretching the intent of the field, but given there's only one zig compiler, that limits what could go wrong.
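A rough sketch of what that producer check could look like (a hypothetical helper, not existing LLDB code; DW_LANG_C99 is 0x000c):

#include <string>

// Hypothetical helper: guess that a compile unit is Zig when DW_AT_language
// says DW_LANG_C99 but DW_AT_producer starts with "zig ".
bool ProducerLooksLikeZig(const std::string &producer, unsigned dw_lang) {
  const unsigned DW_LANG_C99 = 0x000c;
  return dw_lang == DW_LANG_C99 && producer.rfind("zig ", 0) == 0;
}

int main() {
  return ProducerLooksLikeZig("zig 0.10.0", 0x000c) ? 0 : 1;
}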

Or LLDB/clang could include the change I showed, on the grounds that C++ code will never have an underlying type that is < 8 bits, so the impact is small, on the promise that it gets removed once proper support is possible.

Either way we end up teaching lldb about zig sooner or later.

dwblaikie commented 2 years ago

Zig can have a DW_AT_language value - there's a user extension range in the DWARF spec so you can add things even if they aren't sanctioned by the DWARF Committee/published specs. (though you could also ask the committee to give you an old DW_AT_language value to use officially before/until the new language encoding attributes get standardized).

I'm not an lldb decider, but I think it's a lost cause/likely a source of continued bugs to try to parse non-C-family DWARF with Clang - it's already a very brittle system, and this would add a whole lot more instability.

topolarity commented 2 years ago

Thanks all for explaining the DWARF side of things. I understand from that that Zig will never have a specific DW_AT_language value, which leaves us waiting for DWARFv6 for an official way to identify it.

Did some digging, and actually Zig does have a provisional DW_AT_language allocation, which was made before the accepted DW_AT_language_name: https://dwarfstd.org/ShowIssue.php?issue=190407.1 . Presumably we can get the ball rolling for language-specific support using 0x0026.

That said:

Or LLDB/clang could include the change I showed, on the grounds that C++ code will never have an underlying type that is < 8 bits, so the impact is small, on the promise that it gets removed once proper support is possible.

I think that however LLDB/clang are planning to handle _BitInt is exactly how Zig will want to handle this. At the end of the day, Zig's arbitrary-width integers are LLVM's arbitrary-width integers, and C23's _BitInt means that these will make it into C/C++ too (and are already usable as a language extension in clang).

The main thing to clarify IMO is whether it's correct to have LLDB handle bit-precise integer widths, or whether Zig should always report the padded sizeof of the integer in the debug info (in which case the bug is on the Zig side). Or should it be the bit-precise size rounded up to the nearest byte?
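For concreteness, that last option is just the following (a trivial sketch; StorageBytes is a hypothetical helper name):

#include <cstdint>

// The "bit-precise size rounded up to the nearest byte" option, as a formula.
constexpr std::uint64_t StorageBytes(std::uint64_t BitSize) {
  return (BitSize + 7) / 8;
}

static_assert(StorageBytes(2) == 1, "u2 -> 1 byte");
static_assert(StorageBytes(10) == 2, "u10 -> 2 bytes");

int main() { return 0; }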

dwblaikie commented 2 years ago

Looks like Clang's had support for _BitInt for a while now (https://reviews.llvm.org/D108643), so it's probably reasonable to ask what DWARF and AST representation it uses for that, and emulate it.

(though, again, in general, I think it's ill advised to try to wedge other languages into lldb's already overburdened DWARF->Clang AST interface)