jeremy-rifkin / cpptrace

Simple, portable, and self-contained stacktrace library for C++11 and newer
MIT License
701 stars 75 forks source link

Disable stacktraces if ('full') symbols aren't available (or make this query-able) #84

Closed VelocityRa closed 8 months ago

VelocityRa commented 9 months ago

I have inherited cpptrace::lazy_exception to make my own type with some niceties, and I throw this exception type in my codebase.

cpptrace adds a stacktrace in the .what() string as expected, but if symbols (in my case a .pdb next to the .exe) are not available, it results in bogus stacktraces like so: image

These appear to be func names from libtpms, a library that I'm using that presumably embeds dwarf symbols. The exception does not originate there however, presumably it picks them up because there are no other symbols available (so it picks the closest before the exception instruction pointers).

Is there a solution? Perhaps simply a way to detect whether 'full' symbols are available, so I can make my class not fetch the stacktrace if so? Or perhaps simply whether the dbghelp backend is available / was used for a stacktrace?

Thanks for the awesome library!

jeremy-rifkin commented 9 months ago

Hi @VelocityRa, I see how that trace can be misleading and that's definitely something I'd like to address.

Defining what it means for a trace to be "full" or not is slightly unclear because it's not uncommon for some information to be missing (e.g. libraries might not be built with debug symbols) but the rest of the information should be accurate and hopefully useful.

Additionally, it's often not known whether there will be useful information until after attempting to resolve the trace.

I think ideally in this situation even when symbols aren't present for one object the trace should have blank or partially blank frames with as much information as was possible to extract, for example

Stack trace (most recent call first):
#0 0x00000001034fd788 at /Users/rifkin/foo/cpptrace/build/test
#1 0x00000001034fd758 at /Users/rifkin/foo/cpptrace/build/test
#2 0x00000001034fd728 at /Users/rifkin/foo/cpptrace/build/test
#3 0x00000001034fd908 in buzz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:4:2
#4 0x00000001034fd978 in biz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:8:2
#5 0x00000001034fd988 in bar() at /Users/rifkin/foo/cpptrace/build/bar.cpp:12:2
#6 0x00000001034fd8d8 at /Users/rifkin/foo/cpptrace/build/test
#7 0x00000001034fd8e8 at /Users/rifkin/foo/cpptrace/build/test
```or even
```c
Stack trace (most recent call first):
#0 0x00000001034fd788
#1 0x00000001034fd758
#2 0x00000001034fd728
#3 0x00000001034fd908 in buzz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:4:2
#4 0x00000001034fd978 in biz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:8:2
#5 0x00000001034fd988 in bar() at /Users/rifkin/foo/cpptrace/build/bar.cpp:12:2
#6 0x00000001034fd8d8
#7 0x00000001034fd8e8

Personally I'd lean more towards providing as much information as possible as opposed to conditionally disabling stacktrace generation, as long as what is there is accurate. Does that work for your use case?

I don't think frames should blindly resolve to the closest symbol if their actual entry isn't there but I can double check the logic. Ideally it is returning an error under the hood if a symbol isn't found.

It would be helpful to know a little more about how you are compiling and what's going on here. Are you using msvc? Are optimizations on?

I'm surprised to not see a main entry, usually the calls should look something like RtlUserThreadStart -> BaseThreadInitThunk -> mainCRTStartup -> __scrt_common_main -> __scrt_common_main_seh -> invoke_main -> main or RtlUserThreadStart -> BaseThreadInitThunk -> mainCRTStartup -> __tmainCRTStartup -> main.

I'm definitely surprised to see a SM2KeyExchange entry from BaseThreadInitThunk. At the moment either some bug with omitting frames when an error occurs or your theory about the resolver picking the closest symbol seems most plausible.

jeremy-rifkin commented 8 months ago

I'm going to close this for now as I think I need additional information to proceed. If there's any additional info or if you have any more library feedback please let me know!

VelocityRa commented 8 months ago

@jeremy-rifkin

Right, forgot about this sorry!

I'm compiling on MSVC on Release mode (so optimizations on). Using CMake with vcpkg, cpptrace version 0.3.1.

Specific flags I see in my build.ninja, if that helps:

  DEFINES = -DCPPTRACE_DEMANGLE_WITH_WINAPI -DCPPTRACE_GET_SYMBOLS_WITH_DBGHELP -DCPPTRACE_UNWIND_WITH_DBGHELP -D_UNICODE
  FLAGS = /DWIN32 /D_WINDOWS /EHsc /MP /O2 /Ob2 /DNDEBUG -std:c++20 -MD /WX /wd4251 /wd4275

That's the config generating traces like the one above, or this one I just got from another test:

Stack trace (most recent call first):
#0 0x00007ff696fecebf in SupportLibInit
#1 0x00007ff696f849a4 in TPMS_ENC_SCHEME_RSAES_Unmarshal
#2 0x00007ff6971326c6 in SupportLibInit
#3 0x00007ff697110787 in SupportLibInit
#4 0x00007ff69711050e in SupportLibInit
#5 0x00007ff697043b4e in SupportLibInit
#6 0x00007ffbf12a1bb1 in configthreadlocale
#7 0x00007ffbf1637343 in BaseThreadInitThunk
#8 0x00007ffbf34426b0 in RtlUserThreadStart

I'm surprised to not see a main entry [...]

I wasn't throwing from the main thread so that's to be expected afaik. After testing this on Debug I see RtlUserThreadStart -> BaseThreadInitThunk -> register_onexit_function -> std::thread::_Invoke<.... But yeah even if I do throw from the main thread on Release, I do get the issue.

I can only see main / __scrt_common_main_seh if throwing from main thread, and with symbols enabled (on Debug or RelWithDebInfo).

I get this when throwing from the main thread without symbols (on Release config):

Stack trace (most recent call first):
#0 0x00007ff6e4761aa2 in SupportLibInit
#1 0x00007ff6e488f0a3 in SM2KeyExchange
#2 0x00007ffbf1637343 in BaseThreadInitThunk
#3 0x00007ffbf34426b0 in RtlUserThreadStart

All examples have been from the cpptrace::lazy_exception. Here's cpptrace::generate_trace(1).print() (at the point of the catch) too:

Stack trace (most recent call first):
#0 0x00007ff6e47616ef in SupportLibInit
#1 0x00007ff6e48aa1e9 in SM2KeyExchange
#2 0x00007ffbdced107f in  ??
#3 0x00007ffbdced26a4 in _NLG_Return2
#4 0x00007ffbf3491715 in RtlCaptureContext2
#5 0x00007ff6e4761ab6 in SupportLibInit
#6 0x00007ff6e488f0a3 in SM2KeyExchange
#7 0x00007ffbf1637343 in BaseThreadInitThunk
#8 0x00007ffbf34426b0 in RtlUserThreadStart

I think ideally in this situation even when symbols aren't present for one object the trace should have blank or partially blank frames with as much information as was possible to extract Personally I'd lean more towards providing as much information as possible as opposed to conditionally disabling stacktrace generation, as long as what is there is accurate. Does that work for your use case?

I would be fine with that - assuming all printed symbol names are correct.

I would still want to be able query for whether full symbols are available (.pdb for the whole app) - I can elaborate on my use case if you want - but I understand if that's not feasible from your side. I could add that logic myself on my app at a higher level.

The above issue is still a library bug though, as far as I understand.

As said on my main post, my project is using libtpms (https://github.com/stefanberger/libtpms, via a simple custom vcpkg port). The printed symbols are from there. Maybe using this is required to repro the issue.

I'm willing to help with debugging this further. I can make changes to cpptrace itself if it'd help narrow down the issue.

jeremy-rifkin commented 8 months ago

Thank you, I’ll take a look tonight

jeremy-rifkin commented 8 months ago

Ok, I've tested locally trying to reproduce anything but I've been unsuccessful so far. I tested cpptrace::generate_trace().print(), throw cpptrace::runtime_error, and std::stacktrace for comparison.

Release: ``` Stack trace (most recent call first): #0 0x00007ff65c4b2022 in ?? #1 0x00007ff65c4b165a in ?? #2 0x00007ffe3c279362 in recalloc #3 0x00007ffe3e57257c in BaseThreadInitThunk #4 0x00007ffe3e94aa57 in RtlUserThreadStart 0> test+0x2057 1> test+0x165B 2> ucrtbase!recalloc+0xA3 3> KERNEL32!BaseThreadInitThunk+0x1D 4> ntdll!RtlUserThreadStart+0x28 cpptrace::runtime_error: Stack trace (most recent call first): #0 0x00007ff65c4b1e37 in ?? #1 0x00007ff65c4b208a in ?? #2 0x00007ff65c4b165a in ?? #3 0x00007ffe3c279362 in recalloc #4 0x00007ffe3e57257c in BaseThreadInitThunk #5 0x00007ffe3e94aa57 in RtlUserThreadStart -- main -- Stack trace (most recent call first): #0 0x00007ff65c4b218a in ?? #1 0x00007ff65c4b2d6b in ?? #2 0x00007ffe3e57257c in BaseThreadInitThunk #3 0x00007ffe3e94aa57 in RtlUserThreadStart ``` RelWithDebInfo: ``` Stack trace (most recent call first): #0 0x00007ff736f52e92 in foo() at C:\Users\rifkin\Projects\test-cpptrace\test.cpp:9 #1 0x00007ff736f5210a in std::thread::_Invoke, 0>() at C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\thread:55 #2 0x00007ffe3c279362 in recalloc #3 0x00007ffe3e57257c in BaseThreadInitThunk #4 0x00007ffe3e94aa57 in RtlUserThreadStart 0> C:\Users\rifkin\Projects\test-cpptrace\test.cpp(10): test!foo+0x47 1> C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\thread(56): test!std::thread::_Invoke,0>+0xB 2> ucrtbase!recalloc+0xA3 3> KERNEL32!BaseThreadInitThunk+0x1D 4> ntdll!RtlUserThreadStart+0x28 [foobar](cpptrace::runtime_error): Stack trace (most recent call first): #0 0x00007ff736f52c67 in bar() at C:\Users\rifkin\Projects\test-cpptrace\test.cpp:6 #1 0x00007ff736f52efa in foo() at C:\Users\rifkin\Projects\test-cpptrace\test.cpp:13 #2 0x00007ff736f5210a in std::thread::_Invoke, 0>() at C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\include\thread:55 #3 0x00007ffe3c279362 in recalloc #4 0x00007ffe3e57257c in BaseThreadInitThunk #5 0x00007ffe3e94aa57 in RtlUserThreadStart -- main -- Stack trace (most recent call first): #0 0x00007ff736f5301a in main() at C:\Users\rifkin\Projects\test-cpptrace\test.cpp:23 #1 0x00007ff736f54a0f in __scrt_common_main_seh() at D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 #2 0x00007ffe3e57257c in BaseThreadInitThunk #3 0x00007ffe3e94aa57 in RtlUserThreadStart ```

in ?? is consistent with what I'd expect when symbols aren't available for a function.

Do you think you could try doing a std::stacktrace at the same point where you're running into this with cpptrace just to see what it says?

And one more quick question, are you by chance doing this from within a signal handler?

jeremy-rifkin commented 8 months ago

Part of the difficulty on the cpptrace side of things is that for the most part dbghelp does all the heavy lifting. Cpptrace doesn't even check to see what module an instruction pointer come from much less look for or load a .pdb.

If you are interested, three things I can think of checking related to dbghelp handling

And if none of those are the issue... I'll be sad.

If that's the case the issue would be in resolve_frame, where there are basically four dbghelp calls: SymGetLineFromAddr64, SymFromAddr, SymSetContext, and SymEnumSymbols. The last of which has some parameters related to modules however I from what I can tell other code using SymEnumSymbols online is doing the same thing.

VelocityRa commented 8 months ago

std::stacktrace without a .pdb prints this: (on the main thread from a different point - don't remember where I tried it last time)

0> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0x44C9B
1> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0xD4A001
2> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0x1090F22
3> VCRUNTIME140_1+0x1080
4> VCRUNTIME140_1!_NLG_Return2+0x1555
5> ntdll!RtlCaptureContext2+0x4A6
6> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0xD4DC6D
7> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0xF65649
8> KERNEL32!BaseThreadInitThunk+0x14
9> ntdll!RtlUserThreadStart+0x21

So, wrong here too. With .pdb present:

0> C:\[redacted]src\[redacted]\error.cpp(42): [redacted]![redacted]::print_exception+0x96
1> C:\[redacted]tools\[redacted]\main.cpp(239): [redacted]!`main'::`2'::<lambda_1>::operator()+0x5C
2> C:\[redacted]tools\[redacted]\main.cpp(258): [redacted]!main$catch$70+0x2D
3> VCRUNTIME140_1+0x1080
4> VCRUNTIME140_1!_NLG_Return2+0x1555
5> ntdll!RtlCaptureContext2+0x4A6
6> C:\[redacted]tools\[redacted]\main.cpp(255): [redacted]!main+0xA60
7> D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(288): [redacted]!__scrt_common_main_seh+0x10C
8> KERNEL32!BaseThreadInitThunk+0x14
9> ntdll!RtlUserThreadStart+0x21

I also tried your three suggestions to no avail.

The first one results in <empty trace>. The second one results in the broken traces. The third one in <empty trace> for the exception type and

#0 0x00007ffb8e879e3f in  ??
#1 0x0000008fb4b2c92f in  ??

for generate_trace().

Guess I'll have to resort to simply checking if a .pdb exists next to my .exe at runtime for my Windows nightly builds (since I let users optionally download it).

Thank you for your time, I appreciate it :)

jeremy-rifkin commented 8 months ago

Thanks for the additional information. I’ll keep trying to reproduce and see what I can do.