Closed VelocityRa closed 8 months ago
Hi @VelocityRa, I see how that trace can be misleading and that's definitely something I'd like to address.
Defining what it means for a trace to be "full" or not is slightly unclear because it's not uncommon for some information to be missing (e.g. libraries might not be built with debug symbols) but the rest of the information should be accurate and hopefully useful.
Additionally, it's often not known whether there will be useful information until after attempting to resolve the trace.
I think ideally in this situation even when symbols aren't present for one object the trace should have blank or partially blank frames with as much information as was possible to extract, for example
Stack trace (most recent call first):
#0 0x00000001034fd788 at /Users/rifkin/foo/cpptrace/build/test
#1 0x00000001034fd758 at /Users/rifkin/foo/cpptrace/build/test
#2 0x00000001034fd728 at /Users/rifkin/foo/cpptrace/build/test
#3 0x00000001034fd908 in buzz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:4:2
#4 0x00000001034fd978 in biz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:8:2
#5 0x00000001034fd988 in bar() at /Users/rifkin/foo/cpptrace/build/bar.cpp:12:2
#6 0x00000001034fd8d8 at /Users/rifkin/foo/cpptrace/build/test
#7 0x00000001034fd8e8 at /Users/rifkin/foo/cpptrace/build/test
```or even
```c
Stack trace (most recent call first):
#0 0x00000001034fd788
#1 0x00000001034fd758
#2 0x00000001034fd728
#3 0x00000001034fd908 in buzz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:4:2
#4 0x00000001034fd978 in biz() at /Users/rifkin/foo/cpptrace/build/bar.cpp:8:2
#5 0x00000001034fd988 in bar() at /Users/rifkin/foo/cpptrace/build/bar.cpp:12:2
#6 0x00000001034fd8d8
#7 0x00000001034fd8e8
Personally I'd lean more towards providing as much information as possible as opposed to conditionally disabling stacktrace generation, as long as what is there is accurate. Does that work for your use case?
I don't think frames should blindly resolve to the closest symbol if their actual entry isn't there but I can double check the logic. Ideally it is returning an error under the hood if a symbol isn't found.
It would be helpful to know a little more about how you are compiling and what's going on here. Are you using msvc? Are optimizations on?
I'm surprised to not see a main
entry, usually the calls should look something like RtlUserThreadStart -> BaseThreadInitThunk -> mainCRTStartup -> __scrt_common_main -> __scrt_common_main_seh -> invoke_main -> main
or RtlUserThreadStart -> BaseThreadInitThunk -> mainCRTStartup -> __tmainCRTStartup -> main
.
I'm definitely surprised to see a SM2KeyExchange
entry from BaseThreadInitThunk
. At the moment either some bug with omitting frames when an error occurs or your theory about the resolver picking the closest symbol seems most plausible.
I'm going to close this for now as I think I need additional information to proceed. If there's any additional info or if you have any more library feedback please let me know!
@jeremy-rifkin
Right, forgot about this sorry!
I'm compiling on MSVC on Release mode (so optimizations on). Using CMake with vcpkg, cpptrace version 0.3.1.
Specific flags I see in my build.ninja
, if that helps:
DEFINES = -DCPPTRACE_DEMANGLE_WITH_WINAPI -DCPPTRACE_GET_SYMBOLS_WITH_DBGHELP -DCPPTRACE_UNWIND_WITH_DBGHELP -D_UNICODE
FLAGS = /DWIN32 /D_WINDOWS /EHsc /MP /O2 /Ob2 /DNDEBUG -std:c++20 -MD /WX /wd4251 /wd4275
That's the config generating traces like the one above, or this one I just got from another test:
Stack trace (most recent call first):
#0 0x00007ff696fecebf in SupportLibInit
#1 0x00007ff696f849a4 in TPMS_ENC_SCHEME_RSAES_Unmarshal
#2 0x00007ff6971326c6 in SupportLibInit
#3 0x00007ff697110787 in SupportLibInit
#4 0x00007ff69711050e in SupportLibInit
#5 0x00007ff697043b4e in SupportLibInit
#6 0x00007ffbf12a1bb1 in configthreadlocale
#7 0x00007ffbf1637343 in BaseThreadInitThunk
#8 0x00007ffbf34426b0 in RtlUserThreadStart
I'm surprised to not see a main entry [...]
I wasn't throwing from the main thread so that's to be expected afaik. After testing this on Debug I see RtlUserThreadStart -> BaseThreadInitThunk -> register_onexit_function -> std::thread::_Invoke<...
. But yeah even if I do throw from the main thread on Release, I do get the issue.
I can only see main
/ __scrt_common_main_seh
if throwing from main thread, and with symbols enabled (on Debug or RelWithDebInfo).
I get this when throwing from the main thread without symbols (on Release config):
Stack trace (most recent call first):
#0 0x00007ff6e4761aa2 in SupportLibInit
#1 0x00007ff6e488f0a3 in SM2KeyExchange
#2 0x00007ffbf1637343 in BaseThreadInitThunk
#3 0x00007ffbf34426b0 in RtlUserThreadStart
All examples have been from the cpptrace::lazy_exception
. Here's cpptrace::generate_trace(1).print()
(at the point of the catch
) too:
Stack trace (most recent call first):
#0 0x00007ff6e47616ef in SupportLibInit
#1 0x00007ff6e48aa1e9 in SM2KeyExchange
#2 0x00007ffbdced107f in ??
#3 0x00007ffbdced26a4 in _NLG_Return2
#4 0x00007ffbf3491715 in RtlCaptureContext2
#5 0x00007ff6e4761ab6 in SupportLibInit
#6 0x00007ff6e488f0a3 in SM2KeyExchange
#7 0x00007ffbf1637343 in BaseThreadInitThunk
#8 0x00007ffbf34426b0 in RtlUserThreadStart
I think ideally in this situation even when symbols aren't present for one object the trace should have blank or partially blank frames with as much information as was possible to extract Personally I'd lean more towards providing as much information as possible as opposed to conditionally disabling stacktrace generation, as long as what is there is accurate. Does that work for your use case?
I would be fine with that - assuming all printed symbol names are correct.
I would still want to be able query for whether full symbols are available (.pdb
for the whole app) - I can elaborate on my use case if you want - but I understand if that's not feasible from your side. I could add that logic myself on my app at a higher level.
The above issue is still a library bug though, as far as I understand.
As said on my main post, my project is using libtpms (https://github.com/stefanberger/libtpms, via a simple custom vcpkg port). The printed symbols are from there. Maybe using this is required to repro the issue.
I'm willing to help with debugging this further. I can make changes to cpptrace itself if it'd help narrow down the issue.
Thank you, I’ll take a look tonight
Ok, I've tested locally trying to reproduce anything but I've been unsuccessful so far. I tested cpptrace::generate_trace().print()
, throw cpptrace::runtime_error
, and std::stacktrace for comparison.
in ??
is consistent with what I'd expect when symbols aren't available for a function.
Do you think you could try doing a std::stacktrace
at the same point where you're running into this with cpptrace just to see what it says?
And one more quick question, are you by chance doing this from within a signal handler?
Part of the difficulty on the cpptrace side of things is that for the most part dbghelp does all the heavy lifting. Cpptrace doesn't even check to see what module an instruction pointer come from much less look for or load a .pdb.
If you are interested, three things I can think of checking related to dbghelp handling
cpptrace::experimental::set_cache_mode(cpptrace::cache_mode::prioritize_memory);
just to rule out any caching infoSymSetOptions(SYMOPT_ALLOW_ABSOLUTE_SYMBOLS);
in symbols_with_dbghelp might be wrong, it could be worth checking SymSetOptions(0);
SymInitialize(proc, NULL, TRUE)
might also be wrong, maybe the third parameter shouldn't be trueAnd if none of those are the issue... I'll be sad.
If that's the case the issue would be in resolve_frame
, where there are basically four dbghelp calls: SymGetLineFromAddr64, SymFromAddr, SymSetContext, and SymEnumSymbols. The last of which has some parameters related to modules however I from what I can tell other code using SymEnumSymbols
online is doing the same thing.
std::stacktrace
without a .pdb
prints this: (on the main thread from a different point - don't remember where I tried it last time)
0> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0x44C9B
1> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0xD4A001
2> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0x1090F22
3> VCRUNTIME140_1+0x1080
4> VCRUNTIME140_1!_NLG_Return2+0x1555
5> ntdll!RtlCaptureContext2+0x4A6
6> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0xD4DC6D
7> [redacted]!TPMI_ALG_SYM_OBJECT_Unmarshal+0xF65649
8> KERNEL32!BaseThreadInitThunk+0x14
9> ntdll!RtlUserThreadStart+0x21
So, wrong here too.
With .pdb
present:
0> C:\[redacted]src\[redacted]\error.cpp(42): [redacted]![redacted]::print_exception+0x96
1> C:\[redacted]tools\[redacted]\main.cpp(239): [redacted]!`main'::`2'::<lambda_1>::operator()+0x5C
2> C:\[redacted]tools\[redacted]\main.cpp(258): [redacted]!main$catch$70+0x2D
3> VCRUNTIME140_1+0x1080
4> VCRUNTIME140_1!_NLG_Return2+0x1555
5> ntdll!RtlCaptureContext2+0x4A6
6> C:\[redacted]tools\[redacted]\main.cpp(255): [redacted]!main+0xA60
7> D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl(288): [redacted]!__scrt_common_main_seh+0x10C
8> KERNEL32!BaseThreadInitThunk+0x14
9> ntdll!RtlUserThreadStart+0x21
I also tried your three suggestions to no avail.
The first one results in <empty trace>
.
The second one results in the broken traces.
The third one in <empty trace>
for the exception type and
#0 0x00007ffb8e879e3f in ??
#1 0x0000008fb4b2c92f in ??
for generate_trace()
.
Guess I'll have to resort to simply checking if a .pdb
exists next to my .exe
at runtime for my Windows nightly builds (since I let users optionally download it).
Thank you for your time, I appreciate it :)
Thanks for the additional information. I’ll keep trying to reproduce and see what I can do.
I have inherited
cpptrace::lazy_exception
to make my own type with some niceties, and I throw this exception type in my codebase.cpptrace adds a stacktrace in the
.what()
string as expected, but if symbols (in my case a.pdb
next to the.exe
) are not available, it results in bogus stacktraces like so:These appear to be func names from libtpms, a library that I'm using that presumably embeds dwarf symbols. The exception does not originate there however, presumably it picks them up because there are no other symbols available (so it picks the closest before the exception instruction pointers).
Is there a solution? Perhaps simply a way to detect whether 'full' symbols are available, so I can make my class not fetch the stacktrace if so? Or perhaps simply whether the
dbghelp
backend is available / was used for a stacktrace?Thanks for the awesome library!