Closed MichaelRawson closed 10 months ago
This fixes the issue for the static build -- a sample stack trace now looks like this on my machine:
----- stack dump -----
Version : Vampire 4.8 (commit 89dcf330a on 2023-11-24 14:58:53 +0100)
call stack: 0x592e25 0x2108b17 0x210727a 0x59559b 0x594671 0x59348e 0x5933ad 0xaa44c3 0x9642ea 0x96433f 0x59fd3b 0x5a2022
invoking addr2line(1) ...
_start
??:?
__libc_start_main_impl
??:?
__libc_start_call_main
libc-start.o:?
main
<vampire_dir>/vampire.cpp:615
vampireMode()
<vampire_dir>/vampire.cpp:373
doProving()
<vampire_dir>/vampire.cpp:150
getPreprocessedProblem()
<vampire_dir>/vampire.cpp:131
Shell::Preprocess::preprocess(Kernel::Problem&)
<vampire_dir>/Shell/Preprocess.cpp:219
Shell::AnswerLiteralManager::addAnswerLiterals(Kernel::Problem&)
<vampire_dir>/Shell/AnswerExtractor.cpp:400
Shell::AnswerLiteralManager::addAnswerLiterals(Lib::List<Kernel::Unit*>*&)
<vampire_dir>/Shell/AnswerExtractor.cpp:411
Debug::Assertion::violated(char const*, int, char const*)
<vampire_dir>/Debug/Assertion.cpp:53
Debug::Tracer::printStack(std::ostream&)
<vampire_dir>/Debug/Tracer.cpp:45
----- end of stack dump -----
However, for non-static build, this is what I get:
----- stack dump -----
Version : Vampire 4.8 (commit 89dcf330a on 2023-11-24 14:58:53 +0100)
call stack: 0x55e092848c25 0x7fa58ccd4e40 0x7fa58ccd4d90 0x55e09284b35f 0x55e09284a435 0x55e092849252 0x55e092849171 0x55e092d5a287 0x55e092c1a0ae 0x55e092c1a103 0x55e092855aff 0x55e092857de6
invoking addr2line(1) ...
??
??:0
??
??:0
??
... [and more of the same]
??:0
----- end of stack dump -----
Thanks for checking, Petra! Interesting with the non-static build. My hunch is that the addresses of functions are being scrambled by ASLR, based on the very high addresses.
I will try and chase this down over the weekend. Perhaps I can ask you to check again when I think I've fixed it.
My suspicion was correct: always nice when that happens! Interestingly my machine at work does not have ASLR enabled so I didn't catch it there, but at home it does and I could debug it. @hzzv - I fixed this problem at least, could you test for me?
The problem was that Vampire is built with "position-independent code" - we didn't choose this, it's just a default. This means that it can be loaded at any point in memory, if the user so decides. This is not necessarily a problem in itself: almost nobody does this themselves, and if they break backtraces as a result that's their problem. However, ASLR, a popular operating system security feature, uses this mechanism to load executables at a random address, which breaks addr2line
.
We can simply disable position-independent code for Vampire: we probably don't care about security mitigations like ASLR (?), and it breaks us here.
Some points:
-fno-pie
to the command-line @quickbeam123?Didn't have time to study this in detail yet, but on Mac, I am getting:
----- stack dump -----
Version : Vampire 4.8 (commit 1d226fad7 on 2023-11-27 10:12:40 +0000)
call stack: 0x1 0x7fff205dbf3d 0x10ad3b2a2 0x10ad39e56 0x10ad373df 0x10ad37320 0x10ab47b2d 0x10a586e88 0x10a58ab7b
invoking addr2line(1) ...
sh: addr2line: command not found
----- end of stack dump -----
also, I think I liked the increasing indents of the trace which I am not seeing in the examples above. Is their discontinuation part of the current PR?
on Mac, I am getting:
I think my Internet searches told me that this is something included in newer releases of OSX and installable on the older ones. There's also something called atos
but I'm not sure what this does. Thanks for testing: I'm open to suggestions about getting this to work better for Mac.
Is their discontinuation part of the current PR?
Not explicitly, but we don't control the output of addr2line
. :-( If this is a deal-breaker for you we can look into it more.
I already noticed some macro magic. Would it be easy to degrade to what we currently have in master, if addr2line won't get detected?
Which reminds me that my test was using Makefile. If that could be part of the explanation, I guess I should start reading this more carefully. E.g. what is the business about -fno-pie
?
For me it now works (both for static & non-static build).
There are a couple of lines in the stack trace which look redundant though - not sure if it's worth investigating. non-static:
----- stack dump -----
Version : Vampire 4.8 (commit 1d226fad7 on 2023-11-27 10:12:40 +0000)
call stack: 0x5a7c15 0x7f3d2a99ce40 0x7f3d2a99cd90 0x5aa34c 0x5a9422 0x5a823f 0x5a815e 0xab9273 0x97909a 0x9790ef 0x5b4aeb 0x5b6dd2
invoking addr2line(1) ...
_start
??:?
??
??:0
??
??:0
main
<vampire_dir>/vampire.cpp:615
...
static:
----- stack dump -----
Version : Vampire 4.8 (commit 1d226fad7 on 2023-11-27 10:12:40 +0000)
call stack: 0x592e25 0x2108b17 0x210727a 0x59559b 0x594671 0x59348e 0x5933ad 0xaa44c3 0x9642ea 0x96433f 0x59fd3b 0x5a2022
invoking addr2line(1) ...
_start
??:?
__libc_start_main_impl
??:?
__libc_start_call_main
libc-start.o:?
main
<vampire_dir>/vampire.cpp:615
...
There are a couple of lines in the stack trace which look redundant
Yeah, I think you mean the implementation-defined stuff "before" main, right? I'm not sure how best to deal with that, so I'm including it for now.
I already noticed some macro magic.
Which macro magic?
Would it be easy to degrade to what we currently have in master, if addr2line won't get detected?
Easy, yes. But I'm not sure it makes sense.
We currently link to a library (a small one shipped with the compiler, but a library), libdl
, which allows us to map symbols to names - but apparently a little unreliably, and we must add some data to the binary (-rdynamic
) to help it along. This PR involves shelling out to a system binary that may or may not exist - as you found out - but if it exists, it should work more reliably, include line numbers, and requires subtracting stuff from the binary (-fno-pie
). If we included both, we'd link to the library, have double the compiler magic, and then...run both approaches?
If there's a more Mac-flavour command (atos
?), we could invoke that, either with conditional compilation or just trying one command after the other.
Which reminds me that my test was using Makefile. If that could be part of the explanation, I guess I should start reading this more carefully.
No, this is not the issue you had. If you get "addr2line
not found", you're missing the addr2line
binary on your system for whatever reason. If you get a not-so-useful backtrace, probably you need to add -fno-pie
because of ASLR, although I'm not sure if OSX even has that.
PS sorry for the complication, anyway. In the previous PR I was under the impression I found something that works reliably. And it did for a while! This approach seems like a more reliable solution to the "it crashed on StarExec" scenario, if not to the "make it work on Martin's machine" scenario.
@quickbeam123 - according to the Apple atos
manpage, the correct invocation is atos -o <path to vampire> <addresses>
- you could try this instead of installing addr2line
if you prefer.
You will however need to configure -fno-pie
in your CXXFLAGS
, otherwise you'll fail to lookup any symbols because of ASLR, which macOS does apparently have.
Sorry for asking stupid questions before. Now I read the whole story more carefully and can appreciate the pros and cons (I hope). At the moment, the problem is that on mac, even after some medium effort of googling for answers, I am only getting
Martins-MacBook-Pro-2:vampire mbassms6$ ./vampire_dbg_michael-addr2line_6967
Condition in file vampire.cpp, line 373 violated:
true
----- stack dump -----
Version : Vampire 4.8 (commit 1d226fad7 on 2023-11-27 10:12:40 +0000)
call stack: 0x1 0x7fff205dbf3d 0x1007d7022 0x1007d5c99 0x100021d78 0x100025a6b
invoking addr2line(1) ...
sh: addr2line: command not found
----- end of stack dump -----
Martins-MacBook-Pro-2:vampire mbassms6$ atos -o ./vampire_dbg_michael-addr2line_6967 -arch x86_64 -l 0x1 0x7fff205dbf3d 0x1007d7022 0x1007d5c99 0x100021d78 0x100025a6b
0x7fff205dbf3d
0x1007d7022
0x1007d5c99
0x100021d78
0x100025a6b
Even for a -fno-pie
compile.
Also, read somewhere I should rather talk to the linker (via -Wl,-no_pie
). Now I got a warning
ld: warning: -no_pie is deprecated when targeting new OS versions
And still no reasonable answers from atos
.
Let's see if we can get it working as intended on your machine. Then we can debate whether the effort is worthwhile!
atos -o ./vampire_dbg_michael-addr2line_6967 -arch x86_64 -l 0x1 0x7fff205dbf3d 0x1007d7022 0x1007d5c99 0x100021d78 0x100025a6b
Did you intend to give -l 0x1
? Apparently that gives the load address, but that should be the default load address unless you loaded Vampire at 0x1 for some reason.
Well, I was supplying all I got from vampire on its call stack line. Now this 0x1 no longer appears:
./vampire_dbg_michael-addr2line_6967 Problems/PUZ/PUZ001+1.p
% Running in auto input_syntax mode. Trying TPTP
Condition in file vampire.cpp, line 157 violated:
true
----- stack dump -----
Version : Vampire 4.8 (commit 1d226fad7 on 2023-11-27 10:12:40 +0000)
call stack: 0x7fff203c2f3d 0x10f5e2002 0x10f5e0bb6 0x10f5de1ad 0x10ee2ccb8 0x10ee309ab
invoking addr2line(1) ...
sh: addr2line: command not found
----- end of stack dump -----
Martins-MacBook-Pro-2:vampire mbassms6$ atos -o ./vampire_dbg_michael-addr2line_6967 -arch x86_64 -l 0x7fff203c2f3d 0x10f5e2002 0x10f5e0bb6 0x10f5de1ad 0x10ee2ccb8 0x10ee309ab
0x10f5e2002
0x10f5e0bb6
0x10f5de1ad
0x10ee2ccb8
0x10ee309ab
what's mildly interesting is that atos
repeats all the given addresses except the first one. This suggests it's rather a very strange tool or I am using it in a wrong way. However, I tried my best with stack overflows and other internets and not much is readily available that would shed more light.
Sure, but why are you passing the -l
switch at all? My reading of the manpage is that you shouldn't do that and just give the addresses - but I don't actually have access to a Mac so sorry if I'm saying something stupid.
I think someone was using -l in some example somewhere. Without it, it also doesn't do anything interesting, but it repeats all the given:
atos -o ./vampire_dbg_michael-addr2line_6967 -arch x86_64 0x7fff203c2f3d 0x10f5e2002 0x10f5e0bb6 0x10f5de1ad 0x10ee2ccb8 0x10ee309ab
0x7fff203c2f3d
0x10f5e2002
0x10f5e0bb6
0x10f5de1ad
0x10ee2ccb8
0x10ee309ab
so maybe a progress? Would you like to try this together at some point, we me sharing a screen?
Would you like to try this together at some point, we me sharing a screen?
Sure - lmk when you have time, it'd be nice to have this working. If it doesn't we can brainstorm other options (including maintaining the status quo).
Met with @quickbeam123 and he is happy now. :-) Note to self: do actually test on StarExec before merging, could be humbling.
@quickbeam123 - I don't have an ideal development environment to test with right now, but this ought to work. If it's alright on your end, I'll put something through StarExec and we can merge.
Well, the cmake
version is not working on mac, at least. We get there, roughly, the following compile command for typical file
.../usr/bin/c++ -DCHECK_LEAKS=0 -DVDEBUG=1 -DVTIME_PROFILING=0 -DVZ3=1 -I/.../vampire -I/.../vampire/z3/src/api/c++ -I/.../vampire/z3/build/src/api -isystem /.../vampire/z3/src/api -g -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk -mmacosx-version-min=12.6 -Wall -fno-threadsafe-statics -fno-rtti -fsized-deallocation -std=c++14 -MD -MT CMakeFiles/obj.dir/Inferences/BinaryResolution.cpp.o -MF CMakeFiles/obj.dir/Inferences/BinaryResolution.cpp.o.d -o CMakeFiles/obj.dir/Inferences/BinaryResolution.cpp.o -c /.../vampire/Inferences/BinaryResolution.cpp
Shouldn't it be mentioning -fno-pie
somewhere?
(This was on my home laptop, where the cmake
compile actually finished, is calling atos
, but not translating addresses.)
Update: cmake
claims that NOT CMAKE_CXX_LINK_PIE_SUPPORTED
. Is there a way to debug this?
We know there is a way to make this work from the Makefile solution. So should we simply enforce the Makefile options?
Update: cmake claims that NOT CMAKE_CXX_LINK_PIE_SUPPORTED. Is there a way to debug this?
Ahah - yes, we did this for CMAKE_IPO_SUPPORTED
. Patch incoming.
This should now complain, but not fail, with some (more) helpful error if PIE is not supported.
Generally CMakeLists.txt
could be nicer/simpler but that seems like a separate change.
So should we simply enforce the Makefile options?
But yes, if the above isn't informative we already special-case Clang/GCC for some options, we can do the same here.
Unfortunately, only getting:
...
-- PIE not supported, stack traces may be broken
-- PIE (CXX):
NO_PIE (CXX):
...
Also a bit frustrated that googling for things like cmake support PIE on mac
only finds what looks like irrelevant hits. Did nobody want this working in the past?
(Will also try it on the work laptop, where cmake didn't seem to work in the first place, i.e. first fixing that and then let's see what the PIE will print...)
By the traditional "private communications" channel, @quickbeam123 said he got this to work on his setup. @hzzv maybe you want to test this one last time to make sure I didn't break anything?
I'll upload a few crashy payloads to StarExec and see what we get, then merge this.
We get something like
0.00/0.01 % Running in auto input_syntax mode. Trying TPTP
0.00/0.01 23062 Aborted by signal SIGILL on /export/starexec/sandbox/benchmark/theBenchmark.p
0.00/0.01 % ------------------------------
0.00/0.01 % Version: Vampire 4.8 (commit 6e19426b1 on 2024-01-02 14:06:37 +0000)
0.00/0.01 % Termination reason: Unknown
0.00/0.01 % Termination phase: Saturation
0.00/0.01
0.00/0.01 % Memory used [KB]: 417
0.00/0.01 % Time elapsed: 0.005 s
0.00/0.01 % ------------------------------
0.00/0.01 ---- Runtime statistics ----
0.00/0.01 clauses created: 15
0.00/0.01 -----------------------------
0.02/0.05 % ------------------------------
0.02/0.05 Version : Vampire 4.8 (commit 6e19426b1 on 2024-01-02 14:06:37 +0000)
0.02/0.05 call stack: 0x409371 0xb4cf60 0xb4b864 0x40becf 0x40ab66 0x40970c 0x75b78d 0x75ba98 0x4e500b 0x7616fd 0x761489 0x7611dd 0x760b46 0x74ff6d 0x751f0c 0x76eadb 0x75b497 0x75b701 0x751f0c 0x6003bb 0x600093 0x649ecd 0x64f5d0 0x64a61f 0x64a83a 0x628bd0 0xb60270 0x46335e 0x418201
0.02/0.05 invoking addr2line(1) ...
0.05/0.13 addr2line: Dwarf Error: found dwarf version '5', this reader only handles version 2, 3 and 4 information.
0.05/0.13 _start
0.05/0.13 ??:?
0.05/0.13 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.13 __libc_start_main
0.05/0.13 ??:?
0.05/0.13 addr2line: Dwarf Error: found dwarf version '2049', this reader only handles version 2, 3 and 4 information.
0.05/0.13 __libc_start_call_main
0.05/0.13 libc-start.o:?
0.05/0.13 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.14 main
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.14 vampireMode()
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.14 doProving()
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.14 Saturation::ProvingHelper::runVampireSaturation(Kernel::Problem&, Shell::Options const&)
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.14 Saturation::ProvingHelper::runVampireSaturationImpl(Kernel::Problem&, Shell::Options const&)
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '771', this reader only handles version 2, 3 and 4 information.
0.05/0.14 Kernel::MainLoop::run()
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.14 Saturation::SaturationAlgorithm::runImpl()
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '586', this reader only handles version 2, 3 and 4 information.
0.05/0.14 Saturation::SaturationAlgorithm::doOneAlgorithmStep()
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '1031', this reader only handles version 2, 3 and 4 information.
0.05/0.14 Saturation::SaturationAlgorithm::doUnprocessedLoop()
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '6', this reader only handles version 2, 3 and 4 information.
0.05/0.14 Saturation::SaturationAlgorithm::addToPassive(Kernel::Clause*)
0.05/0.14 ??:?
0.05/0.14 addr2line: Dwarf Error: found dwarf version '1793', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Saturation::AWPassiveClauseContainer::add(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '8194', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Lib::SingleParamEvent<Kernel::Clause*>::fire(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '51976', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Lib::SingleParamEvent<Kernel::Clause*>::MethodSpecificHandlerStruct<Saturation::SaturationAlgorithm>::fire(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '21248', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Saturation::Otter::onPassiveAdded(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '4667', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Saturation::Otter::FakeContainer::add(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '373', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Lib::SingleParamEvent<Kernel::Clause*>::fire(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '67', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Lib::SingleParamEvent<Kernel::Clause*>::MethodSpecificHandlerStruct<Indexing::Index>::fire(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '2305', this reader only handles version 2, 3 and 4 information.
0.05/0.15 Indexing::Index::onAddedToContainer(Kernel::Clause*)
0.05/0.15 ??:?
0.05/0.15 addr2line: Dwarf Error: found dwarf version '22871', this reader only handles version 2, 3 and 4 information.
0.05/0.16 Indexing::DemodulationSubtermIndexImpl<false>::handleClause(Kernel::Clause*, bool)
0.05/0.16 ??:?
0.05/0.16 addr2line: Dwarf Error: found dwarf version '35601', this reader only handles version 2, 3 and 4 information.
0.05/0.16 Indexing::TermSubstitutionTree::insert(Kernel::TypedTermList, Kernel::Literal*, Kernel::Clause*)
0.05/0.16 ??:?
0.05/0.16 addr2line: Dwarf Error: found dwarf version '65535', this reader only handles version 2, 3 and 4 information.
0.05/0.16 Indexing::TermSubstitutionTree::handleTerm(Kernel::TypedTermList, Indexing::SubstitutionTree::LeafData, bool)
0.05/0.16 ??:?
0.05/0.16 addr2line: Dwarf Error: found dwarf version '0', this reader only handles version 2, 3 and 4 information.
0.05/0.16 void Indexing::SubstitutionTree::handle<Kernel::TypedTermList>(Kernel::TypedTermList const&, Indexing::SubstitutionTree::LeafData, bool)
0.05/0.16 ??:?
0.05/0.16 addr2line: Dwarf Error: found dwarf version '32273', this reader only handles version 2, 3 and 4 information.
0.05/0.16 Indexing::SubstitutionTree::insert(Lib::DHMap<unsigned int, Kernel::TermList, Lib::IdentityHash, Lib::DefaultHash>&, Indexing::SubstitutionTree::LeafData)
0.05/0.16 ??:?
0.05/0.16 addr2line: Dwarf Error: found dwarf version '34817', this reader only handles version 2, 3 and 4 information.
0.05/0.16 __restore_rt
0.05/0.16 libc_sigaction.o:?
0.05/0.16 addr2line: Dwarf Error: found dwarf version '4352', this reader only handles version 2, 3 and 4 information.
0.05/0.16 Lib::handleSignal(int)
0.05/0.16 ??:?
0.05/0.16 addr2line: Dwarf Error: found dwarf version '1036', this reader only handles version 2, 3 and 4 information.
0.05/0.16 Debug::Tracer::printStack(std::ostream&)
0.05/0.16 ??:?
0.05/0.16 EOF
when crashing on StarExec Miami with a debug build.
We should get file/line numbers, but because StarExec's addr2line
is relatively musty it doesn't support DWARF version 5, which is what my compiler defaults to. I could fix this by specifying -gdwarf4
if I wanted to. On StarExec we'd typically only have a release build without debug symbols anyhow.
I'd call this a win and move along with our lives. Thanks everyone!
Sorry for the delay, I was off from work until today. I can confirm it indeed works on my machine. Thank you again, Michael! :)
Something to chew on over the weekend. @hzzv brought to my attention that the stack trace mechanism could be better: it doesn't include file/line information where available, and it's kinda flaky with static builds for some reason I don't fully understand.
The current solution calls a system function to obtain a backtrace in the form of an array of pointers, then jumps through system hoops to map them to symbols. Instead, I propose that we shell out to
addr2line(1)
, which is very often installed on POSIXy machines. If we have it, great. If not, we're crashing anyway so we don't lose much, and we print out the addresses so that we can figure it out offline. This also rids us of a library dependency and some build magic. What do you think?