Open glaubitz opened 4 years ago
There also seem to be some non-debuginfo tests failing due to what looks like ABI-related miscompilations.
Maybe something is wrong with the gdb you are using, because the logs are full of things like:
/<<PKGBUILDDIR>>/build/sparc64-unknown-linux-gnu/test/debuginfo/borrowed-enum.gdb/borrowed-enum.debugger.script:12: Error in sourced command file:
Cannot access memory at address 0x20
This could also be a compiler bug (debuggers are very susceptible to GIGO) -- but I would start by verifying that gdb works on a tiny C program first.
GDB generally works fine on sparc64. We're using it extensively on sparc64 to debug C/C++ programs.
Any suggestions on how to test GDB properly?
Seeing if gdb was working properly is just the first step. Since it's working, there's no easy road any more, just looking at the debug info and/or debugging gdb to see where the bug lies.
Duplicates #62780, though this one has discussion so maybe the older one should be closed in favour of this.
As mentioned on IRC with @glaubitz, I have a strange sense of deja vu here. I recall looking at this before and deciding that the tests just fundamentally wouldn't work on SPARC, as they broke on function calls, i.e. the SPARC CALL instruction, which has a delay slot. The delay slot was generally filled with part of the code that comes before the function call in the source code, so when probing with GDB it looks like some of the code hasn't yet executed (which it hasn't, but it will have by the time control is transferred to the callee). I think my conclusion was that it could "easily" be fixed if all the magic "break on this line" comments were instead calls to a magic "please break" function, with GDB configured to break in the callee rather than the caller (plus a frame 1 in GDB). But that's still a bit of an undertaking. And I can't for the life of me find any notes from past me about this anywhere... perhaps it's just back in IRC logs, or I never wrote anything up about it.
Managed to dig out IRC logs; this is what past me said:
2019-07-18 [10:58:45] <jrtc27> cbmuser: problem seems to be delay slots
2019-07-18 [10:59:01] <jrtc27> they break on zzz() calls (random function name just to act as a barrier)
2019-07-18 [10:59:20] <jrtc27> but the call instruction has a delay slot that does the final thing
2019-07-18 [10:59:33] <jrtc27> e.g. in lexical-scope-in-for-loop.rs, it stores the updated value to the stack
2019-07-18 [11:00:08] <jrtc27> and there are two stack slots in use (one for x in the first half of the loop, one for x in the second half)
2019-07-18 [11:00:21] <jrtc27> so the first time round the loop the stack slots are garbage
2019-07-18 [11:00:26] <jrtc27> second time, you get the first time's values
2019-07-18 [11:00:36] <jrtc27> third time you get the second time's values
2019-07-18 [11:01:13] <jrtc27> and outside the loop you get garbage again because the final store for the stack slot outside the loop is in the delay slot
2019-07-18 [11:02:04] <cbmuser> ah
2019-07-18 [11:02:25] <jrtc27> uh wait no actually the final time is fine, it managed to already do the store by then
2019-07-18 [11:02:30] <cbmuser> is sparc64 currently the only architecture in Rust that has delay slots?
2019-07-18 [11:02:35] <jrtc27> no, mips does
2019-07-18 [11:02:44] <jrtc27> as does powerpc
2019-07-18 [11:03:15] <jrtc27> hppa and sh4 do too I think but not relevant
2019-07-18 [11:03:25] <jrtc27> now I should check what gcc does
2019-07-18 [11:04:31] <jrtc27> there are two things I think would be fixes: 1. break in the delay slot not the call itself (though for me that actually feels like the wrong thing to do because if the source is a function call you really want to see the call instruction) 2. tweak debugging info so it says that the value is in a register
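For anyone skimming the log, here is a rough sketch of the shape of test being described. It is a simplified illustration, not a copy of lexical-scope-in-for-loop.rs; the function name `shadowing_loop` and the returned vector are assumptions added so the sketch can be checked on its own. The point is that each `zzz()` line is a breakpoint location, and on a target with delay slots the store of the freshly computed `x` may sit in the delay slot of the call, so it has not happened yet when GDB stops at the call instruction.

```rust
// Simplified illustration only; not the actual rustc test file.

/// Barrier function: exists only so the test has a call to break on.
#[inline(never)]
fn zzz() {
    // Discourages the optimizer from treating the call as a no-op.
    std::hint::black_box(());
}

/// Returns both `x` bindings per iteration so the sketch can be checked.
pub fn shadowing_loop() -> Vec<(i32, i32)> {
    let mut observed = Vec::new();
    for x in 1..=3 {
        let first = x;
        zzz(); // breakpoint line: the loop variable `x` should be visible here
        let x = -x; // shadows the loop variable; may get its own stack slot
        zzz(); // breakpoint line: the shadowing `x` should be visible here
        observed.push((first, x));
    }
    observed
}
```

If the store for either `x` ends up in the delay slot of the following `zzz()` call, a breakpoint on the call itself would show the previous iteration's value (or garbage on the first iteration), which matches what the log describes.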
visited for wg-debugging triage
we discussed this for a bit. @michaelwoerister had some ideas for ways we might revise the debuginfo tests to be more robust on targets that feature delay slots (and that make use of them when compiling these tests).
I think our big first step is to find someone who can replicate these problems; I'm going to see if I can get access to a SPARC64 system (or emulator) that can replicate the issues. If not, then we'll have to look around for a member of the community who is willing to test some of these ideas out.
The GCC compile farm has various sparc64 machines available for anyone working on open-source projects (not just GCC): https://cfarm.tetaneutral.net/machines/list/
To follow up: I think the way we are setting breakpoints in our tests is pretty consistent and would probably be compatible with the approach of setting the breakpoint by function name and then looking one frame up. It would be good if that function was defined in a central place (so we don't have to copy-and-paste it into every test file) and then we'd have to update the test runner and documentation.
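A minimal sketch of what that could look like, under stated assumptions: `please_break` and `sum_to` are hypothetical names, not existing helpers in the test suite, and the GDB session in the comment uses only standard GDB commands rather than any particular test-runner directive syntax. It is meant to illustrate the idea of breaking in the callee and inspecting the caller one frame up, not to prescribe the final design.

```rust
// Hypothetical sketch: `please_break` and `sum_to` are illustrative names,
// not existing helpers in the rustc test suite.

/// Central "please break" function. `#[inline(never)]` keeps it as a real
/// call, so a debugger can set a breakpoint on it by function name.
#[inline(never)]
pub fn please_break() {
    // Discourages the optimizer from treating the call as a no-op.
    std::hint::black_box(());
}

/// Example test body. Intended GDB session (standard GDB commands; the symbol
/// may need its crate path, depending on how the test is built):
///   break please_break
///   run
///   frame 1      <- select the caller, whose locals we want to inspect
///   print total
///   continue
pub fn sum_to(n: u32) -> u32 {
    let mut total = 0;
    for i in 1..=n {
        total += i;
        // By the time GDB stops inside the callee, the caller's pending
        // stores (including any in the call's delay slot) have executed.
        please_break();
    }
    total
}
```

Because the stop happens inside the callee, the delay-slot instruction of the call has already run, so the caller's locals should hold the values the source suggests. The cost is the extra frame selection step in every test script.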
One of the few remaining issues on sparc64 on Linux is the debuginfo tests. They all fail, which is why I assume that either we're missing some definitions here or there, or something in the debugging part is generally broken:
Any pointers where I should start looking?
Full log at: https://buildd.debian.org/status/fetch.php?pkg=rustc&arch=sparc64&ver=1.42.0%2Bdfsg1-1&stamp=1586554512&raw=0
CC @psumbera