sifive / freedom-metal

Bare Metal Compatibility Library for the Freedom Platform
Other
154 stars 47 forks source link

metal silently loops after illegal instruction #62

Open brucehoult opened 5 years ago

brucehoult commented 5 years ago

When I was debugging Krste's vector kernels it was nice that pk (I think) traps illegal instructions and prints the offending opcode and a dump of the x registers to stdout, and then nicely exits.

Once Palmer converted everything to use metal this no longer happened. Instead it simply loops.

This was a problem because it meant that when someone had not correctly updated their toolchain they just got a silent hang.

What happens with a standard elf build with newlib, run with spike pk:

bruce@nuc:~/riscv/bengal-team/software/spike64-kr636_pipe$ spike pk kr636
bbl loader
z  0000000000000000 ra 000000000001021c sp 000000007f7e9ae0 gp 000000000001d5e8
tp 0000000000000000 t0 0000000000010592 t1 000000000001c110 t2 0000000000000000
s0 000000000001bdf8 s1 000000000001e000 a0 000000000000000a a1 000000000001bdf8
a2 000000000001bb28 a3 000000000001d840 a4 0000000000000004 a5 0000000000000048
a6 0000000000000048 a7 0000000000000090 s2 000000000001d000 s3 000000000001c000
s4 0000000000000000 s5 0000000000000000 s6 0000000000000000 s7 0000000000000000
s8 0000000000000000 s9 0000000000000000 sA 0000000000000000 sB 0000000000000000
t3 0000000000000001 t4 000000000001d840 t5 000000000001bb58 t6 000000000001be04
pc 0000000000010392 va 000000005dad0bd7 insn       5dad0bd7 sr 8000000200046020
An illegal instruction was executed!
bruce@nuc:~/riscv/bengal-team/software/spike64-kr636_pipe$ 

And with metal:

bruce@nuc:~/riscv/bengal-team/software/spike64-kr636_pipe$ spike kr636

or with -l, thousands and thousands of lines of output and then...

bruce@nuc:~/riscv/bengal-team/software/spike64-kr636_pipe$ spike -l kr636
[...]
core   0: 0x000000008000026a (0x5dad0bd7) vmerge.vv v23, v26, v26, v0.t
core   0: exception trap_illegal_instruction, epc 0x000000008000026a
core   0:           tval 0x0000000000000000
core   0: 0x0000000080009910 (0x0000a001) c.j     pc + 0

When the execution of spike is buried deep within scripts and makefiles it is neither obvious what has happened, nor obvious how to add the -l flag (which you don't otherwise want!) or even that you need it.

(this is a legal RVV instruction, but not yet implemented in spike, which therefore raises illegal instruction)

palmer-dabbelt commented 5 years ago

One of the design goals of metal is to have as thin of an interface as possible, which results in us not really having any trap handling set up by default -- essentially just an infinite loop. I wouldn't be opposed to doing something simple like defaulting to a trap handler that says something along the lines of "you didn't register a trap handler, but you've taken a trap anyway" to aid debugging.

Maybe the right thing to do here is to have a debug configuration option for metal, which will do things like printing prettier error messages at the cost of some code size (ie, always linking in printf)?

brucehoult commented 5 years ago

Some code size, yes, but printf() isn't necessary to print a few string literals and hex numbers. Even "Trap mepc=0xnnnnnnnnnnnnnnnn mcause=0xnnnnnnnnnnnnnnnn mtval=0xnnnnnnnnnnnnnnnn\n" would be helpful. That's probably under 100 bytes.

bsousi5 commented 5 years ago

Bruce,

One can set up the traps vector prior to their own applications. Metal already has APIs that one can do this, if one need to print out mcause, mtval.

On Mar 10, 2019, at 11:25 PM, Bruce Hoult notifications@github.com wrote:

Some code size, yes, but printf() isn't necessary to print a few string literals and hex numbers. Even "Trap mepc=0xnnnnnnnnnnnnnnnn mcause=0xnnnnnnnnnnnnnnnn mtval=0xnnnnnnnnnnnnnnnn\n" would be helpful.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/sifive/freedom-metal/issues/62#issuecomment-471418644, or mute the thread https://github.com/notifications/unsubscribe-auth/AiDa38IsR_pglVp1PzFarUeH8Ps7PDFoks5vVfbGgaJpZM4biXPX.

brucehoult commented 5 years ago

I'm sure one can. The issue is that programs were written to be compiled with default flags with the elf compiler and run using "spike pk", which results in sensible defaults and diagnostics, and then Palmer set up a build system that built them as metal apps, and then they suddenly mysteriously had no diagnostics. Also in the case of saxpy, it crashed with an "unknown syscall" as soon as -- it turned out -- a non-trivial printf() was executed (e.g. to print "%f == %f\n") because the metal stack size was not big enough.

These two things made debugging harder and rather mysterious for people not familiar with metal such as Megan and me.

Things should be set up so that casual users can write simple programs that Just Work. Having a very basic default trap handler automatically installed, and a default stack big enough to run printf() seem reasonable to me. If sophisticated users want to substitute their own trap handler or reduce the stack size that's up to them.