Stack overflow on SVC entry still performs SVC, but from the wrong task

mkeeter commented 1 year ago

This is very mysterious.

If you build sidecar/rev-b.toml on commit 6465f006f3aacc7c51b6a0b8114438044d703d06, the control_plane_agent task has very little stack margin – so little, in fact, that you can trigger a stack overflow by talking to it:

$ faux-mgs --interface axf2 --discovery-addr [fe80::0c1d:dcff:fecf:2734]:11111 state

(this will time out)

After this failure, the system should be in an odd state: control_plane_agent will be faulted, and jefe will be waiting on the fault bit (1) but not its timer bit (2), which should never happen.

Normally, jefe should be notified by the kernel when a task faults, and will restart it. It's unclear why this isn't happening.

Adding an infinite loop to configurable_fault (right before the return) shows a system that looks like it should return to jefe:

matt@niles ~ (sidecar-b) $ h registers
humility: attached to 0483:3754:002600184D4B500E20373831 via ST-Link V3
   R0 = 0x240004c0 <- kernel: HUBRIS_TASK_TABLE_SPACE+0x28
   R1 = 0x00000008
   R2 = 0x110b0009
   R3 = 0x00000008
   R4 = 0x00000000
   R5 = 0x0000ffff
   R6 = 0x00000001
   R7 = 0x00000000
   R8 = 0x00000000
   R9 = 0x00000000
  R10 = 0x24040710 <- jefe: JEFE_EXTERNAL_READY+0x0
  R11 = 0x00000001
  R12 = 0x24040488 <- jefe: 0x24040000+0x488
   SP = 0x24000398 <- kernel: 0x24000000+0x398
   LR = 0xffffffed
   PC = 0x08003bc2 <- kernel: configurable_fault+0x36
  PSR = 0x61000004 <- 0110_0001_0000_0000_0000_0000_0000_0100
                      |||| | ||         |       |           |
                      |||| | ||         |       |           + Exception = 0x4
                      |||| | ||         |       +------------ IC/IT = 0x0
                      |||| | ||         +-------------------- GE = 0x0
                      |||| | |+------------------------------ T = 1
                      |||| | +------------------------------- IC/IT = 0x0
                      |||| +--------------------------------- Q = 0
                      |||+----------------------------------- V = 0
                      ||+------------------------------------ C = 1
                      |+------------------------------------- Z = 1
                      +-------------------------------------- N = 0

  MSP = 0x24000398 <- kernel: 0x24000000+0x398
  PSP = 0x24040488 <- jefe: 0x24040000+0x488
  SPR = 0x05000000 <- 0000_0101_0000_0000_0000_0000_0000_0000
                            |||         |         |         |
                            |||         |         |         + PRIMASK = 0
                            |||         |         +---------- BASEPRI = 0x0
                            |||         +-------------------- FAULTMASK = 0
                            ||+------------------------------ CONTROL.nPRIV = 1
                            |+------------------------------- CONTROL.SPSEL = 0
                            +-------------------------------- CONTROL.FPCA = 1

FPSCR = 0x00000000

Here's the saved jefe state, which looks like the kernel sending a notification of 1:

matt@niles ~ (sidecar-b) $ h tasks -v jefe
humility: attached to 0483:3754:002600184D4B500E20373831 via ST-Link V3
system time = 29858
ID TASK                       GEN PRI STATE
 0 jefe                         0   0 RUNNING
   |
   +-----------> 0x24000498 Task {
                    save: SavedState {
                        r4: 0x0,
                        r5: 0xffff,
                        r6: 0x1,
                        r7: 0x0,
                        r8: 0x0,
                        r9: 0x0,
                        r10: 0x24040710,
                        r11: 0x1,
                        psp: 0x24040488,
                        exc_return: 0xffffffed,
                        s16: 0x0,
                        s17: 0x0,
                        s18: 0x0,
                        s19: 0x0,
                        s20: 0x0,
                        s21: 0x0,
                        s22: 0x0,
                        s23: 0x0,
                        s24: 0x0,
                        s25: 0x0,
                        s26: 0x0,
                        s27: 0x0,
                        s28: 0x0,
                        s29: 0x0,
                        s30: 0x0,
                        s31: 0xffffffff
                    },
                    priority: Priority(0x0),
                    state: Healthy(Runnable),
                    timer: TimerState {
                        deadline: Some(Timestamp(0x74cc)),
                        to_post: NotificationSet(0x2)
                    },
                    generation: 0x0,
                    notifications: 0x0,
                    descriptor: 0x8004d28 (&kern::descs::TaskDesc)
                }
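
The exc_return value 0xffffffed in the saved state (the same value as LR in the register dump) can be decoded by hand. Below is a minimal sketch of that decoding, assuming the standard ARMv7-M EXC_RETURN bit layout; the `ExcReturn` struct and `decode_exc_return` function are hypothetical names for illustration, not part of Hubris or humility.

```rust
/// Fields encoded in the low bits of an ARMv7-M EXC_RETURN value.
/// (Hypothetical helper type for illustration.)
#[derive(Debug, PartialEq, Eq)]
struct ExcReturn {
    /// Bit 3: return to Thread mode (true) or Handler mode (false).
    thread_mode: bool,
    /// Bit 2: unstack from PSP (true) or MSP (false).
    uses_psp: bool,
    /// Bit 4 clear: the stacked frame is the extended (floating-point) frame.
    fp_frame: bool,
}

/// Decodes an EXC_RETURN value, or returns None if the top nibble is not 0xF
/// (in which case the value is an ordinary address, not an EXC_RETURN).
fn decode_exc_return(value: u32) -> Option<ExcReturn> {
    if (value >> 28) != 0xF {
        return None;
    }
    Some(ExcReturn {
        thread_mode: (value & (1 << 3)) != 0,
        uses_psp: (value & (1 << 2)) != 0,
        fp_frame: (value & (1 << 4)) == 0,
    })
}

fn main() {
    // Value taken from the LR / exc_return fields in the dumps above.
    println!("{:?}", decode_exc_return(0xffff_ffed));
}
```

For 0xffffffed this reports Thread mode, PSP, and an extended frame, which is consistent with CONTROL.FPCA = 1 in the register dump.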

Because LR (the EXC_RETURN value) is 0xFFFFFFED, the exception return will go to thread mode and unstack its frame from PSP. We, too, can look at PSP:

matt@niles ~ (sidecar-b) $ h readmem 0x24040488 -w
humility: attached to 0483:3754:002600184D4B500E20373831 via ST-Link V3
                    0        4       \/        c
0x24040480 |                   240405a0 00000008 |         ...$....
0x24040490 | 00000003 240405c8 240405a8 080083f1 | .......$...$....
0x240404a0 | 08009692 61000000 00000000 00000000 | .......a........
0x240404b0 | 00000000 00000000 00000000 00000000 | ................
0x240404c0 | 00000000 00000000 00000000 00000000 | ................
0x240404d0 | 00000000 00000000 00000000 00000000 | ................
0x240404e0 | 00000000 00000000 00000000 00000000 | ................
0x240404f0 | 00000000 00000017

This shows that it's about to return to 08009692, which is our sys_recv_stub function:

arm-none-eabi-addr2line -e jefe -i 0x08009692
/hubris/sys/userlib/src/lib.rs:387
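
For readers following along, the return address comes from the seventh word of the hardware-stacked exception frame at PSP. The sketch below decodes the first eight words of the dump, assuming the standard ARMv7-M frame order (r0, r1, r2, r3, r12, lr, pc, xPSR); `BasicFrame` and `decode_frame` are hypothetical names, and the floating-point part of the extended frame is ignored.

```rust
/// The integer part of an ARMv7-M hardware-stacked exception frame.
/// (Hypothetical helper type for illustration.)
#[derive(Debug, PartialEq, Eq)]
struct BasicFrame {
    r0: u32,
    r1: u32,
    r2: u32,
    r3: u32,
    r12: u32,
    lr: u32,
    pc: u32,
    xpsr: u32,
}

/// Interprets the first eight words at the stack pointer as a frame.
/// Returns None if fewer than eight words were supplied.
fn decode_frame(words: &[u32]) -> Option<BasicFrame> {
    match words {
        [r0, r1, r2, r3, r12, lr, pc, xpsr, ..] => Some(BasicFrame {
            r0: *r0,
            r1: *r1,
            r2: *r2,
            r3: *r3,
            r12: *r12,
            lr: *lr,
            pc: *pc,
            xpsr: *xpsr,
        }),
        _ => None,
    }
}

fn main() {
    // Words copied from the readmem output, starting at PSP = 0x24040488.
    let words: [u32; 8] = [
        0x2404_05a0, 0x0000_0008, 0x0000_0003, 0x2404_05c8,
        0x2404_05a8, 0x0800_83f1, 0x0800_9692, 0x6100_0000,
    ];
    let frame = decode_frame(&words).expect("need at least eight words");
    println!("return address = {:#010x}", frame.pc);
    println!("stacked lr     = {:#010x}", frame.lr);
}
```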

However, it never seems to get there: with a jefe-specific trap added to userlib::sys_recv_stub, the trap is never hit.

One theory is something about interrupt chaining going wrong: if there are pending interrupts when the final bx lr in configurable_fault is evaluated, then it will handle them instead, and that... somehow... eventually prevents jumping to jefe? A piece of evidence for this theory: in the infinite-loop-before-bx lr code, we see jefe listed as Runnable; however, if we remove that loop and let the bx lr execute, jefe ends up in Healthy(InRecv(None)).

One more observation: the net task will panic itself every 60 seconds if it doesn't see any traffic. When that happens, jefe is woken up and restarts control_plane_agent, as it should.

If I artificially lower the control_plane_agent stack size on Gimletlet, this does not reproduce; I haven't tried on Gimlet yet.

cbiffle commented 1 year ago

Interrupt chaining was indeed involved ... but not in the way we expected! Turns out SVCall stays pended if you take a memory management fault while stacking the exception frame for SVCall. This produces reality-melting behavior. I've posted #1139 as a fix for this behavior, and in my testing on your branch it seems to work.
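
To make the sequence of events concrete, here is a toy model of the behavior described above. It is only a sketch: it does not touch real hardware, it is not kernel code, and the `clear_pending_in_fault_handler` flag is a hypothetical stand-in for "some fix that drops the stale SVCall", not a description of what #1139 actually does.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Task {
    ControlPlaneAgent,
    Jefe,
}

/// Toy CPU state: which task is current, and whether SVCall is pended.
struct Cpu {
    current: Task,
    svcall_pended: bool,
}

/// Models a task executing SVC with too little stack for the exception frame.
/// Returns the task the SVC ends up attributed to, if it is handled at all.
fn svc_with_stack_overflow(clear_pending_in_fault_handler: bool) -> Option<Task> {
    let mut cpu = Cpu {
        current: Task::ControlPlaneAgent,
        svcall_pended: false,
    };

    // 1. The task executes SVC. SVCall becomes pended and the hardware
    //    starts stacking the exception frame onto the task's stack.
    cpu.svcall_pended = true;

    // 2. Stacking overflows the stack, so a memory management fault is taken
    //    instead. SVCall stays pended unless the handler clears it
    //    (hypothetical fix). The kernel faults the task and switches to the
    //    supervisor.
    if clear_pending_in_fault_handler {
        cpu.svcall_pended = false;
    }
    cpu.current = Task::Jefe;

    // 3. On return from the fault handler, a still-pended SVCall is taken
    //    immediately and attributed to whichever task is now current.
    if cpu.svcall_pended {
        cpu.svcall_pended = false;
        Some(cpu.current)
    } else {
        None
    }
}

fn main() {
    println!("without clearing: {:?}", svc_with_stack_overflow(false));
    println!("with clearing:    {:?}", svc_with_stack_overflow(true));
}
```

In this model the uncleared case attributes control_plane_agent's syscall to jefe, which matches the issue title; with the pended state cleared, no stray SVC is performed.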

cbiffle commented 1 year ago

In discussions with @lzrd and @kc8apf we realized this could also happen if you triggered a usage or bus fault with too little stack: stacking its frame would produce a derived memmanage fault, but the original fault would remain pending, so it'd be handled on return. By the time we return from the memmanage fault, we've already switched into the supervisor, so the fault would be blamed on the supervisor, which'd be... bad.

Fortunately the approach I'm using in #1139 can be adapted to cover both cases for little additional cost.

I'm going to see about adding a regression test to the kernel test suite for this, though it may be rather involved to do so.
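
On ARMv7-M, the "pended" state discussed in these comments is visible in the System Handler Control and State Register (SHCSR, at 0xE000ED24). The sketch below decodes the pended bits from a raw register value, for example one read out with a debugger; the function name `pended_handlers` is hypothetical, and this is an inspection aid rather than the approach taken in #1139.

```rust
// Pended-state bits in the ARMv7-M SHCSR register.
const USGFAULTPENDED: u32 = 1 << 12;
const MEMFAULTPENDED: u32 = 1 << 13;
const BUSFAULTPENDED: u32 = 1 << 14;
const SVCALLPENDED: u32 = 1 << 15;

/// Bit mask and display name for each handler, in ascending bit order.
const TABLE: [(u32, &str); 4] = [
    (USGFAULTPENDED, "UsageFault"),
    (MEMFAULTPENDED, "MemManage"),
    (BUSFAULTPENDED, "BusFault"),
    (SVCALLPENDED, "SVCall"),
];

/// Returns the names of the system handlers marked pended in `shcsr`.
fn pended_handlers(shcsr: u32) -> Vec<&'static str> {
    let mut names = Vec::new();
    for &(bit, name) in TABLE.iter() {
        if (shcsr & bit) != 0 {
            names.push(name);
        }
    }
    names
}

fn main() {
    // Example value: fault handlers enabled (bits 16..=18), SVCall pended.
    let shcsr = 0x0007_0000 | SVCALLPENDED;
    println!("pended: {:?}", pended_handlers(shcsr));
}
```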

oxidecomputer / hubris #1134