thesofproject / sof

Sound Open Firmware
Other
528 stars 307 forks source link

[debug] [panic] panic_dump calls in case of oops #3316

Open IulianOlaru249 opened 4 years ago

IulianOlaru249 commented 4 years ago

With the current flow, in case of an oops on the DSP side, the exception function (xtensa/include/arch/init.h) is called as a handler. Then, it calls the panic_dump function, but with a NULL parameter where information about the filename and line number should be. Further on, the panic_dump function (src/debug/panic.c) should store the information in memory for the application processor to read. Encountering a NULL pointer, it does nothing. This is dangerous because, on the application processor side, there is only junk in this memory area and it will not really know when to stop reading (withought a \0). I am not really sure how, or if, this is handled for the Intel processors. If it is, some information would be helpful since I am working on integrating such a solution for the imx8 platforms.

IulianOlaru249 commented 4 years ago

@dbaluta let me know if i am framing the problem right.

dbaluta commented 4 years ago

@IulianOlaru249 yes, I think the problem is real.

Calling panic_dump with NULL as a second parameter is buggy because lets some memory uninitialized:

static inline void exception(void)
{
»       uintptr_t epc1;

»       __asm__ __volatile__("rsr %0, EPC1" : "=a" (epc1) : : "memory");
»       /* now save panic dump */
»       /* TODO: we could invoke a GDB stub here */
»       panic_dump(SOF_IPC_PANIC_EXCEPTION, NULL, &epc1);
}

Then on the Linux side we access this memory, with unpredictable results:

snd_sof_get_status:

»       dev_err(sdev->dev, "error: panic at %s:%d\n",
»       »       panic_info->filename, panic_info->linenum);

@lgirdwood @xiulipan what happens on Intel side when there's an exception on Firmware? For example division by zero.

xiulipan commented 4 years ago

@dbaluta I think we will raise an IPC IRQ and driver will caught that and try to dump the stack from memory window.

dbaluta commented 4 years ago

@xiulipan yes, but is the information printed on the Linux kernel side correct? I mean from the code flow, it looks like when an exception occcurs on the FW side, the panic_info structure misses filename and lineno.

This can have unpredictable behavior on the Linux kernel side when tries to print it, resulting in garbage being printed.

To demonstrate this, can you or anyone at Intel generate an exception on FW side and paste here the trace from the Linux kernel.

lgirdwood commented 4 years ago

@kv2019i fyi - kernel may need some checks here to avoid printing garbage.

paulstelian97 commented 4 years ago

Maybe simply initializing the memory in all cases (a single zero byte for filename and value -1 for line number) could also help?