thesofproject / sof

Sound Open Firmware

[BUG] [APL] FW core dump has empty oops data if FW built with GCC #1346

Open kv2019i opened 5 years ago

kv2019i commented 5 years ago

Describe the bug: When the DSP hits an exception, it dumps oops data for the driver to read out.

What actually happens is that I get a dump file that is only partially OK. I can find "struct sof_ipc_panic_info" correctly filled, e.g. I can find SOF_IPC_PANIC_MAGIC and the rest of the struct seems OK. The start of the dump, however, is just zeroes, and there don't seem to be any valid values for "struct sof_ipc_dsp_oops_xtensa". The stack dump seems OK, and I can find the correct functions by manually looking up symbols from the stack dump. However, the coredumper Python scripts cannot make sense of this dump because many key fields are just zeroes.

To Reproduce: Cause the FW to hit an oops. I did this by adding the following code to volume.c:volume_copy():

        panic(1234);
        while (1) {}

Expected behavior: I can extract the oops file by running:

scp root@dut:/sys/kernel/debug/sof/exception oops.bin

and feed the data to the coredumper tool.
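Before feeding the file to the coredumper, a quick host-side check can show whether the header at the start of the exception file is populated at all. This is a minimal sketch under a few assumptions. It assumes the oops starts at offset 0 of the file, as it does in the XT-CC dump in this issue. It also assumes the first seven little-endian words are arch, totalsize, configidhi, configidlo, numaregs, stackoffset and stackptr. That is the order in which fill_core_dump() fills them in, and it matches the XT-CC dump. The helper names are hypothetical and not part of SOF or the coredumper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word (the dumps in this issue are LE). */
static uint32_t le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical host-side copy of arch_hdr + plat_hdr. */
struct oops_hdr {
	uint32_t arch, totalsize;
	uint32_t configidhi, configidlo, numaregs, stackoffset, stackptr;
};

/* Decode the first seven words of the dump. Returns 0 on success,
 * -1 if the buffer is too short to hold them. */
static int decode_oops_hdr(const uint8_t *buf, size_t len, struct oops_hdr *h)
{
	assert(buf != NULL && h != NULL);
	if (len < 7 * 4)
		return -1;
	h->arch        = le32(buf + 0x00);
	h->totalsize   = le32(buf + 0x04);
	h->configidhi  = le32(buf + 0x08);
	h->configidlo  = le32(buf + 0x0c);
	h->numaregs    = le32(buf + 0x10);
	h->stackoffset = le32(buf + 0x14);
	h->stackptr    = le32(buf + 0x18);
	return 0;
}

/* Heuristic sanity check against the size of the whole dump file:
 * an all-zero header (as in the GCC dump) fails it. */
static int oops_hdr_plausible(const struct oops_hdr *h, size_t dump_len)
{
	return h->arch != 0 && h->totalsize != 0 && h->totalsize <= dump_len &&
	       h->numaregs != 0 && h->numaregs % 4 == 0 &&
	       h->stackoffset < h->totalsize;
}
```

On the first 28 bytes of the XT-CC dump attached below, this decodes totalsize 0x180, numaregs 64 and configidhi 0xc3f3fbfe. On the GCC dump the same seven words are all zero, so the check fails.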

Impact: If the DSP oops cannot be successfully saved, debugging hard-to-reproduce bugs is severely impacted.

Environment
1) Branch name and commit hash of 3 repositories: sof (firmware), linux (kernel driver) and soft (tools & topology):
linux sofdev 2e945691f2c09ddf533e7bab6a875d99ccac7c46
sof master e14ab7088bf4673f43396f02490f717f69de0de9

2) Name of the topology file: n/a

3) Name of the platform(s) on which the bug is observed: APL UP2

4) Reproducibility Rate. If you can only reproduce it randomly, it’s useful to report how many times the bug has been reproduced vs. the number of attempts it’s taken to reproduce the bug: 100%

Screenshots or console output: Two example dumps attached.

oopses-20190429.zip


Highlights from the comments below:

FWIW, the ABI between GCC and XCC differs slightly with respect to the calling convention and register windows, hence there are some incompatibilities in some of the dump data.

kv2019i commented 5 years ago

@mmaka1 @abonislawski It seems part of the dump is correct (panic header and stack dump), so the basic procedure I follow seems to be OK. Can you take a look at the exception dumps and see if you notice anything obviously wrong that could explain this?

Btw, I compiled this with GCC. I'll try with the Xtensa toolchain as well and see if that makes a difference.

kv2019i commented 5 years ago

OK, bingo! When I rebuilt the FW with the XT toolchain, ta-daa, the dump copied verbatim from /sys/kernel/debug/sof/exception looks perfectly fine. Exactly the same code compiled with GCC results in the weird, partially filled oops file I attached to this bug. I relabelled the subject, so this now applies only to GCC.

lgirdwood commented 5 years ago

@kv2019i Can you attach both exception binaries? This will help root-cause the issue. It may be that the exception window size is different between xcc and gcc (and we may have to pad if using gcc).

kv2019i commented 5 years ago

@lrgirdwo wrote:

@kv2019i Can you attach both exception binaries, this will help root cause the issue. It may be the exception window size is different between xcc and gcc (and we may have to pad if using gcc).

I was looking at that, but it seems a lot of data is really missing. E.g. I can't find a byte sequence that would look like a valid "struct sof_ipc_dsp_oops_arch_hdr" anywhere in the GCC dump. I'll attach the binary files and a hexdump diff below. The oops data seems completely zeroed out, and I can't find e.g. the oops config codes anywhere in the GCC-built FW oops, so it seems to be more than an offset problem.
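To check whether the header is merely shifted, you can search the whole dump for a byte pattern that any valid oops must contain, such as the two CONFIGID words. Here is a small sketch. The pattern is the little-endian CONFIGID0/CONFIGID1 pair visible at offset 8 of the XT-CC dump in the diff (it also appears as CONFIGID0=c3f3fbfe / CONFIGID1=1544813c in the QEMU register dump later in this thread). find_bytes() and apl_configid are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CONFIGID0 (0xc3f3fbfe) and CONFIGID1 (0x1544813c) of this APL build,
 * as they appear little-endian in a valid oops header. */
static const uint8_t apl_configid[8] = {
	0xfe, 0xfb, 0xf3, 0xc3, 0x3c, 0x81, 0x44, 0x15
};

/* Return the byte offset of the first match of needle in hay, or -1. */
static long find_bytes(const uint8_t *hay, size_t hay_len,
		       const uint8_t *needle, size_t needle_len)
{
	assert(needle_len > 0);
	for (size_t i = 0; i + needle_len <= hay_len; i++) {
		if (memcmp(hay + i, needle, needle_len) == 0)
			return (long)i;
	}
	return -1;
}
```

A result of -1 over the whole 2 KiB exception file, which is what is reported here for the GCC build, means the header is absent rather than displaced. A non-negative result would point to a padding or offset mismatch instead.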

The diff is a bit hard to read in Markdown, so I'll attach the full binaries plus my hexdump diff as well. dsp-oops-20190430.zip

$ diff -y  exception6-gcc.txt exception7-xtcc.txt
--cut--
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000000  01 00 00 00 80 01 00 00  fe fb f3 c3 3c 81 44 15  |
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000010  40 00 00 00 7c 01 00 00  30 fc 07 be 04 00 00 00  |
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000020  00 00 00 00 22 0d 06 00  b9 bd 02 be 2a 83 02 be  |
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000030  2a 83 02 be 00 00 00 00  2a 83 02 be 00 00 00 00  |
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000040  00 00 00 00 20 03 06 00  20 03 06 00 00 00 00 00  |
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000050  20 03 06 00 00 00 00 00  00 00 00 00 00 00 00 00  |
00000060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000060  00 00 00 00 2a 02 00 00  00 00 00 00 00 00 00 00  |
00000070  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000070  09 00 00 00 a8 02 00 00  00 00 00 00 ed 82 02 be  |
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000080  20 fc 07 be 1c a8 00 be  22 0d 06 00 18 a8 00 be  |
00000090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000090  7c 01 00 00 14 a8 00 be  00 00 00 00 c9 95 02 be  |
000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000000a0  f0 fb 07 be 00 80 00 be  00 00 00 00 28 10 05 9e  |
000000b0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000000b0  ac a9 00 be 4b 00 00 00  6e 74 2e 63 96 8b 02 be  |
000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000000c0  e0 fb 07 be 40 80 00 be  01 00 00 00 00 00 00 00  |
000000d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000000d0  a0 0d 00 be 00 00 00 be  a0 3d 01 20 42 51 02 be  |
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000000e0  b0 fb 07 be 00 80 00 be  a0 0d 00 be 00 00 00 00  |
000000f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000000f0  14 80 00 be aa 52 e8 00  a0 3d 01 20 f0 fc 07 be  |
00000100  c0 fe 07 be 00 8a 06 be  00 00 00 00 01 00 00 00  | | 00000100  c0 fc 07 be 40 13 05 be  00 00 00 00 f0 13 05 be  |
00000110  00 00 00 00 00 4b 00 00  00 00 00 00 ae 14 01 be  | | 00000110  40 13 05 be 54 13 05 be  74 13 05 be 39 61 02 be  |
00000120  90 fe 07 be 00 8a 06 be  30 00 00 00 80 01 00 00  | | 00000120  a0 fc 07 be c0 13 05 be  00 00 00 00 00 00 00 00  |
00000130  80 68 06 be 00 68 06 be  c0 00 00 00 fa 33 01 be  | | 00000130  02 00 00 00 f0 13 05 be  00 00 00 00 e0 75 02 be  |
00000140  f0 fc 07 be 09 d0 ea 0d  34 95 02 be 4b 00 00 00  | | 00000140  50 fc 07 be 09 d0 ea 0d  d4 1d 04 be 4b 00 00 00  |
00000150  bc 71 e7 00 00 f8 24 01  00 00 00 00 5c 33 02 be  | | 00000150  58 fc 07 be 00 ba db 00  04 00 00 00 b3 83 02 be  |
00000160  d0 fc 07 be 09 d0 ea 0d  00 a8 00 be d0 fc 07 be  | | 00000160  30 fc 07 be 09 d0 ea 0d  30 fc 07 be c8 03 00 00  |
00000170  22 05 06 00 80 01 00 00  2f 68 6f 6d 00 00 00 00  | | 00000170  22 0d 06 00 03 00 00 00  2f 68 6f 6d 00 00 00 00  |
00000180  ff ff ff 00 15 7b 08 18  2e 2e 2e 2f 77 6f 72 6b  | | 00000180  00 00 00 00 f5 4e 00 00  2e 2e 2e 72 2f 66 33 31  |
00000190  2f 73 6f 66 2e 67 69 74  2f 73 72 63 2f 6c 69 62  | | 00000190  2d 73 6f 66 2f 73 6f 66  2f 73 72 63 2f 6c 69 62  |
000001a0  2f 61 67 65 6e 74 2e 63  4b 00 00 00 00 00 00 00  | | 000001a0  2f 61 67 65 6e 74 2e 63  4b 00 00 00 ff ff ff 00  |
000001b0  10 fd 07 be 00 00 00 00  70 18 01 be 00 00 00 00  | | 000001b0  f7 7c 08 18 00 00 00 00  a0 3d 01 20 18 00 00 00  |
000001c0  00 00 00 00 e8 03 00 00  00 00 00 00 ff ff ff 00  | | 000001c0  00 00 02 90 00 8d 06 be  70 2a 00 00 00 00 00 00  |
000001d0  15 7b 08 18 2e 2e 2e 2f  77 6f 72 6b 2f 73 6f 66  | | 000001d0  f5 4e 00 00 2e 2e 2e 72  2f 66 33 31 2d 73 6f 66  |
000001e0  2e 67 69 74 2f 73 72 63  2f 6c 69 62 2f 61 67 65  | | 000001e0  2f 73 6f 66 2f 73 72 63  2f 6c 69 62 2f 61 67 65  |
000001f0  6e 74 2e 63 4b 00 00 00  00 00 00 00 b8 ad 01 be  | | 000001f0  6e 74 2e 63 4b 00 00 00  00 00 00 00 00 00 00 00  |
00000200  50 fd 07 be b4 99 01 be  d0 ad 02 be 4a ae 01 be  | | 00000200  77 51 3d 00 00 00 00 00  d0 92 00 20 1d 64 02 be  |
00000210  60 fd 07 be 40 13 05 be  f0 13 05 be 04 5b 02 fe  | | 00000210  c0 fc 07 be 40 13 05 be  f0 fc 07 be 00 20 00 00  |
00000220  90 fd 07 be 24 a6 02 be  1c fe 05 be c4 58 02 fe  | | 00000220  02 00 00 00 a6 7c 08 18  00 00 00 00 72 cc 01 be  |
00000230  90 fd 07 be dc a2 02 be  01 00 00 00 74 13 05 be  | | 00000230  f0 fc 07 be 40 13 05 be  54 13 05 be 00 00 00 00  |
00000240  22 00 06 00 10 00 00 00  01 00 00 00 54 13 05 be  | | 00000240  00 00 00 00 20 a1 07 00  30 41 04 be f0 13 05 be  |
00000250  54 c0 02 be 22 00 06 00  40 13 05 be 15 33 02 7e  | | 00000250  40 13 05 be 54 13 05 be  74 13 05 be e5 ad 02 fe  |
00000260  c0 fd 07 be 80 01 00 00  2f 68 6f 6d 04 16 00 00  | | 00000260  20 fd 07 be a0 35 04 be  1c f0 05 be 22 00 06 00  |
00000270  00 fe 05 be 00 8d 06 be  00 88 06 be 1c f0 05 be  | | 00000270  ff ff ff ff ff ff ff ff  00 10 06 9e e8 63 02 be  |
00000280  16 00 00 00 00 00 00 00  00 f0 05 be 5c 33 02 be  | | 00000280  16 00 00 00 f0 ff ff ff  22 00 06 00 2a 83 02 7e  |
00000290  20 fe 07 be af 3a 00 00  88 11 00 00 95 e6 01 be  | | 00000290  50 fd 07 be 03 00 00 00  2f 68 6f 6d 00 16 00 00  |
000002a0  00 fe 07 be 08 00 00 00  01 00 00 00 d2 04 00 80  | | 000002a0  16 00 00 00 00 00 00 00  84 36 00 20 c8 35 04 be  |
000002b0  44 17 3e 00 00 08 04 00  20 00 06 00 ff ff ff ff  | | 000002b0  00 00 00 00 1c 00 00 00  01 00 00 00 b3 83 02 be  |
000002c0  00 00 00 00 d9 56 02 be  df 56 02 be fc 32 02 be  | | 000002c0  b0 fd 07 be c0 e2 05 be  7c a9 00 be ff ff ff 00  |
000002d0  10 fe 07 be 44 e0 00 be  01 00 00 00 04 00 00 00  | | 000002d0  91 5c 3d 00 08 00 00 00  40 fa 00 20 d2 04 00 00  |
000002e0  15 00 00 00 2a 80 00 00  00 00 00 00 fa 33 01 be  | | 000002e0  90 fd 07 be 00 00 00 00  20 03 06 00 ff ff ff 00  |
000002f0  40 fe 07 be d2 04 00 00  b8 94 02 be ff ff ff ff  | | 000002f0  00 00 00 00 f9 ab 02 be  ff ab 02 be 12 83 02 be  |
00000300  ff ff ff ff fe ff ff ff  0c 14 01 be ae 14 01 be  | | 00000300  a0 fd 07 be 44 e0 00 be  01 00 00 00 04 00 00 00  |
00000310  90 fe 07 be 00 8a 06 be  30 00 00 00 ff ff ff 00  | | 00000310  d2 04 00 80 00 0e 00 00  00 08 04 00 95 41 01 be  |
00000320  93 26 21 17 2e 2e 2e 72  6b 2f 73 6f 66 2e 67 69  | | 00000320  d0 fd 07 be d2 04 00 00  40 1d 04 be ff ff ff 00  |
00000330  74 2f 73 72 63 2f 61 75  64 69 6f 2f 76 6f 6c 75  | | 00000330  91 6c 20 17 00 00 00 00  84 36 00 20 95 1f 01 be  |
00000340  6d 65 2e 63 08 02 00 00  90 2d 00 20 08 02 00 00  | | 00000340  20 fe 07 be 00 8a 06 be  80 68 06 be 70 fe 07 be  |
00000350  20 4e 00 00 80 01 00 00  00 00 00 00 b1 1a 01 be  | | 00000350  00 00 00 00 2e 2e 2e 33  31 2d 73 6f 66 2f 73 6f  |
00000360  c0 fe 07 be 00 8a 06 be  a0 32 01 be 00 8b 06 be  | | 00000360  66 2f 73 72 63 2f 61 75  64 69 6f 2f 76 6f 6c 75  |
00000370  30 c0 02 be 18 00 00 00  2c 1a 01 be 80 01 00 00  | | 00000370  6d 65 2e 63 08 02 00 00  80 68 06 be 08 02 00 00  |
00000380  80 68 06 be 00 68 06 be  c0 00 00 00 60 04 01 be  | | 00000380  d8 fd 07 be 98 3a 00 00  00 00 00 00 00 20 01 be  |
00000390  e0 fe 07 be 00 8d 06 be  00 8a 06 be 01 00 00 00  | | 00000390  50 fe 07 be 00 8a 06 be  70 fe 07 be 60 00 00 00  |
000003a0  00 00 00 00 00 4b 00 00  00 00 00 00 a2 53 02 fe  | | 000003a0  30 00 00 00 00 00 00 00  01 00 00 00 00 68 06 be  |
000003b0  20 ff 07 be 40 8d 06 be  2c 1a 01 be 00 8a 06 be  | | 000003b0  00 8b 06 be 08 00 00 00  04 00 00 00 fd 22 01 be  |
000003c0  22 00 06 00 54 13 05 be  00 8d 06 be 40 12 05 be  | | 000003c0  70 fe 07 be 00 8d 06 be  00 8a 06 be 01 00 00 00  |
000003d0  00 00 00 00 f2 25 21 17  00 00 00 00 01 00 00 00  | | 000003d0  44 12 05 be b5 4a 00 00  00 00 00 00 78 05 01 be  |
000003e0  d8 01 01 00 ff ff ff ff  25 0b 06 00 47 4b 02 7e  | | 000003e0  b0 fe 07 be 00 8d 06 be  00 00 00 00 00 8a 06 be  |
000003f0  40 ff 07 be f7 ff ff ff  00 00 00 00 80 12 05 be  | | 000003f0  a0 fe 07 be 1c 10 06 9e  00 8d 06 be 00 00 00 00  |
00000400  84 12 05 be 84 12 05 be  84 12 05 be 56 f2 01 be  | | 00000400  34 19 02 be 40 c3 00 00  00 00 00 00 01 00 00 00  |
00000410  a0 ff 07 be 0d 00 00 00  47 4b 02 fe 47 4b 02 be  | | 00000410  00 00 00 00 44 12 05 be  60 8d 06 be c2 a8 02 fe  |
00000420  30 07 06 00 18 00 00 00  ff ff ff ff 40 c0 02 be  | | 00000420  d0 fe 07 be 80 12 05 be  68 8d 06 be 02 00 00 00  |
00000430  47 00 00 00 0a d0 ea 0d  60 95 02 be 02 00 00 00  | | 00000430  00 00 00 00 23 00 06 00  e8 22 01 be 44 a2 02 7e  |
00000440  00 00 00 00 d9 56 02 be  df 56 02 be 00 00 00 00  | | 00000440  00 ff 07 be f7 ff ff ff  d8 3b 05 9e 84 12 05 be  |
00000450  80 ff 07 be c0 13 05 be  24 96 02 be 00 10 05 9e  | | 00000450  11 00 00 00 22 0f 06 00  04 13 05 be 20 00 06 00  |
00000460  ec 73 02 be 44 8a 02 be  0d 00 00 00 86 f2 01 be  | | 00000460  84 12 05 be 01 00 00 00  40 8d 06 be 50 a2 02 be  |
00000470  c0 ff 07 be 40 c0 02 be  08 e0 00 be 0a d0 ea 0d  | | 00000470  60 ff 07 be 00 00 00 00  44 a2 02 fe 44 a2 02 be  |
00000480  60 95 02 be 02 00 00 00  00 00 00 00 12 09 01 7e  | | 00000480  30 07 06 00 18 00 00 00  ff ff ff ff 00 00 00 00  |
00000490  e0 ff 07 be 00 00 00 00  40 96 02 be 08 00 00 00  | | 00000490  00 00 00 00 20 07 06 00  18 00 00 00 cc 38 05 9e  |
000004a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000004a0  00 00 00 00 f9 ab 02 be  ff ab 02 be 2e a2 02 be  |
000004b0  00 00 08 be 00 00 00 01  20 00 04 00 44 96 02 be  | | 000004b0  40 ff 07 be 00 00 00 00  44 12 05 be 44 12 05 be  |
000004c0  ff ff ff ff 00 00 00 00  00 00 00 00 00 00 00 00  | | 000004c0  00 0e 00 00 80 16 05 9e  00 00 00 00 bd a2 02 be  |
000004d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000004d0  80 ff 07 be 00 00 00 00  b0 3f 04 be 00 00 00 00  |
000004e0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000004e0  20 05 06 00 7c 50 04 be  01 00 00 00 ab 21 02 be  |
000004f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 000004f0  a0 ff 07 be 48 50 04 be  01 00 00 00 08 00 00 00  |
00000500  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000500  00 32 00 00 80 41 00 00  80 00 00 00 38 22 02 be  |
00000510  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000510  c0 ff 07 be 48 50 04 be  00 de 00 be 08 00 00 00  |
00000520  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000520  00 40 00 00 00 00 00 00  00 00 00 00 4e 0b 01 7e  |
00000530  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000530  e0 ff 07 be 01 00 00 00  c0 1e 04 be 08 00 00 00  |
00000540  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |   00000540  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |
00000550  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000550  00 00 08 be 00 00 00 01  20 00 04 00 c4 1e 04 be  |
00000560  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | | 00000560  ff ff ff ff 00 00 00 00  00 00 00 00 00 00 00 00  |
00000570  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |   00000570  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |
00000580  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |   00000580  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |
00000590  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |   00000590  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |
000005a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |   000005a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |
[cut... 000005b0 ... 000007f0 all zeroes in both files]
000007f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |   000007f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |
00000800                                                        00000800
--cut--
lgirdwood commented 5 years ago

@kv2019i The other thing I can think of is the calling convention between C and assembler. Can you try again with arch_dump_regs_a() below commented out and see if the header is correct?

static inline void fill_core_dump(struct sof_ipc_dsp_oops_xtensa *oops,
                  uint32_t ps, uintptr_t stack_ptr,
                  uintptr_t *epc1)
{
    oops->arch_hdr.arch = ARCHITECTURE_ID;
    oops->arch_hdr.totalsize = sizeof(*oops);
#if XCHAL_HW_CONFIGID_RELIABLE
    oops->plat_hdr.configidhi = XCHAL_HW_CONFIGID0;
    oops->plat_hdr.configidlo = XCHAL_HW_CONFIGID1;
#else
    oops->plat_hdr.configidhi = 0;
    oops->plat_hdr.configidlo = 0;
#endif
    oops->plat_hdr.numaregs = XCHAL_NUM_AREGS;
    oops->plat_hdr.stackoffset = ((void *)&oops->stack) - (void *)oops;
    oops->plat_hdr.stackptr = stack_ptr;

    oops->epc1 = *epc1;

    arch_dump_regs_a((void *)&oops->exccause, ps);
}

Also check that fill_core_dump() receives the oops pointer in a2.
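fill_core_dump() stores stackoffset as the byte offset of the stack member within the oops. That means the header values in a good dump can be cross-checked against the struct layout. Below is a host-side sketch of that layout. It assumes the register order printed by the coredumper (exccause through excsave1) and 64 AR registers. The struct names are hypothetical mirrors, not the SOF headers. The asserted offsets agree with totalsize = 0x180 and stackoffset = 0x17c decoded from the XT-CC dump. offsetof() also yields the same value as the void * subtraction in fill_core_dump() without relying on GCC's void-pointer arithmetic extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_AREGS 64 /* assumed: numaregs reported in the APL dumps */

struct oops_arch_hdr {
	uint32_t arch;        /* ARCHITECTURE_ID */
	uint32_t totalsize;   /* size of the whole oops record */
};

struct oops_plat_hdr {
	uint32_t configidhi;  /* XCHAL_HW_CONFIGID0 */
	uint32_t configidlo;  /* XCHAL_HW_CONFIGID1 */
	uint32_t numaregs;    /* XCHAL_NUM_AREGS */
	uint32_t stackoffset; /* offset of 'stack' from the start of the oops */
	uint32_t stackptr;    /* stack pointer at exception time */
};

/* Hypothetical mirror of the oops record; field order assumed from the
 * coredumper's register listing. */
struct oops_xtensa {
	struct oops_arch_hdr arch_hdr;
	struct oops_plat_hdr plat_hdr;
	uint32_t exccause, excvaddr, ps;
	uint32_t epc1, epc2, epc3, epc4, epc5, epc6, epc7;
	uint32_t eps2, eps3, eps4, eps5, eps6, eps7;
	uint32_t depc, intenable, interrupt, sar, debugcause;
	uint32_t windowbase, windowstart, excsave1;
	uint32_t ar[NUM_AREGS];
	uint32_t stack;
};

/* All members are uint32_t, so in practice there is no padding. */
static_assert(offsetof(struct oops_xtensa, exccause) == 0x1c, "exccause");
static_assert(offsetof(struct oops_xtensa, windowbase) == 0x70, "windowbase");
static_assert(offsetof(struct oops_xtensa, ar) == 0x7c, "ar[]");
static_assert(offsetof(struct oops_xtensa, stack) == 0x17c, "stackoffset");
static_assert(sizeof(struct oops_xtensa) == 0x180, "totalsize");
```

Under this layout, the words at offsets 0x70/0x74 of the XT-CC dump read as windowbase 9 and windowstart 0x2a8. In the GCC dump, the entire region up to 0x100 is zero.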

kv2019i commented 5 years ago

@lgirdwood wrote:

@kv2019i the other thing I can think of is calling convention between C and assembler. Can you try again and comment out arch_dump_regs_a below and see if the header is correct. [...] and also check that a2 is used by fill_core_dump() for oops.

The calling code seems to be OK, but something in arch_dump_regs_a is definitely causing this. I've currently isolated the problem to the "store_register_loop:" segment of exc-dump.S. If I jump over the register store loop, the dump looks OK: both the header and the special registers saved after the register store loop look fine. The code does look correct, so I haven't yet figured out why this goes wrong. The call stack seems to differ slightly between GCC and XT builds, so maybe this triggers some issue w.r.t. register windowing. Continuing to debug this.

lgirdwood commented 5 years ago

@kv2019i I assume store_register_loop works if it only does 1 iteration? It might be worth running it through QEMU and tracing the registers.

kv2019i commented 5 years ago

@lgirdwood wrote:

@kv2019i I assume store_register_loops works if it only does 1 iteration ? Maybe worth running it through qemu and trace the registers.

The strange thing is that it doesn't. Even if I force the iteration count to one, the oops gets completely messed up. :P The disassembly looks exactly the same as well, so this is pretty weird. I'll have to try in QEMU as well. I was actually assuming coredump works under QEMU, but in fact I have not tested that myself.

Update: the problem is definitely in how the dump routine handles WINDOWBASE updates. If ROTW is called even once (as happens with 1 iteration of store_register_loop), the result is an invalid dump. The code looks correct, and I fail to see how a single ROTW can have such an impact (there are only a few ops on the core after this). I'll try to reproduce in QEMU to understand what happens with ROTW.
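For reference, here is what ROTW is architecturally expected to do. WindowBase counts in units of four registers, and aN names physical AR[(4 * WindowBase + N) mod NAREGS]. ROTW n adds n to WindowBase modulo NAREGS/4. The host-side model below uses a hypothetical rotate-and-store loop: save a0..a3, ROTW 1, and repeat. With a correct loop, the saved array starts at AR[4 * WindowBase] and WindowBase ends where it started. The loop shape is an assumption, not the actual exc-dump.S code. The model also cannot reproduce the double exception seen on the target. It only gives the expected result to compare against.

```c
#include <assert.h>
#include <stdint.h>

#define NAREGS   64          /* physical AR registers (numaregs on APL) */
#define NWINDOWS (NAREGS / 4) /* WindowBase counts in units of 4 registers */

struct win_cpu {
	uint32_t ar[NAREGS];  /* physical register file */
	unsigned windowbase;  /* 0 .. NWINDOWS-1 */
};

/* aN as seen by code: physical AR[(4 * WindowBase + N) mod NAREGS]. */
static uint32_t read_a(const struct win_cpu *c, unsigned n)
{
	assert(n < 16);
	return c->ar[(4 * c->windowbase + n) % NAREGS];
}

/* ROTW imm: WindowBase += imm (imm in -8..7), modulo NWINDOWS. */
static void rotw(struct win_cpu *c, int imm)
{
	assert(imm >= -8 && imm <= 7);
	c->windowbase = (unsigned)((int)c->windowbase + imm + NWINDOWS) % NWINDOWS;
}

/* Hypothetical rotate-and-store loop: out[k] ends up holding
 * AR[(4 * initial WindowBase + k) % NAREGS]. */
static void dump_ars(struct win_cpu *c, uint32_t out[NAREGS])
{
	for (unsigned i = 0; i < NWINDOWS; i++) {
		for (unsigned n = 0; n < 4; n++)
			out[4 * i + n] = read_a(c, n);
		rotw(c, 1);
	}
	/* NWINDOWS rotations of +1 bring WindowBase back to its start value. */
}
```

The coredumper register listing posted later in this thread (windowbase 3, listing starting at ar12 and wrapping to ar0..ar11) is consistent with this rotated ordering.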

lgirdwood commented 5 years ago

@kv2019i I'm debugging another issue, and here are some QEMU dumps.

Memory dump from hexdump -C /dev/shm/qemu-bridge-hp-sram-mem | less:

0000a800  01 00 00 00 80 01 00 00  fe fb f3 c3 3c 81 44 15  |............<.D.|
0000a810  40 00 00 00 7c 01 00 00  b0 fd 07 be 0f 00 00 00  |@...|...........|
0000a820  04 00 00 00 25 00 06 00  46 37 02 be 74 4b 02 be  |....%...F7..tK..|
0000a830  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0000a840  00 00 00 00 20 00 06 00  00 00 00 00 00 00 00 00  |.... ...........|
0000a850  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0000a870  0b 00 00 00 9a 0a 00 00  00 00 00 00 0e 53 02 be  |.............S..|
0000a880  80 fd 07 be 1c a8 00 be  25 00 06 00 2c 00 00 00  |........%...,...|
0000a890  48 02 00 00 40 f6 02 be  20 00 06 00 84 2b 02 be  |H...@... ....+..|
0000a8a0  50 fd 07 be ac a9 00 be  f8 ff 07 be 00 00 00 00  |P...............|
0000a8b0  f4 ab 00 be 00 00 00 00  00 00 00 00 74 4b 02 be  |............tK..|
0000a8c0  f0 fe 07 be 00 00 40 00  00 00 00 00 08 00 00 00  |......@.........|
0000a8d0  d8 01 00 00 ff ff ff ff  20 00 06 00 64 72 02 fe  |........ ...dr..|
0000a8e0  80 fe 07 be 20 ca 02 be  16 00 00 00 1c f0 05 be  |.... ...........|
0000a8f0  17 00 00 00 e8 03 00 00  01 00 00 00 34 c4 01 be  |............4...|
0000a900  50 fe 07 be 00 00 00 00  25 00 05 c0 25 00 05 40  |P.......%...%..@|
0000a910  f0 fd 07 be 00 00 00 00  04 00 00 00 dc 35 02 be  |.............5..|
0000a920  c7 74 0f 3c c7 0a 10 3c  00 00 00 00 40 13 05 be  |.t.<...<....@...|
0000a930  01 00 00 00 02 00 00 00  00 00 00 00 10 6c 02 fe  |.............l..|
0000a940  c0 fd 07 be f0 fd 07 be  0f 00 00 00 10 00 00 00  |................|
0000a950  50 f4 02 be 00 02 00 00  00 02 00 00 78 11 01 be  |P...........x...|
0000a960  90 fd 07 be 00 a8 00 be  46 37 02 be 06 d0 ea 8d  |........F7......|
0000a970  c0 fd 07 be 06 d0 ea 0d  25 00 06 00 00 00 00 00  |........%.......|
0000a980  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0000a9b0  00 00 00 00 00 00 00 00  00 00 00 00 46 37 02 be  |............F7..|
0000a9c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

QEMU registers:

(qemu) info registers 
PC=be025337

        LBEG=be027079         LEND=be02707f       LCOUNT=00000000          SAR=0000000a
          BR=00000000    SCOMPARE1=00000000      PREFCTL=00000022  WINDOW_BASE=00000009
WINDOW_START=0000029a IBREAKENABLE=00000000       MEMCTL=00000000      ATOMCTL=00000015
    IBREAKA0=00000000     IBREAKA1=00000000     DBREAKA0=00000000     DBREAKA1=00000000
    DBREAKC0=00000000     DBREAKC1=00000000    CONFIGID0=c3f3fbfe         EPC1=be0258b3
        EPC2=be024b74         EPC3=00000000         EPC4=00000000         EPC5=00000000
        EPC6=00000000         EPC7=00000000         DEPC=00000000         EPS2=00060020
        EPS3=00000000         EPS4=00000000         EPS5=00000000         EPS6=00000000
        EPS7=00000000    CONFIGID1=1544813c     EXCSAVE1=00000000     EXCSAVE2=be0271e8
    EXCSAVE3=be0272a8     EXCSAVE4=be027368     EXCSAVE5=be027428     EXCSAVE6=00000000
    EXCSAVE7=00000000       INTSET=00000000     INTCLEAR=00000000    INTENABLE=000001d8
          PS=00060025      VECBASE=be010000     EXCCAUSE=0000000f   DEBUGCAUSE=00000000
      CCOUNT=004ceb48         PRID=00000000       ICOUNT=00000000  ICOUNTLEVEL=00000000
    EXCVADDR=00000004    CCOMPARE0=00000000    CCOMPARE1=00000000    CCOMPARE2=00000000

THREADPTR=00000000 FCR=00000000 FSR=00000000 

 A00=be011178  A01=be07fd90  A02=00001188  A03=00040800
 A04=00000000  A05=be07fdc0  A06=0dead006  A07=00060025
 A08=be025324  A09=be07fd80  A10=be00e044  A11=00000001
 A12=00000004  A13=00000015  A14=00000a9a  A15=00060020

AR00=00000008 AR01=000001d8 AR02=ffffffff AR03=00060020
AR04=fe027264 AR05=be07fe80 AR06=be02ca20 AR07=00000016
AR08=be05f01c AR09=00000017 AR10=000003e8 AR11=00000001
AR12=be01c434 AR13=be07fe50 AR14=00000000 AR15=c0050025
AR16=40050025 AR17=be07fdf0 AR18=00000000 AR19=00000004
AR20=be0235dc AR21=3c0f74c7 AR22=3c100ac7 AR23=00000000
AR24=be051340 AR25=00000001 AR26=00000002 AR27=00000000
AR28=fe026c10 AR29=be07fdc0 AR30=be07fdf0 AR31=0000000f
AR32=00000010 AR33=be02f450 AR34=00000200 AR35=00000200
AR36=be011178 AR37=be07fd90 AR38=be00a800 AR39=be00a97c
AR40=00000000 AR41=be07fdc0 AR42=0dead006 AR43=00060025
AR44=be025324 AR45=be07fd80 AR46=be00e044 AR47=00000001
AR48=00000004 AR49=00000015 AR50=00000a9a AR51=00060020
AR52=be022b84 AR53=be07fd50 AR54=be00a9ac AR55=be07fff8
AR56=00000000 AR57=be00abf4 AR58=00000000 AR59=00000000
AR60=be024b74 AR61=be07fef0 AR62=00400000 AR63=00000000

F00=00000000 (+0.00000000e+00) F01=00000000 (+0.00000000e+00)
F02=00000000 (+0.00000000e+00) F03=00000000 (+0.00000000e+00)
F04=00000000 (+0.00000000e+00) F05=00000000 (+0.00000000e+00)
F06=00000000 (+0.00000000e+00) F07=00000000 (+0.00000000e+00)
F08=00000000 (+0.00000000e+00) F09=00000000 (+0.00000000e+00)
F10=00000000 (+0.00000000e+00) F11=00000000 (+0.00000000e+00)
F12=00000000 (+0.00000000e+00) F13=00000000 (+0.00000000e+00)
F14=00000000 (+0.00000000e+00) F15=00000000 (+0.00000000e+00)
(qemu) 
lgirdwood commented 5 years ago

@kv2019i The AR regs start at offset 0xa8cc (base is 0xa800), and the header looks OK. The A regs appear to be missing, but they are duplicated in AR36 - AR51, except AR38.

lgirdwood commented 5 years ago

And here is the stack:

0007fd90  80 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0007fda0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0007fdc0  46 37 02 be 00 00 00 00  00 00 00 00 00 00 00 00  |F7..............|
0007fdd0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0007fdf0  46 37 02 be 35 07 06 00  0a 00 00 00 00 00 00 00  |F7..5...........|
0007fe00  40 13 05 be 01 00 00 00  54 13 05 be e0 f5 02 be  |@.......T.......|
0007fe10  0f 00 00 00 00 00 00 00  79 70 02 be 7f 70 02 be  |........yp...p..|
0007fe20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0007fe50  00 00 00 00 22 07 06 00  00 00 00 00 00 00 00 00  |...."...........|
0007fe60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0007fe70  74 4b 02 7e b0 fe 07 be  04 00 00 00 08 e0 00 be  |tK.~............|
0007fe80  0c 16 00 00 48 c9 02 be  00 00 40 00 04 16 00 00  |....H.....@.....|
0007fe90  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
0007fea0  33 cb 01 be 10 ff 07 be  00 00 00 00 20 ca 02 be  |3........... ...|
0007feb0  00 00 00 00 00 00 00 00  0a 00 00 00 00 00 00 00  |................|
0007fec0  06 00 02 16 00 00 00 00  48 c9 02 be 1c f0 05 be  |........H.......|
0007fed0  00 00 00 00 00 00 00 00  79 70 02 be 7f 70 02 be  |........yp...p..|
0007fee0  74 4b 02 be f0 fe 07 be  00 00 40 00 00 00 00 00  |tK........@.....|
0007fef0  08 00 00 00 d8 01 00 00  ff ff ff ff 20 00 06 00  |............ ...|
0007ff00  7d 39 02 be 40 ff 07 be  04 b7 02 be 00 00 00 00  |}9..@...........|
0007ff10  5c c9 02 be 00 00 00 00  00 00 00 00 00 00 00 00  |\...............|
0007ff20  ff ff ff 00 05 d4 64 00  00 00 00 00 28 42 01 20  |......d.....(B. |
0007ff30  a9 47 02 be 60 ff 07 be  08 e0 00 be 40 13 05 be  |.G..`.......@...|
0007ff40  06 d0 00 00 08 e0 00 be  40 f6 02 be 20 00 06 00  |........@... ...|
0007ff50  0a 08 01 be 80 ff 07 be  31 a0 00 00 08 e0 00 be  |........1.......|
0007ff60  08 c0 00 00 00 00 00 00  00 12 05 be 20 00 06 00  |............ ...|
0007ff70  63 05 02 be a0 ff 07 be  cc f5 02 be 08 e0 00 be  |c...............|
0007ff80  ac d3 02 be 40 38 02 be  c0 b9 02 be 05 00 00 00  |....@8..........|
0007ff90  b6 05 02 be c0 ff 07 be  cc f5 02 be 08 e0 00 be  |................|
0007ffa0  00 35 00 00 03 80 00 be  02 00 00 00 00 00 00 00  |.5..............|
0007ffb0  5e 11 01 7e e0 ff 07 be  00 00 00 00 40 bd 02 be  |^..~........@...|
0007ffc0  08 00 00 00 03 e0 00 be  00 00 00 00 00 00 00 00  |................|
0007ffd0  00 00 00 00 00 00 08 be  f2 ff 42 ff 20 00 04 00  |..........B. ...|
0007ffe0  44 bd 02 be 00 00 00 00  00 00 00 00 00 00 00 00  |D...............|
0007fff0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
kv2019i commented 5 years ago

Update: problem is definitely in how the dump routine handles WINDOWBASE updates. If ROTW is called even once (like happens with 1 iteration of store_register_loop), the result is an invalid dump.

@lgirdwood I got basic gdb working within QEMU, at least to a degree, and it seems ROTW causes another exception and we end up in the DoubleExceptionVector handler. But that's probably just a symptom, since the same code works when compiled with XT-CC. If I skip the register dump loop that uses ROTW and dump the register state, it looks suspicious (WINDOWBASE and WINDOWSTART) compared to a successful DSP oops dump when compiling with XT-CC.

# CPU registers:

# exccause    00000000 # excvaddr    00000000 # ps          00060725 
# epc1        ff2d2f9d # epc2        00000000 # epc3        ff2d1d8c # epc4        ff2d0c9f 
# epc5        ff2d0c9f # epc6        00000000 # epc7        00000000 
# eps2        00000000 # eps3        00060720 # eps4        00060520 # eps5        00060520 
# eps6        00000000 # eps7        00000000 
# depc        00000000 # intenable   00000000 # interrupt   00000000 # sar         00000000 
# debugcause  00000000 
# windowbase  00000003 # windowstart 0000008a 
# excsave1    00000000 
# ar12        bf2d0c8c # ar13        ff327c40 # ar14        ff34481c # ar15        00060725 
# ar16        00000000 # ar17        ff344cd4 # ar18        00000000 # ar19        00000000 
# ar20        bf2d1b51 # ar21        ff327c10 # ar22        ff344c00 # ar23        00000000 
# ar24        00db5246 # ar25        00000000 # ar26        00000000 # ar27        00000002 
# ar28        00000000 # ar29        00000000 # ar30        00000000 # ar31        00000000 
# ar32        00000000 # ar33        00000000 # ar34        00000000 # ar35        00000000 
# ar36        00000000 # ar37        00000000 # ar38        00000000 # ar39        00000000 
# ar40        00000000 # ar41        00000000 # ar42        00000000 # ar43        00000000 
# ar44        00000000 # ar45        00000000 # ar46        00000000 # ar47        00000000
# ar48        00000000 # ar49        00000000 # ar50        00000000 # ar51        00000000
# ar52        00000000 # ar53        00000000 # ar54        00000000 # ar55        00000000
# ar56        00000000 # ar57        00000000 # ar58        00000000 # ar59        00000000
# ar60        00000000 # ar61        00000000 # ar62        00000000 # ar63        00000000
# ar0         00000000 # ar1         00000000 # ar2         00000000 # ar3         00000000
# ar4         00000000 # ar5         00000000 # ar6         00000000 # ar7         00000000
# ar8         00000000 # ar9         00000000 # ar10        00000000 # ar11        00000000

# windowbase: 3
#               2   0 1
# windowstart: b10001010

#      reg         a0         a1
#                  (return)   (sptr)
#      ---         --------   -------
#  0 # ar12        bf2d0c8c   ff327c40
#  1 # ar4         00000000   0
#  2 # ar28        00000000   0
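The frame table above can be derived mechanically from the two special registers. Start at the WindowBase bit and walk down the 16-bit WindowStart ring, wrapping around. Each set bit marks the base of a live frame at physical AR[4 * bit]. The sketch below reproduces the ordering shown for windowbase 3 / windowstart 0x8a (frames at ar12, ar4, ar28). It assumes 64 AR registers. The function name is hypothetical, and the walk direction is inferred from this output rather than taken from the coredumper source.

```c
#include <assert.h>
#include <stddef.h>

#define NWINDOWS 16 /* 64 ARs / 4 registers per WindowStart bit */

/* Walk WindowStart downwards from the WindowBase bit (with wrap-around)
 * and record, for each live frame, the first physical AR it occupies.
 * Returns the number of frames found. */
static size_t live_frames(unsigned windowbase, unsigned windowstart,
			  unsigned frames[NWINDOWS])
{
	size_t count = 0;

	assert(windowbase < NWINDOWS);
	for (unsigned i = 0; i < NWINDOWS; i++) {
		unsigned bit = (windowbase + NWINDOWS - i) % NWINDOWS;

		if (windowstart & (1u << bit))
			frames[count++] = bit * 4;
	}
	return count;
}
```

Applied to the same values, the walk returns three frames: ar12 for the current frame, then ar4, then ar28 after wrapping past bit 0. These match rows 0, 1 and 2 of the table.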
kv2019i commented 5 years ago

I now got an OK exception dump (for another bug) on WHL (cnl image), built with GCC, so at least this is not happening in all cases. Root cause still unknown.

lgirdwood commented 5 years ago

@kv2019i Interesting, this implies that differences in the window registers may impact the output. I will be pushing a QEMU update shortly that will let you trace execution, including functions and registers. This should show how the window registers are impacting the output.

jajanusz commented 5 years ago

@kv2019i I don't know where you get the oops definitions from, but the ones in the kernel were wrong. Can you still reproduce the issue after the recent ABI alignment / coredumper updates?

kv2019i commented 5 years ago

@jajanusz wrote:

@kv2019i I don't know from where you get oops defs, but these in kernel were wrong. Can you still reproduce the issue after recent ABI align / coredumper updates?

Sorry for the late reply. I did not rely on the kernel definitions. Rather, I dump the whole exception window area and just use the coredump script to analyze it. This way the kernel definitions are not in play (they are only used to print some summary data to the kernel dmesg).

When I trigger the oops myself (null dereference or asm("ill")), I still get the corrupt oops as decoded by the coredumper. The same code compiled with xt-cc works OK.

OTOH, I have seen OK-looking oopses in other bugs, so this is probably related to how the window registers get managed with nested exceptions.

tlauda commented 4 years ago

@kv2019i Still reproduced?

kv2019i commented 4 years ago

@tlauda I'm stuck in a continuous stream of P1 bugs, so I haven't had time to test this recently. As nobody else seems to have hit this, let's downgrade to P3. At the next triage round, let's close it if there are no new findings (and I haven't reproduced it by then).

tlauda commented 4 years ago

@kv2019i OK, thanks for the update.

marc-hb commented 3 years ago

After banging my head really hard and pulling a lot of hair I had the funny idea of hitting https://www.google.com/search?q=arch_dump_regs_a There is a single hit on the entire Internet: this page.

To avoid someone getting hurt again I submitted panic comments and a workaround suggestion in PR #4401

plbossart commented 3 years ago

https://en.wikipedia.org/wiki/Googlewhack !