m-labs / artiq

A leading-edge control system for quantum information experiments
https://m-labs.hk/artiq
GNU Lesser General Public License v3.0

sayma panic IllegalInsn at PC 0x4003c9a8 #1026

Closed jbqubit closed 6 years ago

jbqubit commented 6 years ago

Running 38b51282226f9 built with JESD204B=0.6 with SAWG and HMC830. Running the SAWG sines.py example. It runs for several minutes, then panics. I've seen this twice; usually I don't see the panic. For some reason an hmc7043 hang consistently happens after the post-panic restart.

panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9a8, EA 0x40153058
backtrace for software version 4.0.dev+1087.g38b51282:
0x40002f58
0x40042760
0x40002cd8
0x400010d0
restarting...
[     0.000007s]  INFO(runtime): ARTIQ runtime starting...
[     0.003886s]  INFO(runtime): software version 4.0.dev+1087.g38b51282
[     0.010237s]  INFO(runtime): gateware version 4.0.dev+1087.g38b51282
[     0.016616s]  INFO(runtime): log level set to INFO by default
[     0.022321s]  INFO(runtime): UART log level set to INFO by default
[     0.028459s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...
[     0.062137s]  INFO(board_artiq::serwb): done.
[     0.065173s]  INFO(board_artiq::serwb): RTM to AMC Link test
[     1.547216s]  INFO(board_artiq::serwb): 0 errors
[     1.550523s]  INFO(board_artiq::serwb): AMC to RTM Link test
[     3.032572s]  INFO(board_artiq::serwb): 0 errors
[     3.035876s]  INFO(board_artiq::serwb): Wishbone test...
[     4.967898s]  INFO(board_artiq::serwb): 0 errors
[     4.971210s]  INFO(board_artiq::serwb): AMC serwb settings:
[     4.976758s]  INFO(board_artiq::serwb):   bitslip: 13
[     4.981794s]  INFO(board_artiq::serwb):   ready: 1
[     4.986570s]  INFO(board_artiq::serwb):   error: 0
[     4.991345s]  INFO(board_artiq::serwb): RTM serwb settings:
[     4.996911s]  INFO(board_artiq::serwb):   bitslip: 30
[     5.001947s]  INFO(board_artiq::serwb):   ready: 1
[     5.006724s]  INFO(board_artiq::serwb):   error: 0
[     5.011733s]  INFO(board_artiq::serwb): RTM gateware version 4.0.dev+1087.g38b51282
[     5.019137s]  INFO(runtime): press 'e' to erase startup and idle kernels...
[     6.019005s]  INFO(runtime): continuing boot
[     6.022066s]  INFO(board_artiq::hmc830_7043::hmc830): HMC830 found
[     6.028307s]  INFO(board_artiq::hmc830_7043::hmc7043): HMC7043 found
[     6.034473s]  INFO(board_artiq::hmc830_7043::hmc7043): shutting down
[     6.040829s]  INFO(board_artiq::hmc830_7043::hmc830): loading configuration...
[     6.048320s]  INFO(board_artiq::hmc830_7043::hmc830):   ...done
[     6.053908s]  INFO(board_artiq::hmc830_7043::hmc830): waiting for lock...
[     6.060713s]  INFO(board_artiq::hmc830_7043::hmc830):   ...locked
[     6.066775s]  INFO(board_artiq::hmc830_7043::hmc7043): loading configuration...

sbourdeauducq commented 6 years ago

When that happens, can you post the exact runtime.elf file that was flashed into the board, and the corresponding exact error message? @whitequark Can we get memory dumps around the illegal instruction PC?

hartytp commented 6 years ago

@whitequark do you need anything that's not here? One of those pastes is the complete UART output for an illegal instruction error, and I dumped the entire build as well.

jbqubit commented 6 years ago

Just saw this again. Was running sines.py on 38b51282226f9 with SAWG, JESD204B=0.6 and HMC830. runtime.elf.zip

[    11.219450s]  INFO(runtime::session): startup kernel finished
[    11.224668s]  INFO(runtime::session): no connection, starting idle kernel
[    11.231555s]  INFO(runtime::session): no idle kernel found
[    27.479679s]  INFO(runtime::session): new connection from 192.168.1.68:54136
[    27.522905s]  INFO(runtime::kern_hwreq): resetting RTIO
panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9a8, EA 0x40153058
backtrace for software version 4.0.dev+1087.g38b51282:
0x40002f58
0x40042760
0x40002cd8
0x400010d0
restarting...
[     0.000007s]  INFO(runtime): ARTIQ runtime starting...
[     0.003892s]  INFO(runtime): software version 4.0.dev+1087.g38b51282
[     0.010242s]  INFO(runtime): gateware version 4.0.dev+1087.g38b51282
[     0.016622s]  INFO(runtime): log level set to INFO by default
[     0.022327s]  INFO(runtime): UART log level set to INFO by default
[     0.028465s]  INFO(board_artiq::serwb): waiting for AMC/RTM serwb bridge to be ready...

Not sure why upon restart it hangs waiting for RTM FPGA. AFAIR that chip's .bit doesn't get erased upon restart.

jbqubit commented 6 years ago

Just happened again. Was running sines.py and seeing sinusoidal output on scope. After about 2 minutes see panic and output on scope is garbage. Same .elf.

panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9c0, EA 0x40152bbc
backtrace for software version 4.0.dev+1087.g38b51282:
0x40002f58
0x40042760
0x40002cd8
0x400010d0
restarting...

hartytp commented 6 years ago

hmmm...that's after the HMC7043/HMC830 are correctly configured, so it's unlikely that this is due to those chips.

@gkasprow can we add a PLL locked LED to the FP of Sayma for the next revision?

sbourdeauducq commented 6 years ago

> After about 2 minutes see panic and output on scope is garbage.

Please disable restart-on-panic so that we know if the garbage signal is due to the crash or the restart.

whitequark commented 6 years ago

> @whitequark Can we get memory dumps around the illegal instruction PC?

I'll add this.

jbqubit commented 6 years ago

And again... I continue posting as the hex codes are changing.

panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9a8, EA 0x40153058
backtrace for software version 4.0.dev+1087.g38b51282:
0x40002f58
0x40042760
0x40002cd8
0x400010d0
restarting...

Roger, I'll disable restart-on-panic.

sbourdeauducq commented 6 years ago

Same runtime.elf for all those dumps?

jbqubit commented 6 years ago

Same runtime.elf.
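
Since the same runtime.elf produced all of these panics, the PC/EA pairs can be compared mechanically rather than by eye. The following is a minimal sketch, not part of ARTIQ: the function name `summarize_panics` and the regular expression are assumptions, written against the panic lines as they appear in the UART pastes in this thread.

```python
import re
from collections import Counter

# Matches the panic line format seen in the UART pastes in this thread, e.g.
# "... exception IllegalInsn at PC 0x4003c9a8, EA 0x40153058"
PANIC_RE = re.compile(
    r"exception (?P<kind>\w+) at PC 0x(?P<pc>[0-9a-fA-F]+), "
    r"EA 0x(?P<ea>[0-9a-fA-F]+)"
)


def summarize_panics(log_text):
    """Count how often each (exception kind, PC, EA) triple occurs in a log."""
    counts = Counter()
    for m in PANIC_RE.finditer(log_text):
        counts[(m["kind"], int(m["pc"], 16), int(m["ea"], 16))] += 1
    return counts


# Three of the panic lines posted above, copied verbatim.
uart_log = """\
panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9a8, EA 0x40153058
panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9c0, EA 0x40152bbc
panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9a8, EA 0x40153058
"""

for (kind, pc, ea), n in summarize_panics(uart_log).most_common():
    print(f"{kind} PC={pc:#010x} EA={ea:#010x} seen {n}x")
```

Feeding it a full captured UART log instead of the three sample lines would show at a glance whether the crash sites cluster around a few addresses.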

jbqubit commented 6 years ago

panic at runtime/main.rs:305:14: exception IllegalInsn at PC 0x4003c9a8, EA 0x40153058
backtrace for software version 4.0.dev+1087.g38b51282:
0x40002f58
0x40042760
0x40002cd8
0x400010d0
halting.
use `artiq_coreconfig write -s panic_reset 1` to restart instead

whitequark commented 6 years ago

Done. On test crash:

@ 0x40002af4
+0000: 1c000000 0000001c 1860400a a8830328
+0010: 9dc2ffe0 a86e0000 04000377 18a00002
+0020: 0400632a a86e0000 19600000 85c2fff4
+0030: 9c220000 8521fffc 44004800 8441fff8
@ 0x40154fd8
+0000: 40154ffc 00000000 400a00c4 00020000
+0010: 00001126 00000003 00000010 00000000
+0020: 4000009c d92f2400 deaddead 002aaff4
+0030: 00000000 00c0a801 3200f903 5f67ca86
panic at runtime/main.rs:323:13: exception IllegalInsn at PC 0x40002af4, EA 0x40154fd8
backtrace for software version 4.0.dev+1105.g985fd737:
0x400032b0
0x4001073c
0x40002cf8
0x400010d0
0x40002aec
restarting...

hartytp commented 6 years ago

@whitequark thanks for adding that. Do you want me to post a new UART trace with the memory dump?
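
For anyone post-processing such a trace, the memory dump format from the test crash above can be turned into an address-to-word map. This is only a sketch under stated assumptions: `parse_dump` is a hypothetical helper, and it assumes each `@ 0x...` line gives a base address, each `+NNNN:` prefix is a hexadecimal byte offset from that base, and each group is one 32-bit word (the 0x10 step per four-word line is consistent with that, but the thread does not spell it out).

```python
import re

BASE_RE = re.compile(r"@ 0x([0-9a-fA-F]+)")
ROW_RE = re.compile(r"\+([0-9a-fA-F]{4}): ((?:[0-9a-fA-F]{8} ?)+)")


def parse_dump(text):
    """Map absolute address -> 32-bit word (assumed layout, see lead-in)."""
    words = {}
    base = None
    for line in text.splitlines():
        line = line.strip()
        m = BASE_RE.fullmatch(line)
        if m:
            base = int(m.group(1), 16)  # start of a new dumped region
            continue
        m = ROW_RE.fullmatch(line)
        if m and base is not None:
            offset = int(m.group(1), 16)
            for i, word in enumerate(m.group(2).split()):
                words[base + offset + 4 * i] = int(word, 16)
    return words


# The dump from the test crash above, copied verbatim.
dump = """\
@ 0x40002af4
+0000: 1c000000 0000001c 1860400a a8830328
+0010: 9dc2ffe0 a86e0000 04000377 18a00002
+0020: 0400632a a86e0000 19600000 85c2fff4
+0030: 9c220000 8521fffc 44004800 8441fff8
@ 0x40154fd8
+0000: 40154ffc 00000000 400a00c4 00020000
+0010: 00001126 00000003 00000010 00000000
+0020: 4000009c d92f2400 deaddead 002aaff4
+0030: 00000000 00c0a801 3200f903 5f67ca86
"""

words = parse_dump(dump)
print(f"word at PC 0x40002af4: {words[0x40002af4]:#010x}")
```

With the words keyed by address, the value at the reported PC can be compared directly against the same address in the flashed runtime.elf.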

sbourdeauducq commented 6 years ago

Yes, we need the memory dump, the rest of the crash message, and the corresponding runtime.elf.

jbqubit commented 6 years ago

Using the latest master as of 20180604 (07d4145a35c739) with SAWG, built with Vivado 2018.1. It meets timing. I've run 25 scripts involving SAWG over Ethernet. No panics.

sbourdeauducq commented 6 years ago

So you fixed Ethernet?

jbqubit commented 6 years ago

Yes. https://github.com/sinara-hw/sinara/issues/553#issuecomment-394362405

hartytp commented 6 years ago

I think we can close this now.

jbqubit commented 6 years ago

Sounds good. I've not seen it repeat.