polarfire-soc / hart-software-services

PolarFire SoC hart software services
Other
36 stars 45 forks source link

HSS struggling to reboot the system on watchdog timeout #52

Closed nearly-big-endian closed 1 year ago

nearly-big-endian commented 1 year ago

Hi,

We are relying on the watchdog mechanism to have the board rebooted in case of system freeze (Linux in our case).

That feature used to work as expected on the HSS 0.99.26 (together with Reference design 2021.11 and Polarfire Yocto BSP 2021.11) on an IcicleKit, in the sense that the board did get successfully restarted on watchdog timeout (in case it was not refreshed).

We however noticed since upgrading to HSS 0.99.33 // Ref Design 2022.09 // Yocto BSP 2022.09 that HSS fails to restart the board on first watchdog timeout signal.

It eventually manages to restart the board, but only after having received 9 more watchdog timeout signals (as the log shows below). If the watchdog runtime duration is set to 30 seconds and the watchdog is not actively refreshed, the board therefore ends up being effectively rebooted only after about 300 seconds instead of 30 seconds (that is, at the end of 10 watchdog timeout signals).

Could it be a regression in the HSS ?

Here is a set of Linux commands that may be used to reproduce the issue, using the devmem2 linux command line tool. This sets up the U54 watchdog 1 with the maximum runtime duration (about 30 seconds) and let it expire. The expected outcome is that the board gets rebooted immediately on watchdog timeout.

# -- Setup and start watchdog 1 --
# Set WATCHDOG1.MVRP to about 75% of max duration
devmem2 0x20101010 w 0x00BFFFF4
# Set WATCHDOG1.TRIGGER to its max value
devmem2 0x20101014 w 0x00000FFF
# Set WATCHDOG1.RUNTIME to max duration (about 30 seconds)
devmem2 0x2010100C w 0x00FFFFF0
# Set WATCHDOG1.CTRL to (4:forbidden_enabled + 1:wdog_irq_enabled + 0:mvrp_irq_enabled)
devmem2 0x20101004 w 0x13
# Write into WATCHDOG1.REFRESH to start watchdog 1
devmem2 0x20101000 w 0xDEADC0DE

# -- Check watchdog 1 state --
# Check WATCHDOG1 counter value
devmem2 0x20101000 w
# Check WATCHDOG1 status
devmem2 0x20101008 w

Here is below the HSS log, taken from the first watchdog timeout trigger event. Notice the lines 'Watchdog has triggered - 10' appearing 10 times, roughly every 30 seconds. The system finally gets restarted at the [419.13129] time mark, after 9 failed attempts.

Thanks for your inputs.

[160.253111] Watchdog has triggered - 10
[160.258718] Boot image passed CRC
[160.263683] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[160.271009] Boot image passed CRC
[160.275974] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[160.283300] Boot image passed CRC
[160.288265] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[160.295590] Boot image passed CRC
[160.300555] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[160.307812] boot_service(u54_1) :: [Init] -> [SetupPMP]
[160.314877] boot_service(u54_2) :: [Init] -> [SetupPMP]
[160.321943] boot_service(u54_3) :: [Init] -> [SetupPMP]
[160.329008] boot_service(u54_4) :: [Init] -> [SetupPMP]
[160.336074] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[160.345335] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[160.353547] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[160.361758] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[160.369969] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[160.378181] u54 State Change:  [Booting] [Booting] [Booting] [Booting]
[160.387729] boot_service(u54_1) :: [SetupPMPComplete] -> [ZeroInit]
[160.395940] boot_service(u54_2) :: [SetupPMPComplete] -> [ZeroInit]
[160.404151] boot_service(u54_3) :: [SetupPMPComplete] -> [ZeroInit]
[160.412362] boot_service(u54_4) :: [SetupPMPComplete] -> [ZeroInit]
[160.420574] boot_service(u54_1) :: [ZeroInit] -> [Download]
[160.428021] boot_service(u54_2) :: [ZeroInit] -> [Download]
[160.435469] boot_service(u54_3) :: [ZeroInit] -> [Download]
[160.442916] boot_service(u54_4) :: [ZeroInit] -> [Download]
[160.450363] boot_service(u54_1)::Processing boot image: "u-boot.bin"
[160.458384] boot_service(u54_2) :: [Download] -> [Idle]
[160.465449] boot_service(u54_3) :: [Download] -> [Idle]
[160.472515] boot_service(u54_4) :: [Download] -> [Idle]
[160.535506] boot_service(u54_1) :: [Download] -> [OpenSBIInit]
[160.543240] boot_service(u54_1)::u54_2:sbi_init 0x1000200000
[160.550496] u54 State Change:  [Booting] [SBIWaitForColdboot] [Booting] [Booting]
[160.561094] boot_service(u54_1)::u54_3:sbi_init 0x1000200000
[160.568351] u54 State Change:  [Booting] [SBIWaitForColdboot] [SBIWaitForColdboot] [Booting]
[160.579999] boot_service(u54_1)::u54_4:sbi_init 0x1000200000
[160.587256] u54 State Change:  [Booting] [SBIWaitForColdboot] [SBIWaitForColdboot] [SBIWaitForColdboot]
[160.599955] boot_service(u54_1) :: [OpenSBIInit] -> [Wait]
[160.607307] boot_service(u54_1)::u54_1:sbi_init 0x1000200000
[160.614563] boot_service(u54_1)::Checking for IPI ACKs: - -
[160.621724] boot_service(u54_1)::Checking for IPI ACKs: ACK/IDLE ACK
[160.629744] boot_service(u54_1) :: [Wait] -> [Idle]
[160.636428] u54 State Change:  [Fatal] [SBIWaitForColdboot] [SBIWaitForColdboot] [SBIWaitForColdboot]
[171.471403] loop 15000000 took 4736 ticks (max 64391704 ticks)

[188.886069] Watchdog has triggered - 10
[188.891677] Boot image passed CRC
[188.896642] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[188.903968] Boot image passed CRC
[188.908933] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[188.916259] Boot image passed CRC
[188.921224] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[188.928549] Boot image passed CRC
[188.933514] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[188.940771] boot_service(u54_1) :: [Init] -> [SetupPMP]
[188.947836] boot_service(u54_2) :: [Init] -> [SetupPMP]
[188.954902] boot_service(u54_3) :: [Init] -> [SetupPMP]
[188.961967] boot_service(u54_4) :: [Init] -> [SetupPMP]
[188.969033] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[188.978295] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[188.986506] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[188.994717] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[189.02928] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[189.941063] boot_service(u54_1)::Timeout after 81321 iterations
[189.948892] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[189.956817] boot_service(u54_2)::Timeout after 81321 iterations
[189.964646] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[189.972571] boot_service(u54_3)::Timeout after 81321 iterations
[189.980401] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[189.988325] boot_service(u54_4)::Timeout after 81321 iterations
[189.996155] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[190.04080] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[190.28713] boot_service(u54_1) :: [Error] -> [Idle]
[190.35397] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[190.60031] boot_service(u54_2) :: [Error] -> [Idle]
[190.66714] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[190.91348] boot_service(u54_3) :: [Error] -> [Idle]
[190.98032] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[190.122666] boot_service(u54_4) :: [Error] -> [Idle]
[190.129445] loop 16711684 took 75219162 ticks (max 75219162 ticks)
[198.543377] loop 17500000 took 4755 ticks (max 75219162 ticks)

[217.519159] Watchdog has triggered - 10
[217.524766] Boot image passed CRC
[217.529731] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[217.537057] Boot image passed CRC
[217.542022] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[217.549348] Boot image passed CRC
[217.554313] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[217.561639] Boot image passed CRC
[217.566604] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[217.573860] boot_service(u54_1) :: [Init] -> [SetupPMP]
[217.580926] boot_service(u54_2) :: [Init] -> [SetupPMP]
[217.587991] boot_service(u54_3) :: [Init] -> [SetupPMP]
[217.595057] boot_service(u54_4) :: [Init] -> [SetupPMP]
[217.602122] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[217.611384] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[217.619595] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[217.627806] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[217.636018] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[218.573770] boot_service(u54_1)::Timeout after 81100 iterations
[218.581599] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[218.589524] boot_service(u54_2)::Timeout after 81100 iterations
[218.597354] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[218.605278] boot_service(u54_3)::Timeout after 81100 iterations
[218.613108] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[218.621033] boot_service(u54_4)::Timeout after 81100 iterations
[218.628862] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[218.636787] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[218.661516] boot_service(u54_1) :: [Error] -> [Idle]
[218.668295] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[218.693025] boot_service(u54_2) :: [Error] -> [Idle]
[218.699804] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[218.724533] boot_service(u54_3) :: [Error] -> [Idle]
[218.731312] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[218.756041] boot_service(u54_4) :: [Error] -> [Idle]
[218.762820] loop 19363893 took 75620163 ticks (max 75620163 ticks)
[225.525938] loop 20000000 took 4735 ticks (max 75620163 ticks)

[246.152248] Watchdog has triggered - 10
[246.157855] Boot image passed CRC
[246.162820] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[246.170146] Boot image passed CRC
[246.175111] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[246.182437] Boot image passed CRC
[246.187402] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[246.194728] Boot image passed CRC
[246.199693] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[246.206949] boot_service(u54_1) :: [Init] -> [SetupPMP]
[246.214015] boot_service(u54_2) :: [Init] -> [SetupPMP]
[246.221080] boot_service(u54_3) :: [Init] -> [SetupPMP]
[246.228146] boot_service(u54_4) :: [Init] -> [SetupPMP]
[246.235211] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[246.244473] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[246.252684] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[246.260896] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[246.269107] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[247.206859] boot_service(u54_1)::Timeout after 81441 iterations
[247.214689] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[247.222614] boot_service(u54_2)::Timeout after 81441 iterations
[247.230443] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[247.238368] boot_service(u54_3)::Timeout after 81441 iterations
[247.246197] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[247.254122] boot_service(u54_4)::Timeout after 81441 iterations
[247.261951] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[247.269876] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[247.294605] boot_service(u54_1) :: [Error] -> [Idle]
[247.301385] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[247.326114] boot_service(u54_2) :: [Error] -> [Idle]
[247.332893] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[247.357622] boot_service(u54_3) :: [Error] -> [Idle]
[247.364401] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[247.389131] boot_service(u54_4) :: [Error] -> [Idle]
[252.474213] loop 22500000 took 4741 ticks (max 75620163 ticks)

[274.785337] Watchdog has triggered - 10
[274.790944] Boot image passed CRC
[274.795909] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[274.803235] Boot image passed CRC
[274.808200] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[274.815526] Boot image passed CRC
[274.820491] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[274.827817] Boot image passed CRC
[274.832782] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[274.840039] boot_service(u54_1) :: [Init] -> [SetupPMP]
[274.847104] boot_service(u54_2) :: [Init] -> [SetupPMP]
[274.854170] boot_service(u54_3) :: [Init] -> [SetupPMP]
[274.861235] boot_service(u54_4) :: [Init] -> [SetupPMP]
[274.868301] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[274.877562] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[274.885773] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[274.893985] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[274.902196] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[275.839949] boot_service(u54_1)::Timeout after 81695 iterations
[275.847778] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[275.855703] boot_service(u54_2)::Timeout after 81695 iterations
[275.863532] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[275.871457] boot_service(u54_3)::Timeout after 81695 iterations
[275.879286] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[275.887211] boot_service(u54_4)::Timeout after 81695 iterations
[275.895041] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[275.902965] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[275.927695] boot_service(u54_1) :: [Error] -> [Idle]
[275.934474] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[275.959203] boot_service(u54_2) :: [Error] -> [Idle]
[275.965982] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[275.990711] boot_service(u54_3) :: [Error] -> [Idle]
[275.997491] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[276.22220] boot_service(u54_4) :: [Error] -> [Idle]
[279.579657] loop 25000000 took 4747 ticks (max 75620163 ticks)

[303.418426] Watchdog has triggered - 10
[303.424034] Boot image passed CRC
[303.428999] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[303.436325] Boot image passed CRC
[303.441290] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[303.448616] Boot image passed CRC
[303.453581] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[303.460906] Boot image passed CRC
[303.465871] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[303.473128] boot_service(u54_1) :: [Init] -> [SetupPMP]
[303.480193] boot_service(u54_2) :: [Init] -> [SetupPMP]
[303.487259] boot_service(u54_3) :: [Init] -> [SetupPMP]
[303.494324] boot_service(u54_4) :: [Init] -> [SetupPMP]
[303.501390] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[303.510651] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[303.518863] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[303.527074] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[303.535285] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[304.473047] boot_service(u54_1)::Timeout after 81402 iterations
[304.480876] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[304.488801] boot_service(u54_2)::Timeout after 81402 iterations
[304.496630] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[304.504555] boot_service(u54_3)::Timeout after 81402 iterations
[304.512384] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[304.520309] boot_service(u54_4)::Timeout after 81402 iterations
[304.528138] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[304.536732] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[304.561461] boot_service(u54_1) :: [Error] -> [Idle]
[304.568240] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[304.592969] boot_service(u54_2) :: [Error] -> [Idle]
[304.599748] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[304.624478] boot_service(u54_3) :: [Error] -> [Idle]
[304.631257] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[304.655986] boot_service(u54_4) :: [Error] -> [Idle]
[306.642890] loop 27500000 took 4729 ticks (max 75620163 ticks)

[332.51516] Watchdog has triggered - 10
[332.57027] Boot image passed CRC
[332.61897] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[332.69127] Boot image passed CRC
[332.73997] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[332.81227] Boot image passed CRC
[332.86097] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[332.93327] Boot image passed CRC
[332.98197] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[332.105358] boot_service(u54_1) :: [Init] -> [SetupPMP]
[332.112423] boot_service(u54_2) :: [Init] -> [SetupPMP]
[332.119489] boot_service(u54_3) :: [Init] -> [SetupPMP]
[332.126554] boot_service(u54_4) :: [Init] -> [SetupPMP]
[332.133620] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[332.142881] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[332.151093] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[332.159304] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[332.167515] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[333.105268] boot_service(u54_1)::Timeout after 80404 iterations
[333.113097] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[333.121022] boot_service(u54_2)::Timeout after 80404 iterations
[333.128851] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[333.136776] boot_service(u54_3)::Timeout after 80404 iterations
[333.144606] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[333.152530] boot_service(u54_4)::Timeout after 80404 iterations
[333.160360] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[333.168285] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[333.193014] boot_service(u54_1) :: [Error] -> [Idle]
[333.199793] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[333.224522] boot_service(u54_2) :: [Error] -> [Idle]
[333.231301] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[333.256031] boot_service(u54_3) :: [Error] -> [Idle]
[333.262810] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[333.287539] boot_service(u54_4) :: [Error] -> [Idle]
[333.740948] loop 30000000 took 4767 ticks (max 75620163 ticks)
[360.383209] loop 32500000 took 4767 ticks (max 75620163 ticks)

[360.684605] Watchdog has triggered - 10
[360.690212] Boot image passed CRC
[360.695177] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[360.702503] Boot image passed CRC
[360.707468] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[360.714794] Boot image passed CRC
[360.719759] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[360.727085] Boot image passed CRC
[360.732050] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[360.739306] boot_service(u54_1) :: [Init] -> [SetupPMP]
[360.746372] boot_service(u54_2) :: [Init] -> [SetupPMP]
[360.753437] boot_service(u54_3) :: [Init] -> [SetupPMP]
[360.760503] boot_service(u54_4) :: [Init] -> [SetupPMP]
[360.767568] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[360.776830] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[360.785041] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[360.793253] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[360.801464] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[361.739216] boot_service(u54_1)::Timeout after 81212 iterations
[361.747046] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[361.754971] boot_service(u54_2)::Timeout after 81212 iterations
[361.762800] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[361.770725] boot_service(u54_3)::Timeout after 81212 iterations
[361.778554] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[361.786479] boot_service(u54_4)::Timeout after 81212 iterations
[361.794308] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[361.802233] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[361.826962] boot_service(u54_1) :: [Error] -> [Idle]
[361.833741] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[361.858471] boot_service(u54_2) :: [Error] -> [Idle]
[361.865250] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[361.889979] boot_service(u54_3) :: [Error] -> [Idle]
[361.896758] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[361.921488] boot_service(u54_4) :: [Error] -> [Idle]
[387.345850] loop 35000000 took 4730 ticks (max 75620163 ticks)

[389.317694] Watchdog has triggered - 10
[389.323301] Boot image passed CRC
[389.328266] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[389.335592] Boot image passed CRC
[389.340557] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[389.347883] Boot image passed CRC
[389.352848] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[389.360174] Boot image passed CRC
[389.365139] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[389.372396] boot_service(u54_1) :: [Init] -> [SetupPMP]
[389.379461] boot_service(u54_2) :: [Init] -> [SetupPMP]
[389.386527] boot_service(u54_3) :: [Init] -> [SetupPMP]
[389.393592] boot_service(u54_4) :: [Init] -> [SetupPMP]
[389.400658] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[389.409919] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[389.418130] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[389.426342] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[389.434553] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[390.372306] boot_service(u54_1)::Timeout after 81403 iterations
[390.380135] boot_service(u54_1) :: [SetupPMPComplete] -> [Error]
[390.388060] boot_service(u54_2)::Timeout after 81403 iterations
[390.395889] boot_service(u54_2) :: [SetupPMPComplete] -> [Error]
[390.403814] boot_service(u54_3)::Timeout after 81403 iterations
[390.411643] boot_service(u54_3) :: [SetupPMPComplete] -> [Error]
[390.419568] boot_service(u54_4)::Timeout after 81403 iterations
[390.427398] boot_service(u54_4) :: [SetupPMPComplete] -> [Error]
[390.435322] boot_service(u54_1)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[390.460052] boot_service(u54_1) :: [Error] -> [Idle]
[390.466831] boot_service(u54_2)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[390.491560] boot_service(u54_2) :: [Error] -> [Idle]
[390.498339] boot_service(u54_3)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[390.523069] boot_service(u54_3) :: [Error] -> [Idle]
[390.529848] boot_service(u54_4)::
*******************************************************************
* WARNING: Boot Error - transitioning to IDLE                     *
*******************************************************************
[390.554577] boot_service(u54_4) :: [Error] -> [Idle]
[414.387409] loop 37500000 took 4754 ticks (max 75620163 ticks)

[417.950783] Watchdog has triggered - 10
[417.956391] Boot image passed CRC
[417.961356] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[417.968682] Boot image passed CRC
[417.973647] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[417.980972] Boot image passed CRC
[417.985937] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[417.993263] Boot image passed CRC
[417.998228] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[418.05485] boot_service(u54_1) :: [Init] -> [SetupPMP]
[418.12455] boot_service(u54_2) :: [Init] -> [SetupPMP]
[418.19425] boot_service(u54_3) :: [Init] -> [SetupPMP]
[418.26395] boot_service(u54_4) :: [Init] -> [SetupPMP]
[418.33365] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[418.42531] No space in queue!!!!!!
[418.47591] u54_1: failed to send message, so freeing
[418.54084] No space in queue!!!!!!
[418.59145] u54_2: failed to send message, so freeing
[418.65637] No space in queue!!!!!!
[418.70698] u54_3: failed to send message, so freeing
[418.77190] No space in queue!!!!!!
[418.82251] u54_4: failed to send message, so freeing
[418.88743] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[418.96859] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[418.104975] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[418.113186] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
[419.05395] boot_service(u54_1)::Timeout after 78463 iterations
[419.13129] modules/ssmb/ipi/ssmb_ipi.c:626: IPI_MessageFree() Assertion failed:
        IPI_DATA.ipi_completes[index].used

  ^^^ SYSTEM REBOOTS NOW ^^^

HSS: decompressing from eNVM to L2 Scratch ... Passed
DDR training ... Passed ( 5217 ms)

[6.166229] PolarFire(R) SoC Hart Software Services (HSS) - version 0.99.33-dev-build
MPFS HAL version 2.0.101 / DDR Driver version 0.4.018 / Mi-V IHC version 0.1.1 / BOARD=mpfs-icicle-kit-es
(c) Copyright 2017-2022 Microchip FPGA Embedded Systems Solutions.

incorporating OpenSBI - version 1.0
(c) Copyright 2019-2022 Western Digital Corporation.

[6.201747] Build ID: c1ed11d489121cad7a3dcd5f528cd463e22c402e
[6.209290] Built with the following tools:
 - riscv64-unknown-elf-gcc (xPack GNU RISC-V Embedded GCC (Microsemi SoftConsole build), 64-bit) 8.3.0
 - GNU ld (xPack GNU RISC-V Embedded GCC (Microsemi SoftConsole build), 64-bit) 2.32

[6.233351] NOTICE: Running from L2 Scratchpad

[6.239557] Serial Number:
5995625459b1187379e72b44fa110c2100000000000000000000000000000000000000000000000000000000000000000000
[6.253498] Segment Configuration:
        Cached: SEG0_0: offset 0x0080000000, physical DDR 0x00000000
        Cached: SEG0_1: offset 0x1000000000, physical DDR 0x02000000
    Non-cached: SEG1_2: offset 0x00c0000000, physical DDR 0x78000000
Non-cached WCB: SEG1_4: offset 0x00d0000000, physical DDR 0x78000000
[6.285101] L2 Cache Configuration:
    L2-Scratchpad:  4 ways (512 KiB)
         L2-Cache:  8 ways (1024 KiB)
           L2-LIM:  4 ways (512 KiB)
[6.301047] DDR-Lo size is   32 MiB
[6.305725] DDR-Hi size is 1888 MiB
[6.310404] Please ensure that jumpers J34/J43 are correct for 1.8V MMC voltage...
[6.319952] Attempting to select SDCARD ... Passed
Press a key to enter CLI, ESC to skip
Timeout in 1 second
..
[7.357385] CLI boot interrupt timeout
[7.362350] Initializing Mi-V IHC
[7.366838] Initializing IPI Queues (6056 bytes @ 0xa02cf20)...
[7.374190] Initializing PMPs
[7.378295] Initializing Boot Image ...
[7.383356] Trying to boot via MMC ...
[7.388321] Preparing to copy from MMC to DDR ...
[7.396228] Validated GPT Header ...
[7.426938] Validated GPT Partition Entries ...
[7.433049] Boot Partition found at index 1
[7.438569] Attempting to read image header (1632 bytes) ...
[7.447848] Copying 691440 bytes to 0xa0000000
[7.482533] MMC: Boot Image registered ...
[7.487950] Boot image passed CRC
[7.492724] Boot image set name: "PolarFire-SoC-HSS::U-Boot"
[7.499789] wdog_service monitoring U54_1 U54_2 U54_3 U54_4
[7.506759] ipi_poll_service :: [Init] -> [Monitoring]
[7.513538] boot_service(u54_1) :: [Init] -> [SetupPMP]
[7.520413] boot_service(u54_2) :: [Init] -> [SetupPMP]
[7.527288] boot_service(u54_3) :: [Init] -> [SetupPMP]
[7.534162] boot_service(u54_4) :: [Init] -> [SetupPMP]
[7.541037] usbdmsc_service :: [Init] -> [Idle]
[7.547147] scrub_service :: [init] -> [scrubbing]
[7.553544] beu_service :: [init] -> [monitoring]
[7.559846] u54 State Change:  [Idle] [Idle] [Idle] [Idle]
[7.568057] loop 1 took 31852033 ticks (max 31852033 ticks)
[7.575314] boot_service(u54_1)::Registering domain "u-boot.bin" (hart mask 0x1e)
[7.584385] boot_service(u54_1) :: [SetupPMP] -> [SetupPMPComplete]
[7.592405] boot_service(u54_2) :: [SetupPMP] -> [SetupPMPComplete]
[7.600425] boot_service(u54_3) :: [SetupPMP] -> [SetupPMPComplete]
[7.608445] boot_service(u54_4) :: [SetupPMP] -> [SetupPMPComplete]
>> [7.616804] u54 State Change:  [Booting] [Booting] [Booting] [Booting]
[7.626161] boot_service(u54_1) :: [SetupPMPComplete] -> [ZeroInit]
[7.634182] boot_service(u54_2) :: [SetupPMPComplete] -> [ZeroInit]
[7.642202] boot_service(u54_3) :: [SetupPMPComplete] -> [ZeroInit]
[7.650222] boot_service(u54_4) :: [SetupPMPComplete] -> [ZeroInit]
[7.658243] boot_service(u54_1) :: [ZeroInit] -> [Download]
[7.665499] boot_service(u54_2) :: [ZeroInit] -> [Download]
[7.672756] boot_service(u54_3) :: [ZeroInit] -> [Download]
[7.680012] boot_service(u54_4) :: [ZeroInit] -> [Download]
[7.687269] boot_service(u54_1)::Processing boot image: "u-boot.bin"
[7.695098] boot_service(u54_2) :: [Download] -> [Idle]
[7.701973] boot_service(u54_3) :: [Download] -> [Idle]
[7.708847] boot_service(u54_4) :: [Download] -> [Idle]
[7.764382] boot_service(u54_1) :: [Download] -> [OpenSBIInit]
[7.771925] boot_service(u54_1)::u54_2:sbi_init 0x1000200000
[7.778990] boot_service(u54_1)::u54_3:sbi_init 0x1000200000
[7.786056] boot_service(u54_1)::u54_4:sbi_init 0x1000200000
[7.793121] boot_service(u54_1) :: [OpenSBIInit] -> [Wait]
[7.800282] boot_service(u54_1)::u54_1:sbi_init 0x1000200000
[7.808684] boot_service(u54_1)::Checking for IPI ACKs: - -
[7.863967] boot_service(u54_1)::Checking for IPI ACKs: ACK/IDLE ACK
[7.892516] boot_service(u54_1) :: [Wait] -> [Idle]
[7.910275] u54 State Change:  [SBIHartInit] [Booting] [Booting] [Booting]
[7.944839] loop 2706 took 64506354 ticks (max 64506354 ticks)
[8.45570] u54 State Change:  [Running] [SBIWaitForColdboot] [SBIWaitForColdboot] [SBIWaitForColdboot]
[12.823494] u54 State Change:  [Running] [Running] [Running] [Running]
[37.478383] loop 2500000 took 4732 ticks (max 64506354 ticks)
griffini commented 1 year ago

Hi @nearly-big-endian, thanks for reporting this.

We'll attempt to reproduce this and then diagnose what is going on. I suspect a small race condition in the HSS in how the individual harts are restarting when the watchdog trips.

griffini commented 1 year ago

Hi @nearly-big-endian , we have a patch being reviewed to address this issue for you.

nearly-big-endian commented 1 year ago

Hi @nearly-big-endian , we have a patch being reviewed to address this issue for you.

This is great news, thanks. I am looking forward to trying this fix.

griffini commented 1 year ago

Hi @nearly-big-endian

For now the following workaround patch will restore the behavior you want. A clean fix for this (which also supports AMP correctly) will be added into the next official HSS release, but we didn't want to hold you up in the interim.

diff --git a/services/wdog/wdog_service.c b/services/wdog/wdog_service.c
index cafff21..98327bc 100644
--- a/services/wdog/wdog_service.c
+++ b/services/wdog/wdog_service.c
@@ -153,6 +153,9 @@ static void wdog_monitoring_handler(struct StateMachine * const pMyMachine)
     status &= hartBitmask.uint;

     if (status) {
+#if IS_ENABLED(CONFIG_ALLOW_COLDREBOOT_ALWAYS)
+        HSS_Wdog_Reboot(HSS_HART_ALL);
+#else
         // watchdog timer has triggered for a monitored hart..
         mHSS_DEBUG_PRINTF(LOG_ERROR, "Watchdog has triggered - %02x\n", status);

@@ -176,6 +179,7 @@ static void wdog_monitoring_handler(struct StateMachine * const pMyMachine)
             HSS_Boot_RestartCore(HSS_HART_U54_4);
             wdogInitTime[HSS_HART_U54_4] = HSS_GetTime();
         }
+#endif
 #endif
     }
 }
nearly-big-endian commented 1 year ago

Hi,

I am very pleased to report that we have tested this patch and that it fixes the issue we were having perfectly.

Thanks very much for the quick response (and correction).