raburton / rboot

An open source bootloader for the ESP8266
https://richard.burtons.org/tag/rboot/?order=ASC
MIT License

Random hang during power on #39

Open ChenHsiang opened 5 years ago

ChenHsiang commented 5 years ago

In my device environment, the ESP8266 bootloader (rboot) occasionally locks up in a state where it does not load any ROM and cannot be recovered by a reset. To recover from this error state, I have to remove the power source and power-cycle the device.

The root cause is that during the boot process rboot reads the RTC memory area. Since no application code has run yet, the content of the RTC area is uninitialized. rboot checks the chksum value; ideally this check should almost always fail on random memory content, but in practice the chksum sometimes matches, so rboot goes ahead, reads the garbage in the RTC area, and picks up the wrong ROM info. This is not recoverable: the app code is never loaded to update the RTC value, and without a power cycle the RTC value never changes, so rboot is stuck in a loop reading the same bad content.
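To illustrate why a one-byte checksum can pass on garbage, here is a minimal sketch. The struct layout, the `CHKSUM_SEED` value, and the helper names are assumptions made for illustration and are not copied from rboot. For any fixed garbage payload, exactly one of the 256 possible checksum bytes is accepted, so uniformly random RTC content would pass about once in 256 boots. Real power-on RTC content is not necessarily uniform, so the observed rate may differ.

```c
#include <stdint.h>

/* Hypothetical stand-in for the RTC record; field names and layout are
 * assumptions for illustration only. */
typedef struct {
    uint32_t magic;
    uint8_t  next_mode;
    uint8_t  last_mode;
    uint8_t  last_rom;
    uint8_t  temp_rom;
    uint8_t  chksum;
} rtc_record;

#define CHKSUM_SEED 0xEF /* assumed seed value */

/* XOR all bytes in [start, end) into a one-byte checksum. */
static uint8_t xor_chksum(const uint8_t *start, const uint8_t *end) {
    uint8_t c = CHKSUM_SEED;
    while (start < end) {
        c ^= *start++;
    }
    return c;
}

/* Count how many of the 256 possible chksum bytes make the record
 * pass a checksum-only validity test. */
static int count_accepting_chksums(rtc_record rec) {
    int hits = 0;
    for (int v = 0; v < 256; v++) {
        rec.chksum = (uint8_t)v;
        if (rec.chksum == xor_chksum((const uint8_t *)&rec,
                                     (const uint8_t *)&rec.chksum)) {
            hits++;
        }
    }
    return hits;
}
```

Calling `count_accepting_chksums` on any payload returns 1, which is where the 1-in-256 figure comes from.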

To fix it, I propose that rboot flush the RTC memory area when it detects this situation and jump to the normal standard boot mode.

> diff --git a/rboot.c b/rboot.c
> index 0e95db5..e4a3ce7 100644
> --- a/rboot.c
> +++ b/rboot.c
> @@ -386,11 +386,16 @@ uint32 NOINLINE find_image(void) {
>   // if rtc data enabled, check for valid data
>   if (system_rtc_mem(RBOOT_RTC_ADDR, &rtc, sizeof(rboot_rtc_data), RBOOT_RTC_READ) &&
>       (rtc.chksum == calc_chksum((uint8*)&rtc, (uint8*)&rtc.chksum))) {
>                 if (rtc.next_mode & MODE_TEMP_ROM) {
>                         if (rtc.temp_rom >= romconf->count) {
>                                 ets_printf("Invalid temp rom selected.\r\n");
> -                               return 0;
> -                       }
> -                       ets_printf("Booting temp rom.\r\n");
> -                       temp_boot = TRUE;
> -                       romToBoot = rtc.temp_rom;
> +                               // make sure rtc temp boot mode doesn't persist
> +                               rtc.next_mode = MODE_STANDARD;
> +                               rtc.chksum = calc_chksum((uint8*)&rtc, (uint8*)&rtc.chksum);
> +                               system_rtc_mem(RBOOT_RTC_ADDR, &rtc, sizeof(rboot_rtc_data), RBOOT_RTC_WRITE);
> +                               ets_printf("Return to MODE_STANDARD\r\n");
> +                       } else {
> +                           ets_printf("Booting temp rom.\r\n");
> +                           temp_boot = TRUE;
> +                           romToBoot = rtc.temp_rom;
> +                        }
>                 }
>         }
> 
> 
raburton commented 5 years ago

Sorry about the slow reply. I think it would be a lot simpler to check the magic value at the start of the RTC area, as well as the checksum. The checksum has a small chance of being correct by accident, since it is only one byte. But the odds of getting both the checksum and the 4-byte magic right by chance would be very slim.
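
A rough sketch of that combined check follows. It is not the actual patch: the `rtc_blob` type, the `RTC_MAGIC` and `BLOB_CHKSUM_SEED` values, and the function names are all assumptions for illustration. If the RTC content were uniformly random, both tests would pass together only about once in 2^40 boots (1/2^32 for the magic times 1/256 for the checksum).

```c
#include <stdbool.h>
#include <stdint.h>

#define RTC_MAGIC        0x2334AE68u /* assumed magic value */
#define BLOB_CHKSUM_SEED 0xEF        /* assumed checksum seed */

/* Hypothetical RTC record layout: the magic comes first, the checksum last. */
typedef struct {
    uint32_t magic;
    uint8_t  next_mode;
    uint8_t  last_mode;
    uint8_t  last_rom;
    uint8_t  temp_rom;
    uint8_t  chksum;
} rtc_blob;

/* One-byte XOR checksum over [start, end). */
static uint8_t blob_chksum(const uint8_t *start, const uint8_t *end) {
    uint8_t c = BLOB_CHKSUM_SEED;
    while (start < end) {
        c ^= *start++;
    }
    return c;
}

/* Accept the record only if BOTH the magic and the checksum match. */
static bool rtc_blob_valid(const rtc_blob *rec) {
    if (rec->magic != RTC_MAGIC) {
        return false; /* uninitialized RTC memory is rejected here first */
    }
    return rec->chksum == blob_chksum((const uint8_t *)rec,
                                      (const uint8_t *)&rec->chksum);
}
```

With a check like this, garbage that happens to carry a matching checksum byte is still rejected unless its first four bytes also equal the magic, so the boot falls through to the normal ROM selection.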