mmakaay / esphome-xiaomi_bslamp2

ESPHome integration for the Xiaomi Mijia Bedside Lamp v2.

[BUG] Lamp not booting up after flashing with latest ESPHome #104

Closed hellcry37 closed 1 year ago

hellcry37 commented 1 year ago

Describe the bug: After flashing the latest ESPHome 2022.12, the lamp fails to boot or cannot be connected to. The lamp is no longer accessible in HA, and its IP cannot be reached.

Validation gives me this:

WARNING GPIO12 is a Strapping PIN and should be avoided. Attaching external pullup/down resistors to strapping pins can cause unexpected failures. See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins
WARNING GPIO4 is a Strapping PIN and should be avoided. Attaching external pullup/down resistors to strapping pins can cause unexpected failures. See https://esphome.io/guides/faq.html#why-am-i-getting-a-warning-about-strapping-pins

To Reproduce Steps to reproduce the behavior:

  1. updated from 2022.11.5 to 2022.12.0

Expected behavior: The lamp would work normally.

Additional context: none

Please investigate if it's a problem with new esphome.

randybb commented 1 year ago

Ouu. Maybe it is the same issue (not related to this package) that I had with my lamps when I updated to 2023.01dev. Could you please connect to it via UART and pull the logs? If it is crashing during WiFi init, then check this thread: https://discord.com/channels/429907082951524364/1050868163467812914 There is a simple solution which doesn't make sense, but it works :D

hellcry37 commented 1 year ago

I am not at home, and I'd have to open up the lamp again. I will try tonight.

hellcry37 commented 1 year ago

I've read the Discord thread but did not see any simple solution. Care to elaborate on it?

randybb commented 1 year ago

If it is the same problem, then you just need to comment out 4 lines in your YAML, compile & flash, then uncomment them, compile & flash again, and it will work again.

vta-github commented 1 year ago

@randybb could you please specify the 4 lines?

randybb commented 1 year ago

If it is related to the same issue (my log is here), then you need to comment out these 4 lines, install it, then uncomment them and install again. [image]

hellcry37 commented 1 year ago

Yeah, I do not have that in my config; it is probably pulled in from this repo. I tried to reflash the device, but it is dead. I'll buy another.

randybb commented 1 year ago

it is in https://github.com/mmakaay/esphome-xiaomi_bslamp2/blob/dev/packages/core.yaml

cracrama commented 1 year ago

Same here: lamps completely bricked. So to pull off that trick with "uncommenting lines", we need to open the device and connect via serial?

cracrama commented 1 year ago

Hello, is there any solution for this?

mmakaay commented 1 year ago

I upgraded three lamps myself and they (unfortunately) kept working.

@hellcry37 The device should not be completely dead, no need to buy another. The great thing with the ESP32 is that really bricking it would be quite a feat. One can always flash clean firmware on it to get it going.

The logging that you showed about the strapping pins can be fully ignored. When building your own projects from scratch, these warnings are good, since they make you aware that you might be using pins that could result in unexpected behaviour. For this lamp however, the hardware is as-is and the designers chose to use those pins on purpose. And the lamp works with those pins. So, thank you ESPHome for being friendly by warning us, but we'll ignore those messages.

Side note: One thing that I have in mind for ESPHome, is to implement an option for the pin definition, that can be used to suppress these warnings, so people don't get thrown off by it when compiling the firmware.

For the trick as described by @randybb, you'd indeed need to open the lamp and flash it via serial. He flashed once with a firmware that didn't have those four lines (which basically makes it broken firmware for the type of hardware), and then once again with the four lines enabled again. After that, the lamp started working for him.

As @randybb already stated, it would be interesting to get a dump of the logging that you see on the serial output when booting up the lamp. Mainly to check if it fails in the same spot as for him.

I will try to break my development lamp by downgrading and upgrading it a few times. See if I can hit the issue myself, so I can try to debug this behaviour.

szafran81 commented 1 year ago

I also currently have 3 dead lamps at home. Unfortunately it'll be some time until I'm able to take care of this problem (I have 2 ongoing projects right now, and I want to at least partially finish them before I dig into something else).

ESPHome 2022.12.1 just dropped. Can anyone with a working lamp (or one that was dead because of the 2022.12.0 update and is now fixed) check whether anything changed with this problem?

mmakaay commented 1 year ago

One thing from the 2022.12 Changelog that could be fishy is:

Along with some of these bluetooth changes is a change to the underlying flash partition table that ESPHome uses. OTA will work, but to fully take advantage of the performance increases for bluetooth, it is best to at least one serial flash with ESPHome 2022.12.0 or later.

That is the only bit that tickles my spider sense.

Up to now, I have only tried upgrading my lamps to the dev bleeding-edge code. I will try to find some time the upcoming days to do some downgrades to various versions, followed by an upgrade to 2022.12.0 specifically, to see if I can find a reproduction path.

mmakaay commented 1 year ago

Hurray! I was able to get my lamp into the "bricked" state as well, by first flashing it using ESPHome 2022.11.0 and then flashing it using ESPHome 2022.12.0. And it indeed was not obvious how to get it back in working order. The good news is: it's not impossible to get it working again 😃

I did many upgrade scenarios, and found that the problem occurs when upgrading from 2022.11.0 to a version of ESPHome after commit #3565: "Update ESP-IDF and platform version". For example ESPHome version 2022.12.0 includes this change.

Recipe for fixing the bricked state

The following steps helped me fix my lamp status.

Step 1: Update the config

Add the following block of YAML code to your device YAML file. I don't think the order would matter, but I put it after the packages: section.

esp32:
  framework:
    sdkconfig_options:
      CONFIG_FREERTOS_UNICORE: n

Note: When you already have an esp32: section in your configuration, apply the above setting to it instead (keeping whatever other settings you have in there).
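As a sketch of that merge, assuming your existing esp32: section carries the board and framework settings quoted from packages/core.yaml later in this thread (your own section may differ), the temporary Step 1 setting would sit alongside the existing options like this:

```yaml
esp32:
  board: esp32doit-devkit-v1   # existing setting, kept as-is (assumed from core.yaml)
  framework:
    type: esp-idf              # existing setting, kept as-is (assumed from core.yaml)
    sdkconfig_options:
      # Temporary and deliberately wrong for this single-core lamp.
      # Revert this in Step 3.
      CONFIG_FREERTOS_UNICORE: n
```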

This change makes the produced firmware fully incompatible with the lamp hardware, because it builds a multi-core firmware for the single-core lamp. It seems however that this is the easiest way to get it ready for the next step. Thanks to @randybb for finding this very peculiar fixing step 👍

Step 2: Compile and flash the firmware via serial

Connect the lamp to the serial port of a computer, bring it into flashing mode by plugging in the power while connecting GPIO0 to GND, and flash the new firmware onto it. After completing the flashing operation, disconnect and reconnect the lamp power.

After this, the lamp will not work, but if you look at the serial logging output, you will see something different than the boot loop from before. It will now likely complain with "Running on single core variant of a chip", but I have also seen another pattern without the single core error. Both were fine for the next step.

Step 3: Bring back the configuration to the old state

Either remove the code that you added, or change your esp32: config section to use the settings:

esp32:
  framework:
    sdkconfig_options:
      CONFIG_FREERTOS_UNICORE: y

Step 4: Compile and flash the firmware via serial

Again flash the firmware to the device using serial and unplug and replug the power afterwards. This should bring your device back into working order.

mmakaay commented 1 year ago

Hot fix implemented in component version 2021.10.0

To prevent others from running into the same issue, I hot-fixed the core.yaml configuration package in the latest version of my repo. I updated the esp32: section to force the old framework version. That effectively prevents further accidents. I'm also preparing a new version 2022.12.0, with the same fix applied to it, to communicate that for ESPHome 2022.12.0 a new version of my firmware code is required.
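For illustration, a pinned esp32: section along these lines might look as follows. The values are the ones quoted from packages/core.yaml later in this thread; treat this as a sketch and check the package file in the repo for the authoritative version.

```yaml
esp32:
  board: esp32doit-devkit-v1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_FREERTOS_UNICORE: y
    advanced:
      ignore_efuse_mac_crc: true
    # Fall back to the older framework and platform versions, so that
    # ESPHome 2022.12.0 and up does not pick its newer defaults.
    version: 4.3.2
    source: ~3.40302.0
    platform_version: platformio/espressif32 @ 3.5.0
```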

Some details about the crash that occurred after upgrading

When the firmware is broken, this is the backtrace that the system crashes on:

[23:20:22][V][esp-idf:000]: I (992) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
[23:20:22]
[23:20:22][V][esp-idf:000]: W (993) phy_init: failed to load RF calibration data (0x1102), falling ba
[23:20:22]abort() was called at PC 0x400f3e53 on core 0
[23:20:22]
[23:20:22]
[23:20:22]Backtrace:0x400823ee:0x3ffc33a0 0x400887b1:0x3ffc33c0 0x4008e95e:0x3ffc33e0 0x400f3e53:0x3ffc3450 0x40148d38:0x3ffc3490 0x40148dfd:0x3ffc34c0 0x40127eae:0x3ffc34e0 0x40128621:0x3ffc3500 0x40126f08:0x3ffc3520 0x400902d9:0x3ffc3540
WARNING Found stack trace! Trying to decode it
WARNING Decoded 0x400823ee: panic_abort at /Users/mauricem/.platformio/packages/framework-espidf/components/esp_system/panic.c:402
WARNING Decoded 0x400887b1: esp_system_abort at /Users/mauricem/.platformio/packages/framework-espidf/components/esp_system/esp_system.c:128
WARNING Decoded 0x4008e95e: abort at /Users/mauricem/.platformio/packages/framework-espidf/components/newlib/abort.c:46
WARNING Decoded 0x400f3e53: esp_efuse_mac_get_default at /Users/mauricem/.platformio/packages/framework-espidf/components/esp_hw_support/mac_addr.c:136
 (inlined by) esp_efuse_mac_get_default at /Users/mauricem/.platformio/packages/framework-espidf/components/esp_hw_support/mac_addr.c:106
WARNING Decoded 0x40148d38: esp_phy_load_cal_and_init at /Users/mauricem/.platformio/packages/framework-espidf/components/esp_phy/src/phy_init.c:714
WARNING Decoded 0x40148dfd: esp_phy_enable at /Users/mauricem/.platformio/packages/framework-espidf/components/esp_phy/src/phy_init.c:236
WARNING Decoded 0x40127eae: wifi_hw_start
WARNING Decoded 0x40128621: wifi_start_process
WARNING Decoded 0x40126f08: ieee80211_ioctl_process
WARNING Decoded 0x400902d9: ppTask

When following this backtrace, the problem lies in the fact that the new version of the ESP-IDF framework is calling esp_efuse_mac_get_default(). That will fail for this lamp, because the default MAC address as burnt into the device has a wrong checksum burnt alongside it. This invalid checksum causes the panic, that leads to the reboot.

For earlier versions of ESP-IDF, I implemented the feature to ignore invalid MAC address checksums. As a matter of fact, when looking at the failing boot log, you can see that this code is actually being hit:

[23:20:21][C][wifi:037]: Setting up WiFi...
[23:20:21][C][wifi:038]:   Local MAC: 54:48:E6:7D:52:C0
[23:20:21][V][wifi_esp32:120]: Use EFuse MAC without checking CRC: 54:48:E6:7D:52:C0

So here I do read the MAC address myself, ignoring the wrong checksum, and I feed it to the WiFi stack as the MAC address to use for connecting to the network.

In the framework version that was used with 2022.11.0, this step was enough to keep the ESP-IDF framework from looking up the burnt-in MAC address and checksum itself. It used the MAC address that I fed to it beforehand.

Since that apparently no longer prevents the framework from looking up the MAC address, I'll have to dig into the ESP-IDF framework to see if there's a way to prevent the invalid lookup and the panic that follows.

Oh boy :-)

mmakaay commented 1 year ago

For the ESP-IDF framework, it was considered a good idea to deprecate the option CONFIG_ESP32_PHY_CALIBRATION_AND_DATA_STORAGE in favour of the new option CONFIG_ESP_PHY_CALIBRATION_AND_DATA_STORAGE. This option is one of the things that I added to make the "ignore mac CRC" feature work.

So looks like the real fix would be to update ESPHome to configure both the options, so the code will work for both old and new ESP-IDF versions.
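A rough sketch of what setting both option names by hand could look like. This is an illustration under assumptions, not the change that ESPHome itself makes: the value n (option disabled) is assumed and is not stated in this thread, and whether a given ESP-IDF version tolerates the option name it does not know is not confirmed here either.

```yaml
esp32:
  framework:
    type: esp-idf
    sdkconfig_options:
      # Option name used by older ESP-IDF versions (assumed value: disabled)
      CONFIG_ESP32_PHY_CALIBRATION_AND_DATA_STORAGE: n
      # Option name used by newer ESP-IDF versions (assumed value: disabled)
      CONFIG_ESP_PHY_CALIBRATION_AND_DATA_STORAGE: n
```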

mmakaay commented 1 year ago

PR submitted for ESPHome

A PR for ESPHome was submitted for the issue: https://github.com/esphome/esphome/pull/4204. I changed the code to use the correct sdkconfig option, based on the version of the framework that is used for compilation. I tested an upgrade from 2022.11.0 to the latest dev with my fix, and that worked correctly.

Steps forward from here

hellcry37 commented 1 year ago

@hellcry37 The device should not be completely dead, no need to buy another. The great thing with the ESP32 is that really bricking it would be quite a feat. One can always flash clean firmware on it to get it going.

It is only my own fault that I totally bricked the lamp: I soldered some pins and broke 2 solder points. Now I don't have an RX and, I think, one ground pad, and I don't know where else I can find connection points for those two.

Just to be safe: if I flash the second lamp, which is on 2022.11.5 now, with this config, will it be OK?

# --------------------------------------------------------------------------
# Substitutions
#
# These are substitutions as used by the configuration packages from below.
# You can uncomment and update the ones that you want to modify.
# --------------------------------------------------------------------------

substitutions:
  name: bedside-left-lamp
  friendly_name: 'Bedside Left Lamp'
  light_name: ${friendly_name}
  light_mode_text_sensor_name: ${friendly_name} Light Mode
  default_transition_length: 200ms

# --------------------------------------------------------------------------
# Load configuration packages
#
# These provide a convenient way to compose your device configuration from
# some functional building blocks. Pick and mix the blocks that you need.
#
# For customization you can override options in your config or you can
# copy the contents of these packages directly in your config file as
# an example for your own customizations.
#
# Available packages are:
# - core.yaml                : core components & hardware setup
# - behavior_default.yaml    : default device behavior
# - ota_feedback.yaml        : enable visual feedback during OTA updates
# - activate_preset_svc.yaml : 'activate_preset' service for Home Assistant
# --------------------------------------------------------------------------

packages:
  bslamp2:
    url: https://github.com/mmakaay/esphome-xiaomi_bslamp2
    ref: release/2022.12.0
    files:
      - packages/core.yaml
      - packages/behavior_default.yaml
      - packages/ota_feedback.yaml
      - packages/activate_preset_svc.yaml
    refresh: 0s

# --------------------------------------------------------------------------
# Use your own preferences for these components.
# --------------------------------------------------------------------------

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  domain: .home
  manual_ip:
    static_ip: 192.168.1.33
    gateway: 192.168.1.1
    subnet: 255.255.255.0
    dns1: 192.168.1.2
    dns2: 192.168.1.3
  ap:
    ssid: "${friendly_name}"
    password: !secret default_fallback_ap_pass

api:
  password: !secret bslamp_home_assistant_api_password
  encryption:
    key: !secret bslamp_home_assistant_encryption_key

ota:
  password: !secret bslamp_ota_password

hellcry37 commented 1 year ago

So if I post which pins I f**ed, can anyone point me to some pictures or similar showing where I could pick them up again, so I can serial flash it?

randybb commented 1 year ago

If you broke these big pads and you are not able to trace them (or just take esp32 pinouts), I don't think you will be able to solder to even smaller pads.

hellcry37 commented 1 year ago

I broke the ones in the pictures, yes, the big ones. Maybe there is a chance to find them on other parts? I'll try what you gave me these days; maybe I'll manage to fix it.

mmakaay commented 1 year ago

If more support is needed on this, please don't follow up in this issue. The issue is specifically about the ESPHome 2022.12.0 breakage, which involves a general issue.

vil1driver commented 1 year ago

Hi, just to say thanks: my first try today with ESPHome (2022.12.3) and this bslamp2, and all is fine. Big thanks!

hellcry37 commented 1 year ago

Great. For me it botched the second lamp: after trying to update from 2022.11.5 to 2022.12.3 with the config I mentioned in previous posts, my second lamp is down.

hellcry37 commented 1 year ago

Updating a working lamp from 2022.11.5 directly to 2022.12.3, with the config edited to ref: release/2022.12.0, broke the lamp for me.

I was able to bring it back just by flashing it via serial again with:

esp32:
  framework:
    sdkconfig_options:
      CONFIG_FREERTOS_UNICORE: y

and then flashing it again with CONFIG_FREERTOS_UNICORE removed / disabled.

Progaros commented 1 year ago

Same problem here: "phy_init: failed to load RF calibration data" after flashing, resulting in a boot loop.

mmakaay commented 1 year ago

What does the configuration that you used look like? The fixed new firmware that I tested, and that got confirmed by @vil1driver too, worked for me. The important thing is that the esp32: section of the configuration conforms to the example configuration from the 2022.12.0 release of the firmware.

@Progaros The restore procedure can be found in this message

BTW:

The fix that I did for the current release, is forcing ESPHome to use an older version of the ESP-IDF framework. To support switching to the new framework version, a PR was accepted today for ESPHome. Therefore, the next release of ESPHome will make life a bit better. It should make the firmware compatible with both the older and the newer version of ESP-IDF.

At the same time, I have submitted a feature request for the ESP-IDF framework, which would allow us to fix the underlying issue once and for all: disabling the CRC check for the burnt-in MAC address, because the bslamp2 devices contain an invalid CRC. The boot loops originate from ESP-IDF aborting the boot process when it detects the invalid CRC, the logic being that the correct data might be read after a reboot. Let's hope that this request will be picked up and implemented soon. If people want to leave likes for this feature request, here's the link to it: https://github.com/espressif/esp-idf/issues/10401

Progaros commented 1 year ago

thank you so much @mmakaay the restore procedure worked after >10 tries

now everything is back to normal

mmakaay commented 1 year ago

Wow, that took quite some tries. I salute your persistence, @Progaros 😄

szafran81 commented 1 year ago

EDIT: I've finally managed to flash a working firmware on the first lamp. Thank you for the fix.

Jearde commented 1 year ago

Today I got it to work with the ESPHome Add-On inside RPi Home Assistant, using the following steps.

Base commit: 116e42542d919f2e0defe358b928453b0bb46db6

  1. Comment out some versioning in /packages/core.yaml:
esp32:
  board: esp32doit-devkit-v1
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_FREERTOS_UNICORE: y
    advanced:
      ignore_efuse_mac_crc: true
    # Bugfix for ESPHome 2022.12.0 and up: fallback to older platform
    # version, to prevent bricked devices. ESPHome uses newer versions
    # by default.
    # See also: https://github.com/mmakaay/esphome-xiaomi_bslamp2/issues/104
#     version: 4.3.2
#     source: ~3.40302.0
#     platform_version: platformio/espressif32 @ 3.5.0
  2. Install the ESPHome (dev) version in Home Assistant.
  3. Install using Manual Download -> Legacy Format.
  4. Flash with esphome-flasher.

mmakaay commented 1 year ago

What version of ESPHome does that use then? These changes were specifically made for making things work with the latest ESPHome versions. Commenting them out ought to break things when on 2022.12.0+.

Jearde commented 1 year ago

I had to comment out the versions, because the specified toolchain is not supported for 'linux_aarch64'. That is, however, the platform when running Home Assistant on a Raspberry Pi 4 with the provided image from HA. Maybe it works in Docker versions of HA.

First, I got the following error when compiling the ESPHome firmware. Error: Could not find the package with 'espressif/toolchain-xtensa-esp32 @ 8.4.0+2021r2-patch2' requirements for your system 'linux_aarch64' (Not sure about the exact version of xtensa. I didn't save the log. This information is based on my past Google searches.)

After making the changes to the /packages/core.yaml as mentioned above, I got the boot loop as described in 1356893265.

I couldn't use your fix with the manual older versions, because they are not available for my RPi system. After you commented 1363428442 that your merge request was accepted, I changed my ESPHome Add-On to the dev branch inside Home Assistant. I flashed the lamp again and no boot loop or any other problems were present.

What version of ESPHome does that use then? These changes were specifically made for making things work with the latest ESPHome versions. Commenting them out ought to break things when on 2022.12.0+.

EDIT: I used the following version of ESPHome: 034b47c23a08f9980bdae07bcafbcf22fc43dc2e

It is used by the HA add-on: esphome/home-assistant-addon

mmakaay commented 1 year ago

Beware that although my request for a change in the ESP-IDF framework was accepted and the related PR was merged, the change is not yet in the ESP-IDF releases. The next 4.4 and 5 releases of ESP-IDF will likely contain the change.

Unless the change was backported into an already released version of ESP-IDF, I don't think that you would be able to benefit from it. Especially since the change would require an additional bit of configuration in the device YAML to make use of it.

Bottom line: I can't explain why it worked for you, but I'm glad it did ;-)

labodj commented 1 year ago

@mmakaay I just want to report that your pull request https://github.com/esphome/esphome/pull/4204 has been merged in the ESPHome 2023.2.0 release: https://github.com/esphome/esphome/releases/tag/2023.2.0

mmakaay commented 1 year ago

Yeah, based on that, I can now cook up a new release in which life can be made a little bit better. I also planned to do some documentation updates, now that api: password: has been deprecated in favor of api: encryption: key:.
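A minimal sketch of that configuration change, reusing the secret names from the device configuration posted earlier in this thread (your own secret names will differ):

```yaml
api:
  # Deprecated form, shown for comparison only:
  # password: !secret bslamp_home_assistant_api_password
  encryption:
    key: !secret bslamp_home_assistant_encryption_key
```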

Next big step for a final fix for this stuff will be when my requested change in the ESP-IDF framework (supporting the broken MAC address CRC behaviour of these lamps directly from the ESP-IDF framework) is included in the released framework library version. As I understand, that will be both the next v4.4 and v5 versions of ESP-IDF 🎉 From then on, I don't need the hacky CONFIG_ESP(32)_PHY_CALIBRATION_AND_DATA_STORAGE option anymore.

randybb commented 1 year ago

I am running latest dev without the part about special versions (as I have mentioned in "our" discord thread), so probably it is already there.

hellcry37 commented 1 year ago

this can be closed, as fixed