ratgdo / esphome-ratgdo

ratgdo for ESPHome
GNU General Public License v2.0

How to troubleshoot an apparent crash? #113

Open alexruffell opened 1 year ago

alexruffell commented 1 year ago

I've found my RATGDO in an unresponsive state multiple times and am having a hard time figuring out what the issue is. I have 2 ESP32 devices in my garage, and the RATGDO is the one closest to a Unifi AP, with just one wall between them. When it is unresponsive, it is not connected to the network either, and because it is mounted up high, I can't easily connect a cable to it to see if there are any logs.

I have a RATGDO 2.0 but made a tiny mod: I removed one of the MOSFETs on the Light status output so that I could control a more powerful MOSFET that turns 2 parking lasers on and off. The code is nearly stock, with just a few changes I marked with #AMR.

Any suggestions on how I can troubleshoot this? I can't rule out a Unifi-related issue, but out of about 40 operational ESPs, this is the only one crashing.

Would be nice if I could use another ESP32 to read the log and relay it... is that even possible?

substitutions:
  devicename: "overhead-garage-door"
  id_prefix: "ogdo"
  friendly_devicename: "Overhead Garage Door"
  device_description: "Overhead Garage Door"
  #friendly_name: "Overhead Garage Door"
  uart_tx_pin: "16"
  uart_rx_pin: "33"
  input_obst_pin: "11"
  status_door_pin: "5"
  status_obstruction_pin: "12"
  dry_contact_open_pin: "7"
  dry_contact_close_pin: "9"
  dry_contact_light_pin: "18"

web_server:

esphome:
  name: ${devicename}
  comment: ${device_description}
  friendly_name: ${friendly_devicename}
  project:
    name: ratgdo.esphome
    version: "2.0"

esp32:
  board: lolin_s2_mini

dashboard_import:
  package_import_url: github://ratgdo/esphome-ratgdo/v2board_esp32_lolin_s2_mini.yaml@main

# Copied base.yaml below to make adjustments
# packages:
#   remote_package:
#     url: https://github.com/ratgdo/esphome-ratgdo
#     files: [base.yaml]
#     refresh: 1s

# Sync time with Home Assistant.
time:
  - platform: homeassistant
    id: homeassistant_time

api:
  id: api_server
  encryption:
    key: "redacted"

ota:
  password: !secret ota_pwd

improv_serial:

wifi:
  ssid: !secret iot_wifi_ssid
  password: !secret iot_wifi_password
  power_save_mode: none

#Faster than DHCP. Also use if can't reach because of name change
  manual_ip:
    static_ip: 192.168.3.235
    gateway: 192.168.3.1
    subnet: 255.255.255.0
    dns1: 192.168.1.35
    dns2: 192.168.1.36

#Manually override what address to use to connect to the ESP.
#Defaults to auto-generated value. Example, if you have changed your
#static IP and want to flash OTA to the previously configured IP address.
  use_address: 192.168.3.235

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "${devicename}"
    password: !secret iot_wifi_password

logger:

#base.yaml starts here (with adjustments I made)
external_components:
  - source:
      type: git
      url: https://github.com/ratgdo/esphome-ratgdo
    refresh: 1s

preferences:
  flash_write_interval: 5s

ratgdo:
  id: ${id_prefix}
  input_gdo_pin: ${uart_rx_pin}
  output_gdo_pin: ${uart_tx_pin}
  input_obst_pin: ${input_obst_pin}
  on_sync_failed:
    then:
      - homeassistant.service:
          service: persistent_notification.create
          data:
            title: "${friendly_devicename} sync failed"
            message: "Failed to communicate with garage opener on startup; Check the ${friendly_devicename} Rolling code counter number entity history and set the entity to one number larger than the largest value in history. [ESPHome devices](/config/devices/dashboard?domain=esphome)"
            notification_id: "esphome_ratgdo_${id_prefix}_sync_failed"

sensor:
  - platform: ratgdo
    id: ${id_prefix}_openings
    type: openings
    entity_category: diagnostic
    ratgdo_id: ${id_prefix}
    name: "Openings"
    unit_of_measurement: "openings"
    icon: mdi:open-in-app

switch:
  - platform: ratgdo
    id: ${id_prefix}_lock_remotes
    type: lock
    entity_category: config
    ratgdo_id: ${id_prefix}
    name: "Lock remotes"
  - platform: gpio
    id: "${id_prefix}_status_door"
    internal: true
    pin:
      number: ${status_door_pin}  # D0 output door status, HIGH for open, LOW for closed
      mode:
        output: true
    name: "Status door"
    entity_category: diagnostic
  - platform: gpio
    id: "${id_prefix}_status_obstruction"
    internal: true
    pin:
      number: ${status_obstruction_pin}  # D8 output for obstruction status, HIGH for obstructed, LOW for clear
      mode:
        output: true
    name: "Status obstruction"
    entity_category: diagnostic

binary_sensor:
  #AMR - Exposes the node state
  - platform: status
    name: "Connection Status"
    id: connection_status
    entity_category: diagnostic

  - platform: ratgdo
    type: motion
    id: ${id_prefix}_motion
    ratgdo_id: ${id_prefix}
    name: "Motion"
    device_class: motion
  - platform: ratgdo
    type: obstruction
    id: ${id_prefix}_obstruction
    ratgdo_id: ${id_prefix}
    name: "Obstruction"
    device_class: problem
    on_press:
      - switch.turn_on: ${id_prefix}_status_obstruction
    on_release:
      - switch.turn_off: ${id_prefix}_status_obstruction
  - platform: ratgdo
    type: button
    id: ${id_prefix}_button
    ratgdo_id: ${id_prefix}
    name: "Button"
    entity_category: diagnostic
  - platform: ratgdo
    type: motor
    id: ${id_prefix}_motor
    ratgdo_id: ${id_prefix}
    name: "Motor"
    device_class: running
    entity_category: diagnostic
  - platform: gpio
    id: "${id_prefix}_dry_contact_open"
    pin:
      number: ${dry_contact_open_pin}  # D5 dry contact for opening door
      inverted: true
      mode:
        input: true
        pullup: true
    name: "Dry contact open"
    entity_category: diagnostic
    filters:
      - delayed_on_off: 500ms
    on_press:
      - if:
          condition:
            binary_sensor.is_off: ${id_prefix}_dry_contact_close
          then:
            - cover.open: ${id_prefix}_garage_door
  - platform: gpio
    id: "${id_prefix}_dry_contact_close"
    pin:
      number: ${dry_contact_close_pin}  # D6 dry contact for closing door
      inverted: true
      mode:
        input: true
        pullup: true
    name: "Dry contact close"
    entity_category: diagnostic
    filters:
      - delayed_on_off: 500ms
    on_press:
      - if:
          condition:
            binary_sensor.is_off: ${id_prefix}_dry_contact_open
          then:
            - cover.close: ${id_prefix}_garage_door
  - platform: gpio
    id: "${id_prefix}_dry_contact_light"
    pin:
      number: ${dry_contact_light_pin}  # D3 dry contact for triggering light (no discrete light commands, so toggle only)
      inverted: true
      mode:
        input: true
        pullup: true
    name: "Dry contact light"
    entity_category: diagnostic
    filters:
      - delayed_on_off: 500ms
    on_press:
      - light.toggle: ${id_prefix}_light

number:
  - platform: ratgdo
    id: ${id_prefix}_rolling_code_counter
    type: rolling_code_counter
    entity_category: config
    ratgdo_id: ${id_prefix}
    name: "Rolling code counter"
    mode: box
    unit_of_measurement: "codes"

  - platform: ratgdo
    id: ${id_prefix}_opening_duration
    type: opening_duration
    entity_category: config
    ratgdo_id: ${id_prefix}
    name: "Opening duration"
    unit_of_measurement: "s"

  - platform: ratgdo
    id: ${id_prefix}_closing_duration
    type: closing_duration
    entity_category: config
    ratgdo_id: ${id_prefix}
    name: "Closing duration"
    unit_of_measurement: "s"

  - platform: ratgdo
    id: ${id_prefix}_client_id
    type: client_id
    entity_category: config
    ratgdo_id: ${id_prefix}
    name: "Client ID"
    mode: box

cover:
  - platform: ratgdo
    id: ${id_prefix}_garage_door
    device_class: garage
    #AMR - Remove Door from name
    name: " "
    ratgdo_id: ${id_prefix}
    on_closed:
      - switch.turn_off: ${id_prefix}_status_door
    #AMR - on opening
    on_opening:
      - switch.turn_on: ${id_prefix}_status_door

light:
  - platform: ratgdo
    id: ${id_prefix}_light
    name: "Light"
    ratgdo_id: ${id_prefix}

button:
  - platform: restart
    name: "Restart"
    #AMR - make it diagnostic
    entity_category: diagnostic

  - platform: safe_mode
    name: "Safe mode boot"
    entity_category: diagnostic

  - platform: template
    id: ${id_prefix}_query_status
    entity_category: diagnostic
    name: "Query status"
    on_press:
      then:
        lambda: !lambda |-
          id($id_prefix).query_status();

  - platform: template
    id: ${id_prefix}_query_openings
    name: "Query openings"
    entity_category: diagnostic
    on_press:
      then:
        lambda: !lambda |-
          id($id_prefix).query_openings();

  - platform: template
    id: ${id_prefix}_sync
    name: "Sync"
    entity_category: diagnostic
    on_press:
      then:
        lambda: !lambda |-
          id($id_prefix).sync();

  - platform: template
    id: ${id_prefix}_toggle_door
    name: "Toggle door"
    on_press:
      then:
        lambda: !lambda |-
          id($id_prefix).toggle_door();
mariusmuja commented 1 year ago

In cases like this, the best bet is to hook up a cable and save the serial logs; that is likely the only place you would see a stack dump in case of a crash. Because it's mounted up high, this can be tricky. One solution might be to connect a small SBC (Raspberry Pi or similar) to it and pipe the logs to a file until the next crash.
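For the "pipe the logs to a file" part, here is a minimal Python sketch of what could run on the SBC. It assumes the serial port has already been configured (baud rate and so on, e.g. with `stty`) and shows up as a readable device file. The function name `relay_log` and the sample log lines are made up for illustration. It timestamps each line, strips ANSI color escape codes if the log contains any, and flushes after every line so the last lines before a crash make it to disk.

```python
import datetime
import io
import re
from typing import BinaryIO, TextIO

# Matches ANSI color sequences such as "\x1b[0;31m" and "\x1b[0m".
_ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def relay_log(source: BinaryIO, sink: TextIO) -> int:
    """Copy lines from `source` to `sink`, prefixing each with a UTC timestamp.

    `source` is any binary stream (an already-configured serial device file,
    or an in-memory buffer for testing). Returns the number of lines written.
    """
    count = 0
    # readline() returns b"" at end of stream, which stops the loop.
    for raw in iter(source.readline, b""):
        text = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        text = _ANSI_COLOR.sub("", text)
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
        sink.write(f"{stamp} {text}\n")
        sink.flush()  # keep the file current in case power is lost
        count += 1
    return count


# In-memory stand-in for the serial port (the log lines are invented examples).
demo_source = io.BytesIO(b"[I][app:100]: boot\r\n\x1b[0;31m[E] something failed\x1b[0m\n")
demo_sink = io.StringIO()
relay_log(demo_source, demo_sink)
```

On the SBC this could be called with the real device and a log file, for example `relay_log(open("/dev/ttyACM0", "rb", buffering=0), open("ratgdo-serial.log", "a"))`, where both paths are placeholders. If the device node disappears when the board resets, the loop will end or raise an error, so a wrapper that reopens the port may be needed.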

If you manage to capture a stack trace, using EspExceptionDecoder could allow you to get some insight into the cause of the crash.
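If the decoder plugin is not convenient, the same idea can be sketched by hand. The snippet below assumes the crash output contains an ESP32-style `Backtrace:` line made of `PC:SP` address pairs, pulls out the program-counter values, and builds an `addr2line` command line for them. The helper names, the default tool name `xtensa-esp32s2-elf-addr2line`, and the ELF path are assumptions for illustration; use whatever toolchain and firmware ELF match your build.

```python
import re
from typing import List

# One backtrace entry looks like "0x400d1a2b:0x3ffb1f10" (PC:SP); capture the PC.
_PC_SP_PAIR = re.compile(r"(0x[0-9a-fA-F]{8}):0x[0-9a-fA-F]{8}")


def backtrace_pcs(log_text: str) -> List[str]:
    """Return the program-counter addresses from every 'Backtrace:' line."""
    pcs: List[str] = []
    for line in log_text.splitlines():
        if "Backtrace:" in line:
            pcs.extend(_PC_SP_PAIR.findall(line))
    return pcs


def addr2line_command(
    pcs: List[str],
    elf_path: str,
    tool: str = "xtensa-esp32s2-elf-addr2line",  # assumed toolchain binary name
) -> List[str]:
    """Build (but do not run) an addr2line invocation for the given addresses."""
    # -p pretty-print, -f function names, -i inlined frames, -a addresses, -C demangle
    return [tool, "-pfiaC", "-e", elf_path, *pcs]
```

The returned list can be passed to `subprocess.run` on a machine that has the toolchain and the exact `firmware.elf` that was flashed; a mismatched ELF will produce misleading function names.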

iridris commented 1 year ago

I've experienced this a few times as well. Similar setup: I'm running a Unifi AP and have a Ratgdo 2.0 board.

I have noticed that when the Ratgdo is in this unresponsive state, opening/closing the garage door using one of the remotes (or wall switch) will sometimes release it from whatever state it is in and start reporting back to HA again.

alexruffell commented 1 year ago

Next time it happens I will test that and report back.

chriscrowe commented 11 months ago

Has anyone made progress on this? I've noticed mine often drops completely off the network after I open the garage. I've seen it both when using the wall button and when going through HA.

It's hard to debug when it happens because, in order to plug in USB for serial logs, I have to pull the existing cable, which powers the board down and presumably wipes the logs.

alexruffell commented 8 months ago

I am still having the unresponsive / unavailable (probably crashed) issue. Has anyone figured out the cause?