patrickcollins12 / esphome-fan-controller

ESPHome Fan Controller

constantly reboots #26

Closed Bond246 closed 12 months ago

Bond246 commented 1 year ago

Hello Patrick,

Thanks for this repo. It is perfect for my need: cooling the equipment cupboard that holds the DSL router, NAS, and similar devices in my living room.

To customize the project for my setup, which uses Node-RED instead of Home Assistant, I disabled the HA API and enabled MQTT. I tried to add a second temperature sensor, but whatever I did, it never reported a value. Trying different GPIO pins, swapping the sensor, and adding a pull-up resistor all made no difference. But that's not the question.

In general I don't think I have changed much in your code, but the sketch is never stable. Sometimes the ESP32 runs for 10 hours, sometimes only for 15 minutes. The reset reason is always "Software Reset CPU".

Any ideas what the problem could be or how to debug it?

Thanks

patrickcollins12 commented 1 year ago

Does the second sensor work if you swap it into 1st position? Determine if the sensor is working first or if it is a software issue.

I haven't tried mqtt on the device but it should work... seems like an out of memory error. Can you post the logs just prior to failure?

Get the logs via:

$ esphome logs fan.yaml

Feel free to post the yaml here too.

Bond246 commented 1 year ago

Hello patrick,

I've reduced the logging drastically to lower the system load, which seemed to have an effect on the boot-cycle problem. So I have the same theory, that either the CPU or the memory is being overloaded, but I haven't found a way to get statistics about that from the ESP32.

This is my configuration:

substitutions:
  friendly_name: Serverschrank Fan

esphome:
  name: server-rack-fan

# Throttle writing parameters to the internal flash memory to reduce ESP memory wear / degradation
preferences:
  flash_write_interval: 15min

#########################
# ESP32 AND NETWORK SETUP

esp32:
  board: nodemcu-32s
  framework:
    type: arduino

# pid climate log update is noisy, dial it back to warn
logger:
  level: DEBUG
  logs: 
    climate: ERROR
    sensor: ERROR
    text_sensor: ERROR
    dht: ERROR
    pulse_counter: ERROR

debug:
  update_interval: 5s

# default HA integration, OTA updater and backup http web portal
# api:
ota:
wifi:

  # Read the wifi/pass from secrets.yaml:
  # wifi_ssid: "My Wifi XX"
  # wifi_password: "XXXXXXX"
  ssid: !secret wifi_ssid
  password: !secret wifi_password

web_server:
  port: 80

mqtt:
  broker: broker
  discovery: false
  username: !secret mqtt_user
  password: !secret mqtt_password

number:

  ## OPTIONAL:
  # RECEIVE KP, KI and KD parameters from input_text.kx helpers in 
  # Home Assistant. See the PID controller below
  # These helper values will get saved to flash thus permanently over-riding 
  # the initial values set in the PID below.

  # KP
  - platform: template
    name: kp
    icon: mdi:chart-bell-curve
    restore_value: true
    initial_value: 0.3
    min_value: 0
    max_value: 50
    step: 0.001
    set_action: 
      lambda: |- 
        id(console_thermostat).set_kp( x );

  # KI
  - platform: template
    name: ki
    icon: mdi:chart-bell-curve
    restore_value: true
    initial_value: 0.0015
    min_value: 0
    max_value: 50
    step: 0.0001
    set_action: 
      lambda: id(console_thermostat).set_ki( x );

  # KD
  - platform: template
    name: kd
    icon: mdi:chart-bell-curve
    restore_value: true
    initial_value: 0.0
    min_value: -50
    max_value: 50
    step: 0.001
    set_action: 
      lambda: id(console_thermostat).set_kd( x );

  # Set threshold low
  - platform: template
    name: Deadband Threshold Low
    icon: mdi:chart-bell-curve
    restore_value: true
    initial_value: -1.0
    min_value: -20
    max_value: 0
    step: 0.1
    set_action: 
      lambda: id(console_thermostat).set_threshold_low( x );

  # Set threshold high
  - platform: template
    name: Deadband Threshold High
    icon: mdi:chart-bell-curve
    restore_value: true
    initial_value: 0.4
    min_value: 0
    max_value: 20
    step: 0.1
    set_action: 
      lambda: id(console_thermostat).set_threshold_high( x );

  # Set ki multiplier
  - platform: template
    name: Deadband ki Multiplier
    icon: mdi:chart-bell-curve
    restore_value: true
    initial_value: 0.04
    min_value: 0
    max_value: .2
    step: 0.01
    set_action: 
      lambda: id(console_thermostat).set_ki_multiplier( x );

text_sensor:

  # Send IP Address
  - platform: wifi_info
    ip_address:
      name: $friendly_name IP Address

  # Send Uptime in raw seconds
  - platform: template
    name: $friendly_name Uptime
    id: uptime_human
    icon: mdi:clock-start

  - platform: debug
    device:
      name: "Device Info"
    reset_reason:
      name: "Reset Reason"

sensor:

  # Send WiFi signal strength & uptime to HA
  - platform: wifi_signal
    name: $friendly_name WiFi Strength
    update_interval: 60s

  # This is a bit of overkill. It sends a human readable 
  # uptime string 1h 41m 32s instead of 6092 seconds
  - platform: uptime
    name: $friendly_name Uptime
    id: uptime_sensor
    update_interval: 30s
    on_raw_value:
      then:
        - text_sensor.template.publish:
            id: uptime_human
            # Custom C++ code to generate the result
            state: !lambda |-
              int seconds = round(id(uptime_sensor).raw_state);
              int days = seconds / (24 * 3600);
              seconds = seconds % (24 * 3600);
              int hours = seconds / 3600;
              seconds = seconds % 3600;
              int minutes = seconds /  60;
              seconds = seconds % 60;
              return (
                (days ? to_string(days) + "d " : "") +
                (hours ? to_string(hours) + "h " : "") +
                (minutes ? to_string(minutes) + "m " : "") +
                (to_string(seconds) + "s")
              ).c_str();

  # Read the Tacho PIN and show measured RPM as a sensor (only with 4-pin PWM fans!)

  - platform: pulse_counter
    pin: 
      number: GPIO18   # Connect to any input PIN on the ESP
      mode: INPUT_PULLUP
    unit_of_measurement: 'RPM'
    id: fan_speed_1
    name: $friendly_name Fan Speed IN
    accuracy_decimals: 0
    filters:
      - multiply: 0.5  # Depending on how many pulses the fan sends per round - should be 0.5 or 1 - try...

  - platform: pulse_counter
    pin: 
      number: GPIO19   # Connect to any input PIN on the ESP
      mode: INPUT_PULLUP
    unit_of_measurement: 'RPM'
    id: fan_speed_2
    name: $friendly_name Fan Speed OUT
    accuracy_decimals: 0
    filters:
      - multiply: 0.5  # Depending on how many pulses the fan sends per round - should be 0.5 or 1 - try...

########################################################
# START THE FAN CONTROLLER SETUP

  - platform: template
    name: $friendly_name p term
    id: p_term
    unit_of_measurement: "%"
    accuracy_decimals: 2

  - platform: template
    name: $friendly_name i term
    id: i_term
    unit_of_measurement: "%"
    accuracy_decimals: 2

  - platform: template
    name: $friendly_name d term
    id: d_term
    unit_of_measurement: "%"
    accuracy_decimals: 2

  - platform: template
    name: $friendly_name output value
    unit_of_measurement: "%"
    id: o_term
    accuracy_decimals: 2

  - platform: template
    name: $friendly_name error value
    id: e_term
    accuracy_decimals: 2

  - platform: template
    name: $friendly_name is in deadband
    id: in_deadband_term
    accuracy_decimals: 0

  # GET TEMP/HUMIDITY FROM DHT22
  - platform: dht
    pin: GPIO26
    model: DHT22
    temperature:
      name: "Temperature"
      id: console_fan_temperature
      accuracy_decimals: 3

      # If you don't smooth the temperature readings 
      # the PID controller over reacts to small changes.
      filters:
         - exponential_moving_average:  
             alpha: 0.1
             send_every: 1

    humidity:
      name: "Luftfeuchte"
      id: console_fan_humidity

    # the DHT11 can only be read every 1s. Use 1.3s to be safe.
    update_interval: 1.3s

  # GET TEMP/HUMIDITY FROM DHT22
  #- platform: dht
  #  pin: GPIO23
  #  model: DHT22
  #  temperature:
  #    name: "Temperature Oben"
  #    id: console_fan_temperature_2
  #    accuracy_decimals: 3

      # If you don't smooth the temperature readings 
      # the PID controller over reacts to small changes.
  #    filters:
  #       - exponential_moving_average:  
  #           alpha: 0.1
  #           send_every: 1

  #  humidity:
  #    name: "Humidity Oben"
  #    id: console_fan_humidity_2

    # the DHT11 can only be read every 1s. Use 1.3s to be safe.
  #  update_interval: 1.3s

  # Take the "COOL" value of the pid and send 
  # it to the frontend to graph the output voltage
  - platform: pid
    name: "Fan Speed (PWM Voltage)"
    climate_id: console_thermostat
    type: COOL

output:
  # Wire this pin (GPIO23) into the PWM pin of your 12 V fan
  # ledc is the name of the pwm output system on an esp32
  - platform: ledc
    id: console_fan_speed
    pin: GPIO23

    # 25KHz is standard PC fan frequency, minimises buzzing
    frequency: "25000 Hz" 

    # my fans stop working below 13% power.
    # also they're powerful and loud, so cap their max speed at 80%
    min_power: 13%
    max_power: 80%

# Good for debugging, you can manually set the fan 
# speed. Just make sure the Climate device is set to off or it will keep getting overridden.
# fan:
#  - platform: speed
#      output: console_fan_speed
#      name: "Console Fan Speed"

# Expose a PID-controlled Thermostat
# Manual: https://esphome.io/components/climate/pid.html
climate:
  - platform: pid
    name: "Console Fan Thermostat"
    id: console_thermostat
    sensor: console_fan_temperature

    # It is summer right now, so 30c is a decent target.
    default_target_temperature: 30°C
    cool_output: console_fan_speed

    # ON state change, publish the values to the x_term numbers defined 
    # above, so that they can be viewed in HA
    on_state:
      - sensor.template.publish:
          id: p_term
          state: !lambda 'return -id(console_thermostat).get_proportional_term() * 100.0;'
      - sensor.template.publish:
          id: i_term
          state: !lambda 'return -id(console_thermostat).get_integral_term()* 100.0;'
      - sensor.template.publish:
          id: d_term
          state: !lambda 'return -id(console_thermostat).get_derivative_term()* 100.0;'
      - sensor.template.publish:
          id: o_term
          state: !lambda 'return -id(console_thermostat).get_output_value()* 100.0;'
      - sensor.template.publish:
          id: in_deadband_term
          state: !lambda 'return id(console_thermostat).in_deadband();'
      - sensor.template.publish:
          id: e_term
          state: !lambda 'return -id(console_thermostat).get_error_value();'

    # The extents of the HA Thermostat
    visual:
      min_temperature: 20 °C
      max_temperature: 50 °C

    # See the README for setting up these parameters.
    # These are over ridden by the number templates above.
    control_parameters:
      kp: 0.3
      ki: 0.0015
      kd: 0
      max_integral: 0.0
      output_averaging_samples: 1
      derivative_averaging_samples: 5

    # How to behave when close to the target temperature?
    deadband_parameters:
      threshold_high: 0.4°C
      threshold_low: -1.0°C
      kp_multiplier: 0.0
      ki_multiplier: 0.04
      kd_multiplier: 0.0
      deadband_output_averaging_samples: 15

switch:
  # Expose an ESP32 restart button to HA
  - platform: restart
    name: ${friendly_name} ESP32 Restart
    id: console_fan_restart

# Restart every day at 12:30am. 
# I've had some memory issues lockup 
# the device after a couple weeks
#time:
#  - platform: homeassistant
#    on_time:
#      # Every morning at 12:30am
#    - seconds: 0
#      minutes: 30
#      hours: 0
#      then:
#       - switch.turn_on: console_fan_restart

# I was able to find good KP,KI,KD values manually, per the instructions,
# but you can try pressing the autotune button from home assistant and copying the 
# values it produces. 
# See more at: https://esphome.io/components/climate/pid.html#climate-pid-autotune-action
button:
- platform: template
  name: "PID Climate Autotune"
  on_press: 
    - climate.pid.autotune: console_thermostat

The second DHT sensor begins at line 271, and I've chosen a DHT22 instead of a DHT11. I've now finished the project without the second sensor, so solving this is no longer very important to me. But yes, it was not a problem with the sensor hardware: moving the second sensor to the first position changed nothing. The first one works, the second does not. Note that the cable is ~30 cm long; on my breadboard it was ~2 cm.

Here are some of the logs. Note that I copied them from Grafana, so the newest entry is first and the oldest is last. Before I disabled the sensor logging, there was a lot of output like this:

[W][component:215]: Components should block for at most 20-30ms.
[W][component:214]: Component dht.sensor took a long time for an operation (0.09 s).
[D][sensor:094]: 'Luftfeuchte': Sending state 35.80000 % with 0 decimals of accuracy
[D][sensor:094]: 'Serverschrank Fan error value': Sending state -1.85168  with 2 decimals of accuracy
[D][sensor:094]: 'Serverschrank Fan is in deadband': Sending state 0.00000  with 0 decimals of accuracy
[D][sensor:094]: 'Serverschrank Fan output value': Sending state -25.55031 % with 2 decimals of accuracy
[D][sensor:094]: 'Serverschrank Fan d term': Sending state 0.00000 % with 2 decimals of accuracy
[D][sensor:094]: 'Serverschrank Fan i term': Sending state -0.00000 % with 2 decimals of accuracy
[D][sensor:094]: 'Serverschrank Fan p term': Sending state -25.55031 % with 2 decimals of accuracy
[D][sensor:094]: 'Fan Speed (PWM Voltage)': Sending state 0.00000 % with 1 decimals of accuracy
[D][sensor:094]: 'Temperature': Sending state 28.14832 °C with 3 decimals of accuracy
[D][sensor:094]: 'Luftfeuchte': Sending state 35.80000 % with 0 decimals of accuracy
[D][sensor:094]: 'Fan Speed (PWM Voltage)': Sending state 0.00000 % with 1 decimals of accuracy
[D][sensor:094]: 'Temperature': Sending state 28.14258 °C with 3 decimals of accuracy
[W][component:215]: Components should block for at most 20-30ms.
[W][component:214]: Component dht.sensor took a long time for an operation (0.09 s).

And in case of a reboot, just this:

[D][text_sensor:064]: 'Reset Reason': Sending state 'Software Reset CPU'
[D][text_sensor:064]: 'Device Info': Sending state '2023.10.6|Flash: 4096kB Speed:40MHz Mode:DIO|Chip: ESP32 Features:WIFI_BGN,BLE,BT, Cores:2 Revision:3|ESP-IDF: v4.4.2|EFuse MAC: C8:xx:xx:xx:xx|Reset: Software Reset CPU|Wakeup: Unknown'
[D][debug:317]: Wakeup Reason: Unknown
[D][debug:272]: Reset Reason: Software Reset CPU
[D][debug:172]: EFuse MAC: C8:C9:A3:FC:AF:D4
[D][debug:167]: ESP-IDF Version: v4.4.2
[D][debug:159]: Chip: Model=ESP32, Features=WIFI_BGN,BLE,BT, Cores=2, Revision=3
[D][debug:110]: Flash Chip: Size=4096kB Speed=40MHz Mode=DIO
[D][debug:080]: Free Heap Size: 180296 bytes
[D][debug:076]: ESPHome version 2023.10.6
[I][app:102]: ESPHome version 2023.10.6 compiled on Nov  3 2023, 11:25:15
[D][text_sensor:064]: 'Serverschrank Fan IP Address': Sending state '192.168.171.12'
[D][sensor:094]: 'Serverschrank Fan WiFi Strength': Sending state -40.00000 dBm with 0 decimals of accuracy
[I][app:062]: setup() finished successfully!
[I][mqtt:274]: MQTT Connected!
---
[D][sensor:094]: 'Luftfeuchte': Sending state 36.00000 % with 0 decimals of accuracy
[D][sensor:094]: 'Fan Speed (PWM Voltage)': Sending state 0.00000 % with 1 decimals of accuracy
[D][sensor:094]: 'Temperature': Sending state 28.10348 °C with 3 decimals of accuracy
[D][sensor:094]: 'Luftfeuchte': Sending state 36.00000 % with 0 decimals of accuracy
[D][sensor:094]: 'Fan Speed (PWM Voltage)': Sending state 0.00000 % with 1 decimals of accuracy
[D][sensor:094]: 'Temperature': Sending state 28.10387 °C with 3 decimals of accuracy

I added a dashed line (---) where the reboot must have happened. The last entry before it is an ordinary sensor message, and the next entry is "MQTT Connected!" after the reboot.

patrickcollins12 commented 1 year ago

When you say the second sensor doesn't work, what exactly are you expecting it to do? Does it log any values? Or are you expecting it to do something to the climate pid?

The instructions for dht22 suggest that you definitely need a pull-up resistor. I haven't needed one but maybe you do, and maybe for some reason having two is what triggers needing it.

https://esphome.io/components/sensor/dht.html

Once you get two temp sensors working what will you do with the second one? Do you want an average of both sensors to drive one fan? Or will the second sensor drive a separate fan?

patrickcollins12 commented 1 year ago

Regarding memory usage.

Try disabling components, in particular disable the web server which is resource intensive.

Start disabling some of the components I set up for debugging, in particular the on_state and "platform: template" components. They're technically only needed for setup and debugging anyway.

As a last resort, disabling OTA would also free up resources, but then you'd have to update over a USB cable.

What kind of ESP32 are you running? Different models have different amounts of RAM.
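
One way to test the memory theory is to let the device publish its own heap statistics. The following is a minimal sketch, assuming the sensor platform of ESPHome's debug component; the entity names are placeholders and not part of the original config. The entries would be merged into the existing `sensor:` list.

```yaml
# The debug component is already enabled in the config above
debug:
  update_interval: 5s

sensor:
  - platform: debug
    free:
      name: "Heap Free"        # placeholder name: free heap in bytes
    block:
      name: "Heap Max Block"   # placeholder name: largest free heap block
    loop_time:
      name: "Loop Time"        # placeholder name: main loop duration
```

If the free heap or the largest block shrinks steadily before each reboot, that points towards a memory problem; if both stay flat, the cause is more likely elsewhere.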

patrickcollins12 commented 1 year ago

Also, is there a reason you need MQTT rather than the HA API? The latter is more suitable and compact if the data is going to end up in HA anyway.

Bond246 commented 1 year ago

Thanks for your reply,

> When you say the second sensor doesn't work, what exactly are you expecting it to do? Does it log any values? Or are you expecting it to do something to the climate pid?
>
> The instructions for dht22 suggest that you definitely need a pull-up resistor. I haven't needed one but maybe you do, and maybe for some reason having two is what triggers needing it.
>
> https://esphome.io/components/sensor/dht.html
>
> Once you get two temp sensors working what will you do with the second one? Do you want an average of both sensors to drive one fan? Or will the second sensor drive a separate fan?

The first integration step was just to get a temperature value. The next step would have been to use the average of both values as input for the PID. But I never got a value, always just NaN. I tried it with and without the pull-up, without any difference. The system is already installed in its final place, so I currently have no plans to add the second sensor again. In general it works well with one sensor.
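
The averaging step described above could look roughly like this. It is a sketch under assumptions: it presumes the second DHT22 block is enabled with the id `console_fan_temperature_2` from the commented-out section of the config, and the name and id of the averaged sensor are hypothetical.

```yaml
sensor:
  - platform: template
    name: "Temperature Average"        # hypothetical name
    id: console_fan_temperature_avg    # hypothetical id
    unit_of_measurement: "°C"
    accuracy_decimals: 3
    update_interval: 1.3s
    lambda: |-
      const float a = id(console_fan_temperature).state;
      const float b = id(console_fan_temperature_2).state;
      // If one sensor has no valid reading, fall back to the other
      if (std::isnan(a)) return b;
      if (std::isnan(b)) return a;
      return (a + b) / 2.0f;

climate:
  - platform: pid
    id: console_thermostat
    # Feed the PID controller the averaged value instead of a single sensor
    sensor: console_fan_temperature_avg
    # ... remaining options unchanged
```

The NaN fallback means a failing second sensor would not stop the fan control; the controller would simply keep running on the first sensor.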

My board is a DEBO JT ESP32 with a Tensilica LX6 dual-core CPU, 512 kB SRAM and 4096 kB flash: https://www.reichelt.de/nodemcu-esp32-wifi-und-bluetooth-modul-debo-jt-esp32-p219897.html?search=DEBO+JT+ESP32

I will try to disable some components as suggested. How can I reboot the chip once I've disabled the web server? Currently I'm doing a POST request to /switch/serverschrank_fan_esp32_restart/turn_on

> Also, is there a reason you need Mqtt rather than the HA api? The latter is more suitable and compact if the data is going to end up in HA anyway.

I don't have HA in my home-automation setup. I'm doing everything with Node-RED. All the HA integrations I found for Node-RED are really bad.

patrickcollins12 commented 1 year ago

Well the web server is very large and I've seen others having memory issues with it. So I suspect that is your problem. Try disabling it. If that solves the issue, maybe there is another solution for remote reboot, like mqtt.
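
A remote reboot over MQTT could be sketched as follows, assuming ESPHome's `on_message` trigger in the `mqtt:` block. The topic name is hypothetical; `console_fan_restart` is the id of the restart switch in the posted config.

```yaml
mqtt:
  broker: broker
  discovery: false
  username: !secret mqtt_user
  password: !secret mqtt_password
  on_message:
    # Hypothetical topic; publishing "ON" here triggers the restart switch
    - topic: server-rack-fan/restart
      payload: "ON"
      then:
        - switch.turn_on: console_fan_restart
```

From Node-RED, an mqtt-out node publishing `ON` to that topic would then replace the HTTP POST to the web server. ESPHome also gives MQTT-enabled switches a command topic of their own, so the explicit trigger may not even be necessary.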

Bond246 commented 1 year ago

Thanks so far.

I've tested another ESPHome package for Node-RED that uses the API. But it seems that the connection is not stable, and I get a lot of reconnects on the API. In parallel I disabled the web interface. Neither option changed anything.

So the next round:

Now I need to wait another day (or two) to see how stable it is.

The next step would be to switch from MQTT to the API, meaning the API only with no MQTT. The last test ran MQTT and the API in parallel.

DunklesKaltesNichts commented 1 year ago

Have you tried setting the timeout value to 0?

reboot_timeout (Optional, Time):

https://esphome.io/components/api.html#configuration-variables

The default setting is 15 minutes, and the option is also available for mqtt and wifi.
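
As a sketch of that suggestion: `reboot_timeout` is set per component, and `0s` disables the automatic reboot. The fragment below reuses the broker and secrets from the posted config; the `api:` block only applies if the API is enabled at all.

```yaml
api:
  reboot_timeout: 0s   # do not reboot when no API client is connected

mqtt:
  broker: broker
  username: !secret mqtt_user
  password: !secret mqtt_password
  reboot_timeout: 0s   # do not reboot when the broker is unreachable

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  reboot_timeout: 0s   # do not reboot when WiFi stays disconnected
```

Disabling the timeouts one at a time would show which lost connection, if any, is behind the "Software Reset CPU" reboots.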

Bond246 commented 1 year ago

I've seen this option, but the reboots actually came later than 15 minutes. So this is not the main problem, though it may be a side effect.

If the problem still exists with my current configuration, I will switch from MQTT to the API and test this again. Thanks.

DunklesKaltesNichts commented 1 year ago

If the connection to the API is lost after 4 hours, the ESP reboots after 4 hours 15 mins.

Bond246 commented 1 year ago

Since my last changes to reduce the load, it seems much more stable. Within the last two days there were 6 reboots: sometimes after ~22 hours, sometimes already after 2 hours or 30 minutes.

In between I tried using the API instead of MQTT, which was an epic fail in my case. After a few minutes the API connection is lost, then hundreds of reconnects follow, and at the end of the day I have to restart all components to get it working again. I can only speculate about the reasons. It is very likely that the ESPHome implementation for Node-RED is not really stable. But in some cases I suspected that the ESP32 is struggling and the API connection becomes flaky, and the components then try to reconnect, which makes the struggle worse.

So at the end of the day this will not solve my problem.

Patrick, you wrote that every "platform: template" section could be disabled. Do you mean the sensor components as well, or just specific template components?

Thanks so far!

patrickcollins12 commented 12 months ago

Since this seems to be more of an issue with ESPHome memory, and probably with using MQTT, I suggest joining the ESPHome Discord server where they all hang out. You'll get a lot more ideas there.