syssi / esphome-jk-bms

ESPHome component to monitor and control a Jikong Battery Management System (JK-BMS) via UART-TTL or BLE
Apache License 2.0

Issue with multiple BMS over BLE & MQTT #414

Open luckylinux opened 5 months ago

luckylinux commented 5 months ago

My goal was to put both BMS I have (JK BMS HW v11 & v10) for a 16s setup over MQTT using the esphome-jk-bms tool.

To recap:

I did that first for jk-bms-bat02. Everything worked. I get continuous MQTT messages every 10s or so.

This morning I flashed jk-bms-bat01 using the same config (minus the different hostname & MAC address of the BMS) and ... well ... both jk-bms-bat01 and jk-bms-bat02 sent ONE SINGLE SET of messages (all cell voltages, cell resistances, ...) AND THEN NOTHING.

When I programmed jk-bms-bat01 this morning it seemed even finicky and the reboot loop afterwards threw yet another error message (cannot remember what exactly)

It's been over an hour and they are not sending any more responses.

What could be the cause of this ?

I can observe this:

Possible causes:

Both BMS are quite close together (~ 30 cm). Both ESP32 are quite close together (~ 20 cm).

Wifi Access Point is some Hi-Power Alfa Wireless Adapter running HostAPd on Linux. The reception should be very good.

My esp32-ble.yaml file

substitutions:
  name: !secret host_name
  device_description: "Monitor and control a JK-BMS via bluetooth"
  external_components_source: github://syssi/esphome-jk-bms@main
  mac_address: !secret jk_mac_address
  # Defaults to "JK02" (hardware version >= 6.0 and < 11.0)
  # Please use "JK02_32S" if you own a new JK-BMS >= hardware version 11.0 (f.e. JK-B2A8S20P hw 11.XW, sw 11.26)
  # Please use "JK04" if you have some old JK-BMS <= hardware version 3.0 (f.e. JK-B2A16S hw 3.0, sw. 3.3.0)
  protocol_version: !secret jk_protocol

esphome:
  name: ${name}
  platformio_options:
    build_flags:
      - -DCONFIG_ARDUINO_LOOP_STACK_SIZE=32768
  comment: ${device_description}
  project:
    name: "syssi.esphome-jk-bms"
    version: 1.5.0

esp32:
  #board: wemos_d1_mini32
  board: nodemcu-32s
  framework:
    type: esp-idf
  #  version: latest

external_components:
  - source: ${external_components_source}
    refresh: 0s

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

ota:

logger:
  level: DEBUG

# If you don't use Home Assistant please remove this `api` section and uncomment the `mqtt` component!
#api:

mqtt:
  broker: !secret mqtt_host
  port: !secret mqtt_port
  username: !secret mqtt_username
  password: !secret mqtt_password
  id: mqtt_client
  skip_cert_cn_check: true
#  birth_message:
#    topic: availability/topic
#    payload: online
#  will_message:
#    topic: availability/topic
#    payload: offline

esp32_ble_tracker:
  on_ble_advertise:
    then:
      - lambda: |-
          if (x.get_name().rfind("JK-", 0) == 0) {
            ESP_LOGI("ble_adv", "New JK-BMS found");
            ESP_LOGI("ble_adv", "  Name: %s", x.get_name().c_str());
            ESP_LOGI("ble_adv", "  MAC address: %s", x.address_str().c_str());
            ESP_LOGD("ble_adv", "  Advertised service UUIDs:");
            for (auto uuid : x.get_service_uuids()) {
              ESP_LOGD("ble_adv", "    - %s", uuid.to_string().c_str());
            }
          }

ble_client:
  - mac_address: ${mac_address}
    id: client0

jk_bms_ble:
  - ble_client_id: client0
    protocol_version: ${protocol_version}
    throttle: 5s
    id: bms0

binary_sensor:
  - platform: jk_bms_ble
    balancing:
      name: "${name} balancing"
    charging:
      name: "${name} charging"
    discharging:
      name: "${name} discharging"
    online_status:
      name: "${name} online status"

button:
  - platform: jk_bms_ble
    retrieve_settings:
      name: "${name} retrieve settings"
    retrieve_device_info:
      name: "${name} retrieve device info"

number:
  - platform: jk_bms_ble
    jk_bms_ble_id: bms0
    balance_trigger_voltage:
      name: "${name} balance trigger voltage"
    cell_count:
      name: "${name} cell count"
    total_battery_capacity:
      name: "${name} total battery capacity"
    cell_voltage_overvoltage_protection:
      name: "${name} cell voltage overvoltage protection"
    cell_voltage_overvoltage_recovery:
      name: "${name} cell voltage overvoltage recovery"
    cell_voltage_undervoltage_protection:
      name: "${name} cell voltage undervoltage protection"
    cell_voltage_undervoltage_recovery:
      name: "${name} cell voltage undervoltage recovery"
    balance_starting_voltage:
      name: "${name} balance starting voltage"
    voltage_calibration:
      name: "${name} voltage calibration"
    current_calibration:
      name: "${name} current calibration"
    power_off_voltage:
      name: "${name} power off voltage"
    max_balance_current:
      name: "${name} max balance current"
    max_charge_current:
      name: "${name} max charge current"
    max_discharge_current:
      name: "${name} max discharge current"

sensor:
  - platform: jk_bms_ble
    jk_bms_ble_id: bms0
    min_cell_voltage:
      name: "${name} min cell voltage"
    max_cell_voltage:
      name: "${name} max cell voltage"
    min_voltage_cell:
      name: "${name} min voltage cell"
    max_voltage_cell:
      name: "${name} max voltage cell"
    delta_cell_voltage:
      name: "${name} delta cell voltage"
    average_cell_voltage:
      name: "${name} average cell voltage"
    cell_voltage_1:
      name: "${name} cell voltage 1"
    cell_voltage_2:
      name: "${name} cell voltage 2"
    cell_voltage_3:
      name: "${name} cell voltage 3"
    cell_voltage_4:
      name: "${name} cell voltage 4"
    cell_voltage_5:
      name: "${name} cell voltage 5"
    cell_voltage_6:
      name: "${name} cell voltage 6"
    cell_voltage_7:
      name: "${name} cell voltage 7"
    cell_voltage_8:
      name: "${name} cell voltage 8"
    cell_voltage_9:
      name: "${name} cell voltage 9"
    cell_voltage_10:
      name: "${name} cell voltage 10"
    cell_voltage_11:
      name: "${name} cell voltage 11"
    cell_voltage_12:
      name: "${name} cell voltage 12"
    cell_voltage_13:
      name: "${name} cell voltage 13"
    cell_voltage_14:
      name: "${name} cell voltage 14"
    cell_voltage_15:
      name: "${name} cell voltage 15"
    cell_voltage_16:
      name: "${name} cell voltage 16"
    cell_voltage_17:
      name: "${name} cell voltage 17"
    cell_voltage_18:
      name: "${name} cell voltage 18"
    cell_voltage_19:
      name: "${name} cell voltage 19"
    cell_voltage_20:
      name: "${name} cell voltage 20"
    cell_voltage_21:
      name: "${name} cell voltage 21"
    cell_voltage_22:
      name: "${name} cell voltage 22"
    cell_voltage_23:
      name: "${name} cell voltage 23"
    cell_voltage_24:
      name: "${name} cell voltage 24"
    cell_resistance_1:
      name: "${name} cell resistance 1"
    cell_resistance_2:
      name: "${name} cell resistance 2"
    cell_resistance_3:
      name: "${name} cell resistance 3"
    cell_resistance_4:
      name: "${name} cell resistance 4"
    cell_resistance_5:
      name: "${name} cell resistance 5"
    cell_resistance_6:
      name: "${name} cell resistance 6"
    cell_resistance_7:
      name: "${name} cell resistance 7"
    cell_resistance_8:
      name: "${name} cell resistance 8"
    cell_resistance_9:
      name: "${name} cell resistance 9"
    cell_resistance_10:
      name: "${name} cell resistance 10"
    cell_resistance_11:
      name: "${name} cell resistance 11"
    cell_resistance_12:
      name: "${name} cell resistance 12"
    cell_resistance_13:
      name: "${name} cell resistance 13"
    cell_resistance_14:
      name: "${name} cell resistance 14"
    cell_resistance_15:
      name: "${name} cell resistance 15"
    cell_resistance_16:
      name: "${name} cell resistance 16"
    cell_resistance_17:
      name: "${name} cell resistance 17"
    cell_resistance_18:
      name: "${name} cell resistance 18"
    cell_resistance_19:
      name: "${name} cell resistance 19"
    cell_resistance_20:
      name: "${name} cell resistance 20"
    cell_resistance_21:
      name: "${name} cell resistance 21"
    cell_resistance_22:
      name: "${name} cell resistance 22"
    cell_resistance_23:
      name: "${name} cell resistance 23"
    cell_resistance_24:
      name: "${name} cell resistance 24"
    total_voltage:
      name: "${name} total voltage"
    current:
      name: "${name} current"
    power:
      name: "${name} power"
    charging_power:
      name: "${name} charging power"
    discharging_power:
      name: "${name} discharging power"
    temperature_sensor_1:
      name: "${name} temperature sensor 1"
    temperature_sensor_2:
      name: "${name} temperature sensor 2"
    power_tube_temperature:
      name: "${name} power tube temperature"
    state_of_charge:
      name: "${name} state of charge"
    capacity_remaining:
      name: "${name} capacity remaining"
    total_battery_capacity_setting:
      name: "${name} total battery capacity setting"
    charging_cycles:
      name: "${name} charging cycles"
    total_charging_cycle_capacity:
      name: "${name} total charging cycle capacity"
    total_runtime:
      name: "${name} total runtime"
    balancing_current:
      name: "${name} balancing current"
    errors_bitmask:
      name: "${name} errors bitmask"

switch:
  - platform: jk_bms_ble
    charging:
      name: "${name} charging"
    discharging:
      name: "${name} discharging"
    balancer:
      name: "${name} balancer"

  - platform: ble_client
    ble_client_id: client0
    name: "${name} enable bluetooth connection"

text_sensor:
  - platform: jk_bms_ble
    errors:
      name: "${name} errors"
    total_runtime_formatted:
      name: "${name} total runtime formatted"

My build script (to be run as root or possibly with sudo):

#!/bin/bash

# Define ESPHome configuration file
esphomeconfig="esp32-ble.yaml"

# Store current path
currentpath=$(pwd)

# Build folder
buildpath="$HOME/ESPHome"
mkdir -p "$buildpath"
cd "$buildpath"

# Create venv
apt-get -y install python3.11-venv python3-venv
python3 -m venv ./venv

# Activate venv
source venv/bin/activate

# Install esphome
pip3 install esphome

# Clone this external component
git clone https://github.com/syssi/esphome-jk-bms.git
cd esphome-jk-bms

# Configurations
names=()
names+=("jk-bms-bat01")
names+=("jk-bms-bat02")

macs=()
macs+=("C8:47:8C:EC:1E:60")
macs+=("C8:47:8C:E5:98:96")

# Please use "JK02" (hardware version >= 6.0 and < 11.0)
# Please use "JK02_32S" if you own a new JK-BMS >= hardware version 11.0 (f.e. JK-B2A8S20P hw 11.XW, sw 11.26)
# Please use "JK04" if you have some old JK-BMS <= hardware version 3.0 (f.e. JK-B2A16S hw 3.0, sw. 3.3.0)
protocols=()
protocols+=("JK02_32S") # HW v11
protocols+=("JK02") # HW v10

num=${#names[@]}
maxindex=$(($num - 1))
selected=$((-1))

while [[ $selected -gt $maxindex ]] || [[ $selected -lt 0 ]]
do
    for ((index=0;index<=$maxindex;index++))
    do
        echo -e "[${index}]"
        echo -e "\t Hostname: ${names[${index}]}"
        echo -e "\t MAC Address: ${macs[${index}]}"
        echo -e "\t Protocol: ${protocols[${index}]}"
    done

    read -p "Enter desired configuration: " selected
    name=${names[$selected]}
    mac=${macs[$selected]}
    protocol=${protocols[$selected]}
done

echo "Hostname set to <$name>"
echo "MAC Address set to <$mac>"

# Create a secrets.yaml containing some setup specific secrets
cat > secrets.yaml <<EOF
host_name: $name

wifi_ssid: XXXXXXXXXXXXXXXXXXXX
wifi_password: XXXXXXXXXXXXXXXXXX

jk_mac_address: $mac
jk_protocol: $protocol

mqtt_host: 192.168.X.Y
mqtt_port: 1883
mqtt_username: ""
mqtt_password: ""
EOF

# Validate the configuration, create a binary, upload it, and start logs
# If you use an ESP8266, run the esp8266-example.yaml
cp "$currentpath/$esphomeconfig" "./$esphomeconfig"
esphome run "$esphomeconfig"

# Change back to currentpath
cd "$currentpath"

It's actually quite weird that it worked in the first place, because the access point is NOT supposed to have routing enabled between interfaces. And the subnet the Wifi Adapters should receive is 172.22.1.1/16 actually. Maybe mqtt_host: 172.22.1.1 would work better.

But why did it work in the first place ? And why is it not working with 2 adapters now ?
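The subnet question can be checked mechanically. Below is a minimal sketch, assuming plain IPv4 and bash; `ip_to_int` and `ip_in_subnet` are hypothetical helper names, not part of any tool used here. If the broker address lies outside the subnet handed out to the WiFi clients, MQTT traffic only gets through when the access point routes between interfaces.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 if address $1 lies inside the CIDR network $2 (e.g. 172.22.1.1/16).
ip_in_subnet() {
    local ip net bits mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")   # part before the slash
    bits=${2#*/}                 # prefix length after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    (( (ip & mask) == (net & mask) ))
}
```

For example, `ip_in_subnet 172.22.1.1 172.22.1.1/16` succeeds, while any 192.168.x.y address (such as the elided broker address in secrets.yaml) fails the same check.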

syssi commented 5 months ago

Please attach one of the ESPs via USB to your PC and take a look at the local log using:

esphome logs esp32-ble.yaml

Do you see any warnings? Please provide these logs, ideally one per ESP. Each of the following root causes should be visible there:

  1. Boot loop
  2. Unable to setup the WiFi connection
  3. Unable to establish a connection to the MQTT broker
  4. Unable to connect to the BMS using BLE
  5. ...
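To make warnings easier to spot, the log can be saved to a file and filtered. This is a minimal sketch, assuming the usual ESPHome log layout in which the level letter appears in brackets directly before the component tag (for example `[W][wifi:...]`); `filter_problems` is a hypothetical helper name and `bat01.log` is a placeholder file name.

```shell
#!/bin/bash
# Print only warning (W) and error (E) lines from a saved ESPHome log.
# Assumes each log line contains "[W][component..." or "[E][component...".
filter_problems() {
    grep -E '\[(W|E)\]\[' "$1"
}

# Hypothetical usage (not executed here):
#   esphome logs esp32-ble.yaml | tee bat01.log
#   filter_problems bat01.log
```

Running this once per ESP gives a short list of warnings and errors for each device that can be compared side by side.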

luckylinux commented 5 months ago

Here.

bat02_log_20240115.log bat01_log_20240115.log

It seems jk-bms-bat01 never really connects to wifi though.

After unplugging & replugging it seems jk-bms-bat02 is working now ... Not sure for how long ...

luckylinux commented 5 months ago

Now I seem to be receiving messages continuously from both. Very weird.

syssi commented 5 months ago

Please try to replace the ESP32 if the setup remains unreliable.

luckylinux commented 5 months ago

Please try to replace the ESP32 if the setup remains unreliable.

Do you think it's a hardware issue?

I'm not sure; maybe it's also triggered when I start the access point or the MQTT server.

Maybe I have to put the power supply for these ESP32s on a timer that automatically cycles power every few hours or so. Something like a Sonoff running Tasmota or similar.

It seems one ESP32 was stuck in safe mode.

By the way, jk-bms-bat01 also had some trouble flashing at 460000 (?) baud, so it tried again at 115200 (?) and that, I believe, succeeded.

I had used another desktop pc for that. Maybe I'll try reflashing with the old one, same as jk-bms-bat02.

syssi commented 5 months ago

Do you think it's a hardware issue?

Yes. There are unreliable ESP modules / dev boards out there.

I'm not sure; maybe it's also triggered when I start the access point or the MQTT server.

The log provided above looked more serious than a connection issue.

It seems one ESP32 was stuck in safe mode.

The safe mode is triggered if the ESP reboots x times in a row.

By the way, jk-bms-bat01 also had some trouble flashing at 460000 (?) baud, so it tried again at 115200 (?) and that, I believe, succeeded.

This doesn't sound like a reliable dev board. ;-)

Please proceed as systematically as possible to narrow down the issue.

luckylinux commented 5 months ago

Well, then I would have to buy a few new boards 😔.

I think I had the D silicon, as the E silicon seemed to have some issues (related to CAN iirc, not an issue for this application though).

Should I buy some other boards and try & see with those? Or do you have other suggestions before I do that?

syssi commented 5 months ago

Should I buy some other boards and try & see with those? Or do you have other suggestions before I do that?

You could also wait a few days. Maybe it's reliable now? In general this project / component doesn't need to be restarted periodically. My ESP hasn't been touched for months now.

luckylinux commented 5 months ago

Should I buy some other boards and try & see with those? Or do you have other suggestions before I do that?

You could also wait a few days. Maybe it's reliable now? In general this project / component doesn't need to be restarted periodically. My ESP hasn't been touched for months now.

RIGHT now I am continuously receiving data after I gathered the logs where jk-bms-bat01 was stuck at boot .... So BOTH jk-bms-bat01 and jk-bms-bat02 are periodically sending data right now.

What would happen, though:

syssi commented 5 months ago

If Access Point (Wifi) disconnects or reboots ? Does that trigger a restart (and after 5 times it locks into Safe Mode) ?

If there is no active WiFi connection, the wifi component reboots the device once the reboot_timeout of 5 minutes is exceeded. The reboot counter should be cleared / reset after a few seconds. A reboot caused by WiFi loss normally cannot trigger the safe mode.

If MQTT / Mosquitto Server disconnects or reboots ? Does that trigger a restart (and after 5 times it locks into Safe Mode) ?

If mqtt is used, there is another reboot_timeout option (see https://esphome.io/components/mqtt.html) which lets the device reboot after 5 minutes without a connection to the MQTT broker. The same applies here: this rare reboot shouldn't trigger the safe mode.

luckylinux commented 5 months ago

Well it kinda worked for a while.

Today it seems it's jk-bms-bat02 this time that stopped sending new data.

From the access point I can no longer ping the device. I can ping jk-bms-bat01 without problem though.

Does my configuration show any obvious issue that could explain this ?

Reading online I currently set:

There are also some reports that setting the log level to "INFO" helped resolve the issue. I currently have logging set to DEBUG.

When I try to ping jk-bms-bat01 (the one I can still access) I get quite varied and sometimes very long ping times

PING 172.22.110.164 (172.22.110.164) 56(84) bytes of data.
64 bytes from 172.22.110.164: icmp_seq=1 ttl=255 time=94.0 ms
64 bytes from 172.22.110.164: icmp_seq=2 ttl=255 time=112 ms
64 bytes from 172.22.110.164: icmp_seq=3 ttl=255 time=30.1 ms
64 bytes from 172.22.110.164: icmp_seq=4 ttl=255 time=52.0 ms
64 bytes from 172.22.110.164: icmp_seq=5 ttl=255 time=76.1 ms
64 bytes from 172.22.110.164: icmp_seq=6 ttl=255 time=203 ms
64 bytes from 172.22.110.164: icmp_seq=7 ttl=255 time=18.3 ms
64 bytes from 172.22.110.164: icmp_seq=8 ttl=255 time=41.1 ms
64 bytes from 172.22.110.164: icmp_seq=9 ttl=255 time=65.0 ms
64 bytes from 172.22.110.164: icmp_seq=10 ttl=255 time=90.4 ms
64 bytes from 172.22.110.164: icmp_seq=11 ttl=255 time=4.59 ms
64 bytes from 172.22.110.164: icmp_seq=12 ttl=255 time=35.9 ms
64 bytes from 172.22.110.164: icmp_seq=13 ttl=255 time=54.9 ms
64 bytes from 172.22.110.164: icmp_seq=14 ttl=255 time=78.1 ms
^C
--- 172.22.110.164 ping statistics ---
14 packets transmitted, 14 received, 0% packet loss, time 13017ms
rtt min/avg/max/mdev = 4.589/68.262/203.420/47.589 ms

syssi commented 5 months ago

I'm using the log level INFO:

logger:
  level: INFO

and the RTT is pretty constant:

$ ping -c10 attic-bms-ble.local
PING attic-bms-ble.local (192.168.1.32) 56(84) bytes of data.
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=1 ttl=255 time=18.2 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=2 ttl=255 time=13.4 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=3 ttl=255 time=13.4 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=4 ttl=255 time=14.5 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=5 ttl=255 time=15.2 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=6 ttl=255 time=15.6 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=7 ttl=255 time=11.6 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=8 ttl=255 time=16.1 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=9 ttl=255 time=14.3 ms
64 bytes from 192.168.1.32: icmp_seq=10 ttl=255 time=15.0 ms

--- attic-bms-ble.local ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 45161ms
rtt min/avg/max/mdev = 11.586/14.738/18.189/1.687 ms
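To compare such runs at a glance, the summary line can be reduced to its average and its deviation (jitter). This is a minimal sketch, assuming the Linux iputils `ping` summary format shown above; `rtt_summary` is a hypothetical helper name.

```shell
#!/bin/bash
# Read ping output on stdin and print "avg mdev" (both in ms)
# taken from the final "rtt min/avg/max/mdev = ..." summary line.
rtt_summary() {
    awk -F'[=/ ]+' '/^rtt min\/avg\/max\/mdev/ { print $7, $9 }'
}

# Hypothetical usage (not executed here):
#   ping -c10 172.22.110.164 | rtt_summary
```

Applied to the two runs in this thread, this gives an average of 68.262 ms with a deviation of 47.589 ms for jk-bms-bat01, versus 14.738 ms and 1.687 ms for the reference device.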

luckylinux commented 4 months ago

I put the ESP32 directly ON TOP of the BMS. That seemed to work for a few weeks.

Out of the blue, this morning, the ESPHome Reported "Offline" to MQTT (picked up by Home Assistant for Data Logging Purposes).

I'm still not sure whether this is just another EMC/EMI issue (causing the bluetooth connection to be interrupted) or the BMS shut down all of its functions while still allowing charge/discharge.

This is where I tried to ask for help assuming it is a hardware issue: https://diysolarforum.com/threads/urgent-jk-bms-went-offline-in-the-middle-of-the-night-soc-stuck-but-still-discharging.79532/

The issue occurred on Bat02 (HW v10) and NOT on Bat01 (HW v11) - Knock on wood ...

jogybaer0815 commented 4 months ago

Bluetooth and WiFi on the ESP32 often seem to get in each other's way. I therefore reduced the WiFi transmit power; I run 5 BMS and 5 ESP32 here and get a better result:

wifi:
  # ...
  # output_power (Optional, string): The amount of TX power for the WiFi interface from 8.5dB to 20.5dB. Default for ESP8266 is 20dB, 20.5dB might cause unexpected restarts.
  output_power: 8.5

Please test.

luckylinux commented 4 months ago

Bluetooth and WiFi on the ESP32 often seem to get in each other's way. I therefore reduced the WiFi transmit power; I run 5 BMS and 5 ESP32 here and get a better result:

wifi:
  # ...
  # output_power (Optional, string): The amount of TX power for the WiFi interface from 8.5dB to 20.5dB. Default for ESP8266 is 20dB, 20.5dB might cause unexpected restarts.
  output_power: 8.5

Please test.

Right now (knock on wood) it has run OK for a week or two DIRECTLY ON TOP of the BMS.

Can you please share the exact config and value for the output_power parameter ?