Open luckylinux opened 5 months ago
Please attach one of the ESPs via USB to your PC and take a look at the local logs using:
esphome logs esp32-ble.yaml
Do you see any warnings? Please provide these logs, ideally one per ESP. Any of the different root causes should be visible there.
Here.
bat02_log_20240115.log bat01_log_20240115.log
It seems jk-bms-bat01 never really connects to wifi though.
After unplugging & replugging it seems jk-bms-bat02 is working now ... Not sure for how long ...
Now I seem to be receiving messages continuously from both. Very weird.
Please try to replace the ESP32 if the setup remains unreliable.
Do you think it's a hardware issue?
I'm not sure; maybe it's also triggered if I try to start the access point or MQTT server.
Maybe I have to put the power supply for these ESP32s on a timer that automatically cycles power every few hours or so. Something like a Tasmota Sonoff or similar.
It seems one ESP32 was stuck in safe mode.
By the way, jk-bms-bat01 also had some trouble flashing at 460000 (?), so it tried again at 115200 (?) serial communication, and that, I believe, succeeded.
I had used another desktop pc for that. Maybe I'll try reflashing with the old one, same as jk-bms-bat02.
Do you think it's a hardware issue?
Yes. There are unreliable ESP modules / dev boards out there.
I'm not sure; maybe it's also triggered if I try to start the access point or MQTT server.
The log provided above looked more serious than a connection issue.
It seems one ESP32 was stuck in safe mode.
The safe mode is triggered if the ESP reboots x times in a row.
By the way, jk-bms-bat01 also had some trouble flashing at 460000 (?), so it tried again at 115200 (?) serial communication, and that, I believe, succeeded.
This doesn't sound like a reliable dev board. ;-)
Please try to proceed as systematically as possible to narrow down the issues.
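For readers who want to see where the "x times in a row" threshold is set, here is a minimal sketch. It assumes a recent ESPHome release where safe mode is its own `safe_mode` component (older releases expose these options under `ota:` instead). The values are illustrative, not taken from this thread; check the documentation for your installed version.

```yaml
# Sketch only: option names assume a recent ESPHome release;
# older releases configure the same options under `ota:`.
safe_mode:
  # Number of consecutive failed boots before safe mode is entered
  num_attempts: 10
  # A boot counts as "good" (counter is cleared) after running this long
  boot_is_good_after: 1min
  # Time spent waiting for an OTA upload in safe mode before rebooting
  reboot_timeout: 5min
```

If a board enters safe mode repeatedly with settings like these, it is crashing early in every boot, which points toward power supply or hardware rather than a lost network connection.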
Well, then I would have to buy a few new boards 😔.
I think I had the D silicon, as the E silicon seemed to have some issues (related to CAN iirc, not an issue for this application though).
Should I buy some other boards and try & see with those? Or do you have other suggestions before I do that?
You could also wait a few days. Maybe it's reliable now? In general, this project / component doesn't need to be restarted periodically. My ESP hasn't been touched for months now.
RIGHT now I am continuously receiving data after I gathered the logs where jk-bms-bat01 was stuck at boot .... So BOTH jk-bms-bat01 and jk-bms-bat02 are periodically sending data right now.
What would happen, though:
If the access point (WiFi) disconnects or reboots? Does that trigger a restart (and after 5 times it locks into safe mode)?
If there is no active WiFi connection, the wifi component will reboot the device once the reboot_timeout of 5 minutes is exceeded. The reboot counter should be cleared / reset after a few seconds. A reboot caused by WiFi loss normally cannot trigger the safe mode.
If the MQTT / Mosquitto server disconnects or reboots? Does that trigger a restart (and after 5 times it locks into safe mode)?
If mqtt is used, there is another reboot_timeout option (see https://esphome.io/components/mqtt.html) which lets the device reboot after 5 minutes without a connection to the MQTT broker. The same applies here: this rare reboot shouldn't trigger the safe mode.
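The two timeouts described above can be written out explicitly, as in the minimal sketch below. The 5min value is the one mentioned in the reply; the default on your ESPHome version may differ, so setting it explicitly avoids guessing. The `!secret` names are hypothetical placeholders.

```yaml
wifi:
  ssid: !secret wifi_ssid            # hypothetical secret name
  password: !secret wifi_password    # hypothetical secret name
  # Reboot the device if there has been no WiFi connection for this long.
  # Setting 0s disables this reboot.
  reboot_timeout: 5min

mqtt:
  broker: !secret mqtt_host          # hypothetical secret name
  # Reboot the device if there has been no broker connection for this long.
  # Setting 0s disables this reboot.
  reboot_timeout: 5min
```

Setting one of them to 0s while debugging can help tell apart reboots caused by a lost connection from reboots caused by crashes.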
Well it kinda worked for a while.
Today it seems it's jk-bms-bat02 this time that stopped sending new data.
From the access point I cannot ping the device any longer. I can ping jk-bms-bat01 without problem though.
Does my configuration show any obvious issue that could explain this?
Reading online I currently set:
There are also some reports that setting the log level to "INFO" helped resolve the issue. I currently have logging set to DEBUG.
When I try to ping jk-bms-bat01 (the one I can still access), I get quite varied and sometimes very long ping times:
PING 172.22.110.164 (172.22.110.164) 56(84) bytes of data.
64 bytes from 172.22.110.164: icmp_seq=1 ttl=255 time=94.0 ms
64 bytes from 172.22.110.164: icmp_seq=2 ttl=255 time=112 ms
64 bytes from 172.22.110.164: icmp_seq=3 ttl=255 time=30.1 ms
64 bytes from 172.22.110.164: icmp_seq=4 ttl=255 time=52.0 ms
64 bytes from 172.22.110.164: icmp_seq=5 ttl=255 time=76.1 ms
64 bytes from 172.22.110.164: icmp_seq=6 ttl=255 time=203 ms
64 bytes from 172.22.110.164: icmp_seq=7 ttl=255 time=18.3 ms
64 bytes from 172.22.110.164: icmp_seq=8 ttl=255 time=41.1 ms
64 bytes from 172.22.110.164: icmp_seq=9 ttl=255 time=65.0 ms
64 bytes from 172.22.110.164: icmp_seq=10 ttl=255 time=90.4 ms
64 bytes from 172.22.110.164: icmp_seq=11 ttl=255 time=4.59 ms
64 bytes from 172.22.110.164: icmp_seq=12 ttl=255 time=35.9 ms
64 bytes from 172.22.110.164: icmp_seq=13 ttl=255 time=54.9 ms
64 bytes from 172.22.110.164: icmp_seq=14 ttl=255 time=78.1 ms
^C
--- 172.22.110.164 ping statistics ---
14 packets transmitted, 14 received, 0% packet loss, time 13017ms
rtt min/avg/max/mdev = 4.589/68.262/203.420/47.589 ms
I'm using the log level INFO:
logger:
  level: INFO
and the RTT is pretty constant:
$ ping -c10 attic-bms-ble.local
PING attic-bms-ble.local (192.168.1.32) 56(84) bytes of data.
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=1 ttl=255 time=18.2 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=2 ttl=255 time=13.4 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=3 ttl=255 time=13.4 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=4 ttl=255 time=14.5 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=5 ttl=255 time=15.2 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=6 ttl=255 time=15.6 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=7 ttl=255 time=11.6 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=8 ttl=255 time=16.1 ms
64 bytes from 192.168.1.32 (192.168.1.32): icmp_seq=9 ttl=255 time=14.3 ms
64 bytes from 192.168.1.32: icmp_seq=10 ttl=255 time=15.0 ms
--- attic-bms-ble.local ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 45161ms
rtt min/avg/max/mdev = 11.586/14.738/18.189/1.687 ms
I put the ESP32 directly ON TOP of the BMS. That seemed to work for a few weeks.
Out of the blue, this morning, the ESPHome Reported "Offline" to MQTT (picked up by Home Assistant for Data Logging Purposes).
I'm still not sure whether this is just another EMC/EMI issue (causing the bluetooth connection to be interrupted) or the BMS shut down all of its functions while still allowing charge/discharge.
This is where I tried to ask for help assuming it is a hardware issue: https://diysolarforum.com/threads/urgent-jk-bms-went-offline-in-the-middle-of-the-night-soc-stuck-but-still-discharging.79532/
The issue occurred on Bat02 (HW v10) and NOT on Bat01 (HW v11) - Knock on wood ...
Bluetooth and WiFi on the ESP32 often seem to get in each other's way.
I therefore reduced the WiFi transmit power. I run 5 BMS and 5 ESP32 here and get better results:
wifi:
  ...
  # output_power (Optional, string): The amount of TX power for the WiFi interface from 8.5dB to 20.5dB. Default for ESP8266 is 20dB, 20.5dB might cause unexpected restarts.
  output_power: 8.5
Please test
Right now (knock on wood) it has run OK for a week or two DIRECTLY ON TOP of the BMS.
Can you please share the exact config and value for the output_power parameter?
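As a reference point, a minimal sketch of how this option is usually written in the wifi block. The 8.5dB value comes from the suggestion above (the documented range is 8.5dB to 20.5dB); the `!secret` names are hypothetical placeholders.

```yaml
wifi:
  ssid: !secret wifi_ssid            # hypothetical secret name
  password: !secret wifi_password    # hypothetical secret name
  # Lowest documented TX power, to reduce interference with the
  # Bluetooth radio. Raise it step by step if WiFi reception suffers.
  output_power: 8.5dB
```

With the ESP32 close to the access point, the lowest setting may be enough; further away, an intermediate value is probably the better trade-off.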
My goal was to publish data from both BMS I have (JK BMS HW v11 & v10), each in a 16s setup, over MQTT using the esphome-jk-bms tool.
To recap:
I did that first for jk-bms-bat02. Everything worked. I get continuous MQTT messages every 10s or so.
This morning I flashed jk-bms-bat01 using the same config (minus the different hostname & MAC address of the BMS) and ... well ... both jk-bms-bat01 and jk-bms-bat02 sent ONE SINGLE SET of messages (all cell voltages, cell resistances, ...) AND THEN NOTHING.
When I programmed jk-bms-bat01 this morning it seemed finicky, and the reboot loop afterwards threw yet another error message (I cannot remember what exactly).
It's been over an hour and they are not sending any more responses.
What could be the cause of this?
I can observe this with:
mosquitto_sub -i MyTest -h 192.168.X.Y -p 1883 -t jk-bms-bat01/#
mosquitto_sub -i MyTest -h 192.168.X.Y -p 1883 -t jk-bms-bat02/#
Possible causes:
Both BMS are quite close together (~ 30 cm). Both ESP32 are quite close together (~ 20 cm).
Wifi Access Point is some Hi-Power Alfa Wireless Adapter running HostAPd on Linux. The reception should be very good.
My esp32-ble.yaml file:
My build script (to be run as root or possibly with sudo):
It's actually quite weird that it worked in the first place, because the access point is NOT supposed to have routing enabled between interfaces. And the subnet the WiFi adapters should receive is 172.22.1.1/16 actually. Maybe mqtt_host: 172.22.1.1 would work better.
But why did it work in the first place? And why is it not working with 2 adapters now?
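A minimal sketch of the change considered above, assuming `mqtt_host` is an entry under `substitutions:` in esp32-ble.yaml (the key/value form suggests this, but the file itself is not shown here) and that the broker listens on the access point's address in the WiFi subnet.

```yaml
substitutions:
  # Assumption: broker reachable on the AP's own address in the WiFi
  # subnet, so no routing between interfaces is required.
  mqtt_host: 172.22.1.1

mqtt:
  broker: ${mqtt_host}
  port: 1883
```

With the broker address inside the same subnet as the ESP32s, the connection no longer depends on whether the access point forwards traffic between interfaces.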