Closed theanch92 closed 10 months ago
I've encountered the same issue as described here, while working with the NAS-WR01W smart socket and using the configuration provided above.
Possibly related to #154.
I upgraded from 2023.7.0-dev + LT 1.1.0 to 2023.9.0-dev + 1.2.1 and is facing the same issue. Then I tried setting framework to 1.1.0 (still with 2023.9.0-dev) and it still doesn't work.
bk72xx:
board: cb2s
framework:
version: 1.1.0
I get the following error:
12:34:53][W][bl0942:058]: BL0942 invalid checksum! 0xA7 != 0x00
[12:34:53][W][bl0942:058]: BL0942 invalid checksum! 0xA7 != 0x00
[12:34:53][W][bl0942:058]: BL0942 invalid checksum! 0xA7 != 0x00
[12:34:53][W][bl0942:058]: BL0942 invalid checksum! 0xA7 != 0x00
I looked at esphome and there doesn't seem to be any changes to the bl0942 code lately
Do you have binaries of the 2023.7.0 build or can you revert to that version with 1.1.0 framework?
I restored the old LT esphome container, from a backup, to a test instance of Home Assistant and build a working binary. Since the file contains wifi credentials I took the liberty of sharing the URL to the file, directly with you on Discord.
I unfortunately don't have any BL0942 devices to test this on. If you flash the old binary, does it still work? And flashing the new one breaks it again?
Yes, it is reproducable. Just tried multiple flashes of the device
Flashing ESPHome 2023.9.0-dev + LT 1.1.0 results in "[15:45:19][W][bl0942:058]: BL0942 invalid checksum! 0xA7 != 0x00" Flashing ESPHome 2023.7.0-dev + LT 1.1.0 works Flashing ESPHome 2023.7.0-dev + LT 1.2.1 results in "[16:00:22][W][bl0942:043]: Junk on wire. Throwing away partial message" Flashing ESPHome 2023.7.0-dev + LT 1.1.0 again - still works Flashing ESPHome 2023.7.0-dev + LT 1.2.0 results in "[16:00:22][W][bl0942:043]: Junk on wire. Throwing away partial message" Flashing ESPHome 2023.9.0-dev + LT 1.1.0 results in "[15:45:19][W][bl0942:058]: BL0942 invalid checksum! 0xA7 != 0x00" Flashing ESPHome 2023.9.0-dev + LT 1.2.1 results in "[16:19:05][W][bl0942:043]: Junk on wire. Throwing away partial message" Flashing ESPHome 2023.7.0-dev + LT 1.1.0 again - still works
This is a timing problem there is a faillure to implement any sort of serial handshake here so I would expect this type of issue. The TX is sent as 55AA and then the code hard waits 1ms which times out .. shortly after that the RX occurs and we see a valid response 200+ ms later. The the code sends 55AA again and does the same thing. Why the added delay now is unknown but the result is a handshake fail none the less.
Funny part is the code does a uart flush on send so on the next try it's never going to see the previous rx data .. lol
Flushing only waits till the buffered TX data is sent. It doesn't affect received data.
Hmm. Old memories .. lol https://www.arduino.cc/reference/en/language/functions/communication/serial/flush/ (Prior to Arduino 1.0, this instead removed any buffered incoming serial data.)
There is something odd going on with the uart component, its waiting 100+ms and we are seeing a time out after the available irq event was triggered.
bool UARTComponent::check_read_timeout_(size_t len) {
if (this->available() >= int(len))
return true;
uint32_t start_time = millis();
while (this->available() < int(len)) {
if (millis() - start_time > 100) {
ESP_LOGE(TAG, "Reading from UART timed out at byte %u!", this->available());
return false;
}
yield();
}
return true;
}
[12:36:34][D][uart_debug:114]: >>> 58:AA [12:36:34][E][uart:015]: Reading from UART timed out at byte 0! [12:36:34][W][component:204]: Component bl0942.sensor took a long time for an operation (0.22 s). [12:36:34][W][component:205]: Components should block for at most 20-30ms. [12:36:34][D][uart_debug:114]: <<< 55:E2:11:00:DA:BE:1C:5C:1F:01:D4:FD:FF:4B:00:00:3E:41:00:01:01:00:93
23 char recieved @ 4800 Time = 230 bits / 4800 baud ≈ 0.0479167 seconds or 47ms
At the most it would not be more than 48ms to receive the data.
@kuba2k2 This seems wrong .. where is the pointer BUF defined?
cores/beken-72xx/arduino/libraries/Serial/Serial.cpp
27 | this->buf = &BUF;
28 | }
50 | uart_hw_set_change(port, &cfg);
51 | uart_rx_callback_set(port, callback, &BUF);
It's defined in SerialPrivate.h
:
#include <Arduino.h>
#include <sdk_private.h>
typedef struct {
RingBuffer buf;
} SerialData;
#define DATA ((SerialData *)data)
#define BUF (DATA->buf)
#define pBUF ((RingBuffer *)param)
It's just a shorthand to access the RingBuffer
instance of the SerialData
structure.
Can you try https://github.com/hn/ginlong-solis/commit/76d81ce4d0418884d14362b2f8980c9652d40ebd as a possible workaround? (mentioned in #154)
Yes thats likey what would cause it, the IRQ fired and it looks like is was not fully processed. Thanks for the ref.
I have tested the workaround and it does not resolve this issue. Based on the actual data coming back via debug from the BL0942 I do not see data loss. I see timing issues where the RX data is not processed by the loop call of read_array. Subsequently the data does get into the ring and is correctly retrieved once the polling actually continues via debug reporting.
Using library-freertos-port @ 2023.3.13 the uarts RX debug data is not observed at all however with library-freertos-port @ 2023.5.23 debug RX data is available after the blocking timing problem.
This problem is not a uart issue but it is certainly a timing problem, or actually the lack of time .. lol I enabled VERY_VERBOSE (VV) and I get an alarming output! Normally when VV is enabled you will see an interval of ms like this:
[13:27:20][VV][scheduler:225]: Running interval '' with interval=10000 last_execution=59487 (now=69492)
[13:27:21][VV][scheduler:225]: Running interval 'update' with interval=10000 last_execution=61019 (now=71019)
That is a 10s interval schedule. But with the latest LibreTiny release on bk72xx I see this:
[13:37:13][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=5800 (now=12388) [13:37:13][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=5800 (now=12396) [13:37:13][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=5800 (now=12406) [13:37:13][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=5800 (now=12414)
The interval is 0 and thats an issue with all processing!
Using library-freertos-port @ 2023.3.13 the uarts RX debug data is not observed at all however with library-freertos-port @ 2023.5.23 debug RX data is available after the blocking timing problem.
That's not possible since there has been no changes between these versions.
What exactly do you put in config to get that interval output? I can do some checking on my devices too. Maybe I'll figure it out then.
There is always change its just not invoked by us .. lol The interval is always set with some default on any PollingComponent in the python code.
e.g.
.extend(cv.polling_component_schema("60s"))
In the case of the BL0942 you can just define the following YAML on any bk72xx even without the chip.
bk72xx:
board: cb2s
framework:
version: latest
# Enable logging
logger:
level: VERY_VERBOSE
uart:
id: uart_bus
tx_pin: P11
rx_pin: P10
baud_rate: 4800
stop_bits: 1
debug:
direction: both
sensor:
- platform: bl0942
uart_id: uart_bus
voltage:
name: 'Voltage'
current:
name: 'Current'
power:
name: 'Power'
filters:
multiply: -1
energy:
name: 'Energy'
frequency:
name: "Frequency"
accuracy_decimals: 2
update_interval: 30s
Take note of the optional "update_interval: 30s" value.
There is always change its just not invoked by us .. lol
Actually, no. library-freertos-port
is LibreTiny's package.
Here you can see that the only changes between these two versions are related to Realtek AmebaZ2.
Likely nothing to do with the freertos port package, likely to be some other lib in platformio is doing it after switching them, that one was just in the list of suspects and you just ruled it out.
The interval issue only occurs with
bk72xx:
board: cb2s
framework:
version: dev
HARDWARE: BK7231N 120MHz, 256KB RAM, 1.03MB Flash
- framework-arduino-api @ 2022.8.24+sha.237b10a
- framework-beken-bdk @ 0.0.0+v2021.06.07.sha.6491b8c
- library-flashdb @ 1.2.0+sha.d5c892f
- library-freertos @ 9.0.0+sha.95cc959
- library-freertos-port @ 2023.5.23+sha.a917d93
- library-lwip @ 2.1.3-bdk+sha.4ee4d34
- library-printf @ 6.1.0+sha.28a79bd
- library-uf2ota @ 5.0.0+sha.f955412
PLATFORM VERSIONS:
- libretiny @ 1.2.1+sha.f387138
- ltchiptool @ 4.2.3
CUSTOM OPTIONS:
- fw_name = esphome
- fw_version = 2023.9.0-dev
Dependency Graph
|-- ESPAsyncWebServer-esphome @ 3.0.0+sha.9f822c0
|-- DNSServer @ 1.1.0
|-- ArduinoJson @ 6.18.5
I have some interesting results. I changed the code to read the uart using read() instead of the read_array() call and it works. It also runs significantly faster 0.04ms vs 0.23 ms This look likes maybe something in esphome is the root cause.
e.g.
void BL0942::readline_(int readch, uint8_t *buffer, int len) {
DataPacket data;
static int pos = 0;
if (readch >= 0) {
if (pos < len - 1) {
buffer[pos++] = readch;
buffer[pos] = 0;
} else {
pos = 0;
}
if (pos >= 23) {
if (buffer[0] == BL0942_PACKET_HEADER) {
memcpy(&data,buffer,23);
if (validate_checksum(&data)) {
received_package_(&data);
} else {
ESP_LOGW(TAG, "Junk on wire. Throwing away partial message");
while (read() >= 0);
}
}
pos = 0;
}
}
}
void BL0942::loop() {
const int max_line_length = 32;
static uint8_t buffer[max_line_length];
while (available()) {
this->readline_(read(), buffer, max_line_length);
}
}
22:28:56 | [D] | [sensor:093] | 'relay-x1-8 Voltage': Sending state 119.05438 V with 1 decimals of accuracy 22:28:56 | [D] | [sensor:093] | 'relay-x1-8 Current': Sending state 0.02113 A with 2 decimals of accuracy 22:28:56 | [D] | [sensor:093] | 'relay-x1-8 Power': Sending state 1.11074 W with 0 decimals of accuracy 22:28:56 | [D] | [sensor:093] | 'relay-x1-8 Energy': Sending state 0.00121 kWh with 0 decimals of accuracy 22:28:56 | [D] | [sensor:093] | 'relay-x1-8 Frequency': Sending state 59.87307 Hz with 2 decimals of accuracy
The read() call wraps with read_array(data, 1) so more that 1 byte triggers this issue
The interval update issue is separate from the uart RX problem and it only occurs with dev ATM. However it is not occuring on my windows dev platformio instance. It does occur from the latest docker based HA integration.
windows platformio has this:
PLATFORM: LibreTiny (1.0.2+sha.27393e4) > CB2S Wi-Fi Module
HARDWARE: BK7231N 120MHz, 256KB RAM, 1.03MB Flash
DEBUG: Current (openocd) External (custom, openocd)
PACKAGES:
- framework-arduino-api @ 2022.8.24+sha.237b10a
- framework-beken-bdk @ 0.0.0+v2021.06.07.sha.6491b8c
- library-flashdb @ 1.2.0+sha.d5c892f
- library-freertos @ 9.0.0+sha.95cc959
- library-freertos-port @ 2023.5.23+sha.a917d93
- library-lwip @ 2.1.3-bdk+sha.4ee4d34
- library-printf @ 6.1.0+sha.28a79bd
- library-uf2ota @ 5.0.0+sha.f955412
- toolchain-gccarmnoneeabi @ 1.100301.220327 (10.3.1)
PLATFORM VERSIONS:
- libretiny @ 1.2.1+sha.27393e4
- ltchiptool @ 4.2.3
CUSTOM OPTIONS:
- fw_name = esphome
- fw_version = 2023.9.0-dev
Can you check if delay works on the machine which has interval timing issue? For instance, add a delay to a button so that it toggles a relay a bit later. So you'll know if it actually waits that amount of time or skips it. There are some compiler tricks involved in delay() because the same function is in Beken BDK but doesn't take milliseconds as parameter, but CPU cycles. It's possible that your HA compiler doesn't understand this.
Tested that and it does a delay normally using delay(1000)
[14:51:11][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=6796 (now=27268) [14:51:11][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=6796 (now=27278) [14:51:11][VV][scheduler:225]: Running interval 'update' with interval=15000 last_execution=12272 (now=27278) [14:51:11][D][bl0942:035]: First delay of 1000ms using delay(1000) [14:51:12][D][bl0942:037]: Second delay of 1000ms using delay(1000)
[14:51:12][W][component:204]: Component bl0942.sensor took a long time for an operation (2.01 s). [14:51:12][W][component:205]: Components should block for at most 20-30ms. [14:51:12][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=6796 (now=29322) [14:51:12][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=6796 (now=29328) [14:51:12][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=6796 (now=29340) [14:51:12][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=6796 (now=29348) [14:51:12][VV][scheduler:225]: Running interval 'update' with interval=0 last_execution=6796 (now=29358) [14:51:12][V][sensor:043]: 'relay-x1-8 Voltage': Received new state 117.568436 [14:51:12][D][sensor:093]: 'relay-x1-8 Voltage': Sending state 117.56844 V with 1 decimals of accuracy
Could we be hitting the fact the FIFO is enabled? If so -1 would leave the data behind and then I'm not sure what it would do.
while ((ch = uart_read_byte(port)) != -1) {
From TUYA's current repo we see they handle RX FIFO and it's enabled in the header of our repo. uart_bk.c
#ifndef UART1_USE_FIFO_REC
#define UART1_USE_FIFO_REC 1
#endif
#ifndef UART2_USE_FIFO_REC
#define UART2_USE_FIFO_REC 1
#endif
while(REG_READ(fifo_status_reg) & FIFO_RD_READY)
{
UART_READ_BYTE(uport, val);
rx_count += kfifo_put(rx_ptr, (UINT8 *)&val, 1);
}
#ifndef UART1_USE_FIFO_REC
#define UART1_USE_FIFO_REC 0
#endif
#ifndef UART2_USE_FIFO_REC
#define UART2_USE_FIFO_REC 0
#endif
Initially appears to work fine, just need to confirm it.
Nope its not the FIFO setting.
After cloning the most current https://github.com/libretiny-eu/libretiny-esphome.git repo it resolves the uart issues. So the fix is in the upstream branch. When the docker image is updated it should clear the two issues I observed.
Confirmed both the update interval and the uart issues are resolved.
So, could you summarize: Which old versions work fine Which versions break UART Which new versions fix UART again
and most importantly - how to fix it locally? in addition.
My friend changed code components bl0942 https://github.com/Brokly/bl0942 We are still fixing it, but it is already working ESPHome 2023.9.0-dev + LibreTiny: Version: v1.2.1+sha.27393e4 The Beken chip is faster than the esp chip.
If you manage to fix it, consider submitting a PR to ESPHome upstream.
Though this issue also presents some timing problems within LT core.
My friend changed code components bl0942 https://github.com/Brokly/bl0942 We are still fixing it, but it is already working ESPHome 2023.9.0-dev + LibreTiny: Version: v1.2.1+sha.27393e4 The Beken chip is faster than the esp chip.
I do not believe that the issue is with the BL0942 code. The Beken specs deem it to be identical in performance to the esp32 including uart FIFO capability. A code change somewhere in 2023.9.0-dev has corrected read_array timeout loop behaviour. I suspect it may be related to platformio but I have not had time to dig in further. It works with platformio 6.1.10, I need to check whats running it the docker image for the current libretiny to test the platformio version is the root cause.
If you manage to fix it, consider submitting a PR to ESPHome upstream.
Everything is fine in esphome, there are no problems there because it works slowly
I do not believe that the issue is with the BL0942 code.
Code is not optimal
When I added support for the bl0942 to EspHome a year ago, and I only tested it with libretiny (libretuya at the time). The switch still works since (never updated it).
There are obvious problems in the existing bl0942 code. For example, data is requested in update(), and the response is read in loop(). esphome conditions do not allow long operations in loop(), but we see the simultaneous publication of several sensors from loop(). Also, the full response from bl0942 comes in 0.05 sec (1sec/4800bod 10bits 22bytes). The component code does not take this delay into account, the code does not check whether the package has been received completely and this is a problem. Thus, if loop() occurs less than 0.05 seconds after update(), data exchange is disrupted. Therefore, I believe that similar errors with this component are possible in the future for esphome itself.
I only tested it with libretiny (libretuya at the time). @dbuezas, not works ESPHome 2023.9.0-dev + LibreTiny: Version: v1.2.1+sha.27393e4
There are obvious problems in the existing bl0942 code. For example, data is requested in update(), and the response is read in loop(). esphome conditions do not allow long operations in loop(), but we see the simultaneous publication of several sensors from loop(). Also, the full response from bl0942 comes in 0.05 sec (1sec/4800bod 10bits 22bytes). The component code does not take this delay into account, the code does not check whether the package has been received completely and this is a problem. Thus, if loop() occurs less than 0.05 seconds after update(), data exchange is disrupted. Therefore, I believe that similar errors with this component are possible in the future for esphome itself.
Yes the code is not optimum however it is not the reason for this issue. I have the original code working with 2023.9.0-dev and v1.2.1. What changed in this env was that platformio got updated to 6.1.10. along with other ESPHome changes I have not investigated on yet.
The polling request comes in as a scheduled callback to the update() code section and this seems to be a common method based on the ESPHome documentation and it would be reasonable to see a 1s or greater interval on the polling so under normal conditions we would not have a problem processing a 47ms character transmission duration however this could still trigger the blocking time measurement and issue a warning when using read_array and slow baud rates because the hardware uart port will create an available=true within the entire receive activity window provided its queue is steady.
read_array is simply this arduino code for libretiny framework:
bool LibreTinyUARTComponent::read_array(uint8_t *data, size_t len) {
if (!this->check_read_timeout_(len))
return false;
this->serial_->readBytes(data, len);
#ifdef USE_UART_DEBUGGER
for (size_t i = 0; i < len; i++) {
this->debug_callback_.call(UART_DIRECTION_RX, data[i]);
}
#endif
return true;
}
loop() is overridden so it’s really a slower local Arduino main loop with ESPHome's control elements etc. so I concur there can be issues with code that does blocking activity.
Note sure what your reference to component code means, there is the component supervisor and then the bl0952 component.
The bl0942 tests for a packet CRC and this is its completion check. It does not however do a full handshake and I think that’s Ok for this device. The loop fires very quickly it’s in the milli seconds range so it most definitely hits the available() code continuously in less than 50ms so that not the reason for this issue. read_array will be busy for the duration of the receive and will not be called again since it has not returned to the if condition. Once read_array is called if there is any issue with the incoming serial data it will time out and that’s what we observe in the original failure as per this log information.
[12:36:34][D][uart_debug:114]: >>> 58:AA
[12:36:34][E][uart:015]: Reading from UART timed out at byte 0!
[12:36:34][W][component:204]: Component bl0942.sensor took a long time for an operation (0.22 s).
[12:36:34][W][component:205]: Components should block for at most 20-30ms.
[12:36:34][D][uart_debug:114]: <<< 55:E2:11:00:DA:BE:1C:5C:1F:01:D4:FD:FF:4B:00:00:3E:41:00:01:01:00:93
Take notice that the serial data did get processed by the debug call after the timeout wierdness, kinda like we missed the IRQ event and need to start over?
What I don't like about the loop() is:
if (!this->available()) {
return;
}
I think we need to check if its available then processing when the data has arrived and not some time after. Truly optimum code would act on a serial IRQ interrupt directly but this is Arduino so not so much access .. lol
I tested platformio 6.1.9 on windows and it works so it must be a combination of changes that differ from the last docker image release for libretiny. The last docker image runs platformio 6.1.9. Need to investigate if any compiler optimization differences could be the root cause as well.
My edits to the component consisted in the fact that I first check for all the data, and then process it. At the same time, the checksum and the header are checked during the data acquisition process. The sensors update, although performed from loop(), is stretched into several cycles. In addition, I made an auto-search of the hider in the data stream. The fixed component completely removes any problems and works with any processor. The differences in versions are caused not by the processing of receiving uart data, but by various delays between update() and the next loop(). It seems to me that it will not be possible to remove this mine without using unnecessary delays. Everything is the will of the compiler.
I installed 2023.9.0-dev from https://github.com/libretiny-eu/esphome-hass-addon with version latest and it didn't work. I installed 2023.7.0-dev from https://github.com/catalin2402/esphome-hass-addon with version 1.1.0 and it didn't work.
libretiny:
board: cb2s
framework:
version: 1.1.0
BL0942 reports power measurements on CB2S with version 1.1.0 using this specific tag of libretiy-esphome-hassio
https://github.com/wernerhp/esphome/blob/main/bneta-smart-plug/bneta.yaml
Could someone explain if the fix was released?
I can confirm that using custom BL0942 implementation from here: https://github.com/Brokly/bl0942 works.
I also forked this repository to allow using it directly from GitHub: https://github.com/sprokipchyn/bl0942.
Usage example:
external_components:
- source:
type: git
url: https://github.com/sprokipchyn/bl0942
components: [bl0942]
uart:
id: uart_bus
tx_pin: P11
rx_pin: P10
baud_rate: 4800
data_bits: 8
stop_bits: 1
sensor:
- platform: bl0942
uart_id: uart_bus
voltage:
name: "Voltage"
current:
name: "Current"
power:
name: "Power"
energy:
name: "Energy"
frequency:
name: "Frequency"
accuracy_decimals: 2
update_interval: 1s
Yes, I believe it was fixed. Are you still using the dedicated LibreTiny ESPHome container/add-on or the LibreTiny-ESPHome version specifically? If so, you should switch to the official ESPHome. LibreTiny version should be 1.4.0+
Hello, with version 1.2.0 of libretiny there is a bug that does not allow the correct reading of data from UART BL0942, while with version 1.1.0 the data are read correctly.
The device is Aubess power monitor switch.
This is the log on 1.2.0:
This is the log on 1.1.0:
Configuration: