Closed deekoowee closed 1 year ago
I've not seen that behaviour so thanks for reporting it.
I'm concerned your inverter did stop discharging if it lost communication with the BMS?
Inverter did not stop discharging, when communication to BMS was lost
Hi, same problem occured again. No CAN messages to inverter. Inverter continued discharging. When restarting the controller, screen shows a note 'Failed to save settings' Firmware version is same: 4e72db94af375c391bad47b9b98507b3271b069d
Hi @deekoowee nobody else has reported an issue with CAN comms stopping - this could be either:
I've been running the CAN code for over 6 months now, most recently for 70+ days straight, with over 31 million CAN messages sent! I'm using a SOFAR ME3000SP inverter.
Hi @stuartpittaway I have continued the testing with another controller with new ESP32. I purchased Seeed Studio's USB-CANBUS logger and now I have canbus log for this issue if it helps.
Today, the new controller stopped sending the canbus messages. Attached the canbus log file, which shows that there's no messages from controller after 19:37:08:772. When CAN communication lost, There's no other messages than inverter ID 0x305 (can be seen at almost end of canbus log file).
To make log smaller, I have cut unnecessary lines from beginning of the log. CANBUS log 20230516_190333.txt
Time of inverter's log is about 00:01:43 ahead to canbus log. Time difference is my good guess, when comparing the point at canbus log, where inverter started to send ID 0x305.
Inverter's log have; 1) 19:26:40 The battery asked for a forced charge. #196 2) 19:38:53 Battery CAN communication lost.
Inverter send ID 0x305 at 19:37:10:371 and 19:37:10:425 Could two ID 0x305 with in 0,1 sec period cause the controller crash (Does the controller read 0x305) ?
Normally ID 0x305 are with ~10sec period except 19:34:41:888 to 19:37:10:371 and 19:37:10:371 to 19:37:47:171
Could there be a problem in data between 19:34:41:888 and 19:37:10:371 , causing the inverter to not sending ID 0x305 ? I mean, if data is making the inverter 'crazy'.
I have also the voltage curve from HA, which shows 'wild' at 19:25:30 (almost same time, when inverter's log have 'The battery asked for a forced charge.'
Because the inverter didn't get any messages from controller, it continued the battery discharging and start charging the battery after controller reset, when canbus started again.
Thanks for the detail @deekoowee. On the graph was the battery discharging at this point? I don't see an issue with the voltage graph - its just responding to loads on the battery - if you added current to the other axis, I suspect there are higher loads at that point as well.
DIYBMS ignores the 0x305 messages at the moment. Those should be sent by the inverter every few seconds, but shouldn't cause problems if quicker/slower.
Using the 19:26:40:106 as a time start point, using the CAN logs you supplied, this is the message that's sent from the controller to the inverter. At this point, you mention the inverter said it was being requested to "force charge"
7687 Receive 19:26:40:106 Data frame Standard frame 00000351 8 0c 02 40 06 c4 09 ce 01
7688 Receive 19:26:40:121 Data frame Standard frame 00000355 4 29 00 64 00
7689 Receive 19:26:40:137 Data frame Standard frame 00000356 6 24 13 8a ff 2c 01
7690 Receive 19:26:40:153 Data frame Standard frame 00000359 8 00 00 00 00 0a 50 4e 00
7691 Receive 19:26:40:175 Data frame Standard frame 0000035C 1 c0
7692 Receive 19:26:40:206 Data frame Standard frame 0000035E 8 50 59 4c 4f 4e 20 20 20
However, if we compare this message to an earlier one from 19:24 (when the inverter didn't report the force charge requested), nearly all the data is identical except for the first byte of 0x351 and 0x356
6808 Receive 19:24:01:573 Data frame Standard frame 00000351 8 13 02 40 06 c4 09 ce 01
6809 Receive 19:24:01:588 Data frame Standard frame 00000355 4 29 00 64 00
6810 Receive 19:24:01:620 Data frame Standard frame 00000356 6 26 13 47 ff 2c 01
6811 Receive 19:24:01:642 Data frame Standard frame 00000359 8 00 00 00 00 0a 50 4e 00
6812 Receive 19:24:01:657 Data frame Standard frame 0000035C 1 c0
6813 Receive 19:24:01:673 Data frame Standard frame 0000035E 8 50 59 4c 4f 4e 20 20 20
Lets look at those.... 0x351 – Battery voltage + current limits
6808 Receive 19:24:01:573 Data frame Standard frame 00000351 8 13 02 40 06 c4 09 ce 01
7687 Receive 19:26:40:106 Data frame Standard frame 00000351 8 0c 02 40 06 c4 09 ce 01
battery charge voltage = UINT16 - Little Endian (BA) 0x1302 = 53.1V 0x0c02 = 52.4V
40 06 = 160.0Amp charge max, c4 09 = 250.0Amp discharge max ce 01 = 46.20V min. voltage
Plotting all those on a graph produces this.
0x356 - Battery voltage - Voltage / Current / Temp
6810 Receive 19:24:01:620 Data frame Standard frame 00000356 6 26 13 47 ff 2c 01
7689 Receive 19:26:40:137 Data frame Standard frame 00000356 6 24 13 8a ff 2c 01
0x2613 =49.02V battery voltage 0x47ff = -185Amp current 0x2c01 = 30.0 temperature C
0x2413 = 49.00V 0x8aff = -118Amp current 0x2c01 = 30.0 temperature C
0x035c
6806 Receive 19:24:00:553 Data frame Standard frame 0000035C 1 c0
The only CANBUS command which can request a force a charge is "0x35c", which is a single byte. Every packet of data in the log file you recorded is 0xc0 = 11000000 in binary, this translates to charging is allowed & discharge is allowed. The controller doesn't actually have the code written to force a charge on the PylonTech emulation!
So in summary, the data the controller appears to send to the inverter is correct. The controller never asks for a "force charge" (it has no way to do so).
However, this doesn't answer why the controller stops CAN comms at "19:37:08:772" in the log file.
Unfortunately, for that, I'm going to need the serial console output from the ESP32 when this problem occurs.
Just to add, looking at the 0x305 messages received by the controller from the inverter,
10305 Receive 19:34:31:535 Data frame Standard frame 00000305 8 00 00 00 00 00 00 00 00
10360 Receive 19:34:41:888 Data frame Standard frame 00000305 8 00 00 00 00 00 00 00 00
11161 Receive 19:37:10:371 Data frame Standard frame 00000305 8 00 00 00 00 00 00 00 00
11162 Receive 19:37:10:425 Data frame Standard frame 00000305 8 00 00 00 00 00 00 00 00
11163 Receive 19:37:38:506 Data frame Standard frame 00000305 8 00 00 00 00 00 00 00 00
11164 Receive 19:37:47:171 Data frame Standard frame 00000305 8 00 00 00 00 00 00 00 00
It does look very suspicious that two messages are received at "19:37:10" in quick succession, and then no more CAN BUS data is sent from the controller. Hence the need for the serial console output - I suspect there is a software code bug.
Battery was discharging.
Here's the current graph from same period
I think, I can log the serial console output directly to file with Putty. I'll put it online together with canbus logger, then I have to wait. Is this log style ok ?
Hi @stuartpittaway Here's the serial log. I cutted it from the original log file. First error can be found in line 1289
Thanks for this @deekoowee. I've so far, not received any other reports of this bug.
I'm wondering if @atanisoft could shed any light on this CAN issue?
@stuartpittaway From the log file above:
E (175194290) diybms: Fail to queue CANBUS message (0x103)
The error code 0x103 is ESP_ERR_INVALID_STATE
which indicates the bus is not in a state where it can transmit. From a very quick review of the diyBMS code I don't see any error handling for bus-off conditions which can occur due to external factors and leaves the CAN controller in a state where TX will fail. You will need to add some checks similar to this (replace can -> twai and it should work). While the bus is in an off or recovery state TX is not possible and should be delayed. Once it recovers though you should be able to TX the packet(s) as normal.
@deekoowee I've followed @atanisoft code/advice.
Can I ask you to try out the next version of DIYBMS controller code?
If you go to this link... https://github.com/stuartpittaway/diyBMSv4ESP32/actions/runs/5119711405
download the file DIYBMS-Release-Artifact-2023-05-30-09-18
Hi, Yes, I can make testing for that version. Do you need canbus & serial logs ?
The serial logs would be useful to see what errors occur, and how the controller deals with them. Thanks
The controller has updated with next version code and logging is activated. Now just wait ...
Here's the serial log
Thank you @deekoowee
So the code I added detected that the CANBUS went into an error state (recovery needed). The code automatically attempted to recover (Initiating recovery
) however it didn't appear to work, and the CANBUS entered the STOPPED state.
@atanisoft is recovery normally possible? Once in STOPPED state, can I re-start ?
D (44203473) diybms: Sent CAN message 0x351
D (44203494) diybms: Sent CAN message 0x355
D (44203514) diybms: Sent CAN message 0x356
E (44203534) diybms: Fail to queue CANBUS message (0x103)
I (44203534) diybms: CAN STATUS: rx-q:0, tx-q:0, rx-err:0, tx-err:128, arb-lost:13, bus-err:44, state: OFF / RECOVERY NEEDED
W (44203539) diybms: Initiating recovery
E (44203562) diybms: Fail to queue CANBUS message (0x103)
I (44203562) diybms: CAN STATUS: rx-q:0, tx-q:0, rx-err:0, tx-err:0, arb-lost:13, bus-err:44, state: STOPPED
E (44203585) diybms: Fail to queue CANBUS message (0x103)
I (44203585) diybms: CAN STATUS: rx-q:0, tx-q:0, rx-err:0, tx-err:0, arb-lost:13, bus-err:44, state: STOPPED
E (44204589) diybms: Fail to queue CANBUS message (0x103)
I (44204589) diybms: CAN STATUS: rx-q:0, tx-q:0, rx-err:0, tx-err:0, arb-lost:13, bus-err:44, state: STOPPED
E (44204613) diybms: Fail to queue CANBUS message (0x103)
I (44204613) diybms: CAN STATUS: rx-q:0, tx-q:0, rx-err:0, tx-err:0, arb-lost:13, bus-err:44, state: STOPPED
E (44204636) diybms: Fail to queue CANBUS message (0x103)
I (44204636) diybms: CAN STATUS: rx-q:0, tx-q:0, rx-err:0, tx-err:0, arb-lost:13, bus-err:44, state: STOPPED
E (44204659) diybms: Fail to queue CANBUS message (0x103)
is recovery normally possible? Once in STOPPED state, can I re-start ?
Yes, it should be possible. The call to twai_initiate_recovery
will start the recovery (as you have seen) and it seems instead of returning to a running state like the old CAN driver did it goes into a STOPPED state. You can add one more else if
clause for TWAI_STATE_STOPPED
and call twai_start
and then try sending the message.
You could also use the alerts mechanism to have it tell you when certain events occur so you can initiate recovery / restart / etc. I've shifted to the lower level TWAI HAL APIs and when TWAI_HAL_EVENT_BUS_RECOV_CPLT
is raised I immediately trigger the bus restart.
Controller shows this;
What does it mean ?
Controller shows this; What does it mean ?
This is a count of errors for RECEIVING from the inverter - the "heart beat" which the inverter sends the controller about every 10 seconds. This error counter is incremented when a message isn't received (such as when the CANBUS is stopped/in error).
@deekoowee thanks for trying out the various attempts to get this working for you. Heres the latest one.
https://github.com/stuartpittaway/diyBMSv4ESP32/suites/13292392686/artifacts/725613861
@atanisoft thanks for the help.
Latest one uploaded to controller. Logging on and just waiting...
Controller shows this;
4 errors and controller has passed them without a problem.
Great, so it looks like the recovery on the CANBUS works.
I still think there is a software fault in the inverter, but if this helps then great.
Hi @stuartpittaway Installed firmware 4e72db94af375c391bad47b9b98507b3271b069d The CAN messaging stopped; 1) Battery SoC was over 100% (~140%, while charging, 'Override State of Charge' was checked) 2) Discharging battery 3) When battery SoC reached 100%, 'CAN sent:' stopped counting. I catch it, because inverter complained missing battery. However discharging continued. I reset controller and 'CAN sent' counting started again and inverter was happy again.