mr-manuel / venus-os_dbus-serialbattery

Battery Monitor driver for serial battery in VenusOS GX systems
https://mr-manuel.github.io/venus-os_dbus-serialbattery_docs/
MIT License
39 stars 15 forks source link

Segmentation faults in python with v1.4.20240928 on JKBMS BLE #86

Open DenkBrettl opened 5 days ago

DenkBrettl commented 5 days ago

Describe the bug

When running v1.4.20240928, I regularly get segfaults in the python binary using a JKBMS via BLE.

I've downgraded again to v1.3.20240705 and this one has run fine since a day now. I know this is hard to debug, but I'm happy to help. If you know how to enable core dumps on this, it would be helpful, otherwise I can research it.

How to reproduce

Install latest release, wait for a certain time (few hours), segfaults regularly occur.

Expected behavior

No segfaults

Driver version of the currently installed driver

v1.4.20240928

Driver version of the last known working driver

v1.3.20240705

Venus OS device type

Raspberry Pi 4

Venus OS version

v3.42

BMS type

JKBMS (Heltec BMS)

Cell count

16

Battery count

2

Connection type

Bluetooth

Config file

[DEFAULT]

; If you want to add custom values/settings, then check the values/settings you want to change in "config.default.ini"
; and insert them below to persist future driver updates.

; Example (remove the semicolon ";" to uncomment and activate the value/setting):

LOGGING = INFO

; May need to run bluetoothctl; then in the cli "scan on" and "trust MAC" for new devices
; C8:47:8C:E2:D1:A9 MoarBatt01
; C8:47:8C:E2:DB:1F MoarBatt02
BLUETOOTH_BMS = Jkbms_Ble C8:47:8C:E2:D1:A9, Jkbms_Ble C8:47:8C:E2:DB:1F

MAX_BATTERY_CHARGE_CURRENT = 280.0
MAX_BATTERY_DISCHARGE_CURRENT = 280.0

; --------- Cell Voltages ---------
; Description:
;     Cell min/max voltages which are used for:
;         - Calculating the min/max battery voltage
;         - Trigger the SoC reset when SoC calculation is enabled
; Example:
;     16 cells * 3.45 V/cell = 55.2 V max charge voltage.
;     16 cells * 2.90 V/cell = 46.4 V min discharge voltage
MIN_CELL_VOLTAGE   = 2.850
; Max voltage (can seen as absorption voltage)
MAX_CELL_VOLTAGE   = 3.450
; Float voltage (can be seen as resting voltage)
FLOAT_CELL_VOLTAGE = 3.375

; --------- SOC reset voltage (needs to match BMS settings) ---------
; +++ This has nothing to do with "SOC calculation" in a section below +++
; This is one of the possibilities to reset the SoC to 100%, because of SoC drift.
; Description:
;     May be needed to reset the SoC to 100% once in a while for some BMS, because of SoC drift.
;     Some BMS may needed to reset the SoC to 100% once in a while, because of SoC drift. Some
;     devices, like JKBMS, will reset their internal SOC value, if they reach the upper voltage level.
;     Using this method, the charging voltage can be raised regularly to achieve that.
;     (other BMS like Daly need an active overwriting of the SOC parameter. This happens each time when
;     the charging mode changes from Bulk/Absorption to float (and the cells are equalised). They do
;     not need this feature here.)
;     Specify the cell voltage where the SoC should be reset to 100% by the BMS.
;       - JKBMS: SoC is reset to 100% if one cell reaches OVP (over voltage protection) voltage
;     As you have to adopt this value to your system, I recommend to start with
;     OVP voltage - 0.030 (see Example).
;       - Try to increase (add) by 0.005 in steps, if the system does not switch to float mode, even if
;         the target voltage SOC_RESET_VOLTAGE * CELL_COUNT is reached.
;       - Try to decrease (lower) by 0.005 in steps, if the system hits the OVP too fast, before all
;         cells could be balanced and the system goes into protection mode multiple times.
; Example:
;     If OVP is 3.650, then start with 3.620 and increase/decrease by 0.005
; Note:
;     The value has to be higher as the MAX_CELL_VOLTAGE
;     You also have to set CELL_VOLTAGES_WHILE_CHARGING accordingly, if you set CCCM_CV_ENABLE to true
;         else the charging current will be reduced to 0 before the target voltage is reached and the
;         battery will never switch to float
SOC_RESET_VOLTAGE = 3.570
; Specify after how many days the SOC reset voltage should be reached again
; The timer is reset when the SOC reset voltage is reached
; Leave empty if you don't want to use the SOC reset feature
; Example:
;     Value is set to 15
;     day 1: SOC reset reached once
;     day 16: SOC reset reached twice
;     day 31: SOC reset not reached since it's very cloudy
;     day 34: SOC reset reached since the sun came out
;     day 49: SOC reset reached again, since last time it took 3 days to reach SOC reset voltage
SOC_RESET_AFTER_DAYS = 2

; --------- Cell Voltage Current limitation (affecting CCL/DCL) ---------
; Description: Maximal charge / discharge current will be in-/decreased depending on min and max cell voltages
; Example:
;     18 cells * 3.55 V/cell = 63.9 V max charge voltage 
;     18 cells * 2.70 V/cell = 48.6 V min discharge voltage
;     But in reality not all cells reach the same voltage at the same time. The (dis)charge current
;     will be (in-/)decreased, if even ONE SINGLE BATTERY CELL reaches the limits

; Charge current control management referring to cell-voltage enable (True/False).
CCCM_CV_ENABLE = True     
; Discharge current control management referring to cell-voltage enable (True/False).
DCCM_CV_ENABLE = True

; Set steps to reduce battery current                                     
; The current will be changed linear between those steps if LINEAR_LIMITATION_ENABLE is set to True
CELL_VOLTAGES_WHILE_CHARGING      = 3.60, 3.50, 3.45, 3.30
MAX_CHARGE_CURRENT_CV_FRACTION    =    0, 0.05,  0.5,    1                                      

CELL_VOLTAGES_WHILE_DISCHARGING   = 2.70, 2.80, 2.90, 3.10                                            
MAX_DISCHARGE_CURRENT_CV_FRACTION =    0,  0.1,  0.5,    1

Relevant log output

'current' log of one of the batteries:

2024-10-07 18:48:55.139862500 
2024-10-07 18:48:57.045012500 INFO:SerialBattery:
2024-10-07 18:48:57.045307500 INFO:SerialBattery:Starting dbus-serialbattery
2024-10-07 18:48:57.046355500 INFO:SerialBattery:Venus OS v3.42
2024-10-07 18:48:57.046633500 INFO:SerialBattery:dbus-serialbattery v1.4.20240928
2024-10-07 18:49:13.314624500 INFO:SerialBattery:Init of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 18:49:13.314917500 INFO:SerialBattery:Test of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 18:49:17.324509500 INFO:SerialBattery:BAT: JKBMS 10.XG 16S (20231003)
2024-10-07 18:49:17.325302500 INFO:SerialBattery:Connection established to Jkbms_Ble
2024-10-07 18:49:17.812952500 INFO:SerialBattery:Found existing battery with DeviceInstance = 1
2024-10-07 18:49:17.955688500 INFO:SerialBattery:DeviceInstance = 1
2024-10-07 18:49:17.956666500 INFO:SerialBattery:PID file created successfully: /var/tmp/dbus-serialbattery_1.pid
2024-10-07 18:49:17.956972500 INFO:SerialBattery:Used DeviceInstances = ['1', '3']
2024-10-07 18:49:17.957280500 INFO:SerialBattery:com.victronenergy.battery.ble_c8478ce2db1f__0
2024-10-07 18:49:17.980172500 INFO:SerialBattery:publish config values = False
2024-10-07 18:49:17.992789500 INFO:SerialBattery:Battery JKBMS BLE connected to dbus from ble_c8478ce2db1f
2024-10-07 18:49:17.992794500 INFO:SerialBattery:========== Settings ==========
2024-10-07 18:49:17.992797500 INFO:SerialBattery:> Connection voltage: 53.0 V | Current: -13.1 A | SoC: 91%
2024-10-07 18:49:17.992801500 INFO:SerialBattery:> Cell count: 16 | Cells populated: 16
2024-10-07 18:49:17.992804500 INFO:SerialBattery:> LINEAR LIMITATION ENABLE: True
2024-10-07 18:49:17.992807500 INFO:SerialBattery:> MIN CELL VOLTAGE: 2.850 V | MAX CELL VOLTAGE: 3.450 V| FLOAT CELL VOLTAGE: 3.375 V
2024-10-07 18:49:17.992878500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 280.0 A | MAX BATTERY DISCHARGE CURRENT: 280.0 A
2024-10-07 18:49:17.992882500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 150.0 A | MAX BATTERY DISCHARGE CURRENT: 150.0 A (read from BMS)
2024-10-07 18:49:17.992886500 INFO:SerialBattery:> CVCM:     True
2024-10-07 18:49:17.992889500 INFO:SerialBattery:> CCCM CV:  True  | DCCM CV:  True
2024-10-07 18:49:17.992918500 INFO:SerialBattery:> CCCM T:   True  | DCCM T:   True
2024-10-07 18:49:17.992921500 INFO:SerialBattery:> CCCM SOC: False | DCCM SOC: False
2024-10-07 18:49:17.992924500 INFO:SerialBattery:> CHARGE FET: True | DISCHARGE FET: True | BALANCE FET: True
2024-10-07 18:49:17.992928500 INFO:SerialBattery:Serial Number/Unique Identifier: 2032809157
2024-10-07 18:55:40.719322500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -81
2024-10-07 18:55:40.719328500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -84
2024-10-07 18:59:50.874364500 ./run: line 12:  4199 Segmentation fault      python /opt/victronenergy/dbus-serialbattery/dbus-serialbattery.py Jkbms_Ble C8:47:8C:E2:DB:1F
2024-10-07 18:59:51.030523500 
2024-10-07 18:59:51.030625500 
2024-10-07 18:59:51.030791500 INFO:Preparing Bluetooth for connection to BMS
2024-10-07 18:59:51.030860500 INFO:Bluetooth details
2024-10-07 18:59:58.995073500 Attempting to disconnect from C8:47:8C:E2:DB:1F
2024-10-07 18:59:58.995077500 [CHG] Device C8:47:8C:E2:D1:A9 RSSI is nil
2024-10-07 18:59:58.995081500 [CHG] Controller E4:5F:01:06:4B:BD Discovering: no
2024-10-07 18:59:58.995084500 [CHG] Device C8:47:8C:E2:DB:1F ServicesResolved: no
2024-10-07 18:59:58.995088500 Successful disconnected
2024-10-07 19:00:04.130834500 Device C8:47:8C:E2:DB:1F (public)
2024-10-07 19:00:04.130839500   Alias: MoarBatt02
2024-10-07 19:00:04.130842500   Paired: no
2024-10-07 19:00:04.130844500   Trusted: yes
2024-10-07 19:00:04.130846500   Blocked: no
2024-10-07 19:00:04.130848500   Connected: no
2024-10-07 19:00:04.130851500   LegacyPairing: no
2024-10-07 19:00:04.130853500   UUID: Device Information        (0000180a-0000-1000-8000-00805f9b34fb)
2024-10-07 19:00:04.130856500   RSSI: -80
2024-10-07 19:00:04.133142500 
2024-10-07 19:00:06.012735500 INFO:SerialBattery:
2024-10-07 19:00:06.013011500 INFO:SerialBattery:Starting dbus-serialbattery
2024-10-07 19:00:06.014625500 INFO:SerialBattery:Venus OS v3.42
2024-10-07 19:00:06.014894500 INFO:SerialBattery:dbus-serialbattery v1.4.20240928
2024-10-07 19:00:22.260999500 INFO:SerialBattery:Init of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 19:00:22.261284500 INFO:SerialBattery:Test of Jkbms_Ble at C8:47:8C:E2:DB:1F
2024-10-07 19:00:26.268169500 INFO:SerialBattery:BAT: JKBMS 10.XG 16S (20231003)
2024-10-07 19:00:26.268858500 INFO:SerialBattery:Connection established to Jkbms_Ble
2024-10-07 19:00:26.723916500 INFO:SerialBattery:Found existing battery with DeviceInstance = 1
2024-10-07 19:00:26.853614500 INFO:SerialBattery:DeviceInstance = 1
2024-10-07 19:00:26.854515500 INFO:SerialBattery:PID file created successfully: /var/tmp/dbus-serialbattery_1.pid
2024-10-07 19:00:26.854800500 INFO:SerialBattery:Used DeviceInstances = ['1', '3']
2024-10-07 19:00:26.855096500 INFO:SerialBattery:com.victronenergy.battery.ble_c8478ce2db1f__0
2024-10-07 19:00:26.874699500 INFO:SerialBattery:publish config values = False
2024-10-07 19:00:26.887890500 INFO:SerialBattery:Battery JKBMS BLE connected to dbus from ble_c8478ce2db1f
2024-10-07 19:00:26.887896500 INFO:SerialBattery:========== Settings ==========
2024-10-07 19:00:26.887899500 INFO:SerialBattery:> Connection voltage: 52.83 V | Current: -30.1 A | SoC: 90%
2024-10-07 19:00:26.887903500 INFO:SerialBattery:> Cell count: 16 | Cells populated: 16
2024-10-07 19:00:26.887906500 INFO:SerialBattery:> LINEAR LIMITATION ENABLE: True
2024-10-07 19:00:26.887909500 INFO:SerialBattery:> MIN CELL VOLTAGE: 2.850 V | MAX CELL VOLTAGE: 3.450 V| FLOAT CELL VOLTAGE: 3.375 V
2024-10-07 19:00:26.887982500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 280.0 A | MAX BATTERY DISCHARGE CURRENT: 280.0 A
2024-10-07 19:00:26.887987500 INFO:SerialBattery:> MAX BATTERY CHARGE CURRENT: 150.0 A | MAX BATTERY DISCHARGE CURRENT: 150.0 A (read from BMS)
2024-10-07 19:00:26.887991500 INFO:SerialBattery:> CVCM:     True
2024-10-07 19:00:26.887994500 INFO:SerialBattery:> CCCM CV:  True  | DCCM CV:  True
2024-10-07 19:00:26.888023500 INFO:SerialBattery:> CCCM T:   True  | DCCM T:   True
2024-10-07 19:00:26.888026500 INFO:SerialBattery:> CCCM SOC: False | DCCM SOC: False
2024-10-07 19:00:26.888029500 INFO:SerialBattery:> CHARGE FET: True | DISCHARGE FET: True | BALANCE FET: True
2024-10-07 19:00:26.888189500 INFO:SerialBattery:Serial Number/Unique Identifier: 2032809157
2024-10-07 19:02:51.979184500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -80
2024-10-07 19:02:51.979190500 [CHG] Device C8:47:8C:E2:DB:1F RSSI: -75

dmesg -T capture of one occurrence:
[Mon Oct  7 18:59:49 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:59:49 2024] CPU: 1 PID: 4199 Comm: python Tainted: G         C        5.10.110-rpi-venus-4 #1
[Mon Oct  7 18:59:49 2024] Hardware name: BCM2711
[Mon Oct  7 18:59:49 2024] PC is at 0xb6afd898
[Mon Oct  7 18:59:49 2024] LR is at 0xb6ae5698
[Mon Oct  7 18:59:49 2024] pc : [<b6afd898>]    lr : [<b6ae5698>]    psr: 60000010
[Mon Oct  7 18:59:49 2024] sp : beea33b0  ip : b6b1e2ac  fp : 00000000
[Mon Oct  7 18:59:49 2024] r10: 00000000  r9 : b69d8a0c  r8 : beea3474
[Mon Oct  7 18:59:49 2024] r7 : 01945848  r6 : 0192b3e0  r5 : b4102dd0  r4 : b412c248
[Mon Oct  7 18:59:49 2024] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000000
[Mon Oct  7 18:59:49 2024] Flags: nZCv  IRQs on  FIQs on  Mode USER_32  ISA ARM  Segment user
[Mon Oct  7 18:59:49 2024] Control: 30c5383d  Table: 08c88f00  DAC: 55555555
[Mon Oct  7 18:59:49 2024] CPU: 1 PID: 4199 Comm: python Tainted: G         C        5.10.110-rpi-venus-4 #1
[Mon Oct  7 18:59:49 2024] Hardware name: BCM2711
[Mon Oct  7 18:59:49 2024] Backtrace: 
[Mon Oct  7 18:59:49 2024] [<c0c181c0>] (dump_backtrace) from [<c0c18534>] (show_stack+0x20/0x24)
[Mon Oct  7 18:59:49 2024]  r7:ffffffff r6:00000000 r5:60000113 r4:c14e9918
[Mon Oct  7 18:59:49 2024] [<c0c18514>] (show_stack) from [<c0c1ca0c>] (dump_stack+0xc8/0xf4)
[Mon Oct  7 18:59:49 2024] [<c0c1c944>] (dump_stack) from [<c0208b84>] (show_regs+0x1c/0x20)
[Mon Oct  7 18:59:49 2024]  r10:fffffe30 r9:c481ff58 r8:00000f00 r7:0000000a r6:c457e3c0 r5:c481e000
[Mon Oct  7 18:59:49 2024]  r4:c481ffb0 r3:10b4f0f3
[Mon Oct  7 18:59:49 2024] [<c0208b68>] (show_regs) from [<c023bd64>] (get_signal+0x9fc/0xa00)
[Mon Oct  7 18:59:49 2024] [<c023b368>] (get_signal) from [<c020b6d4>] (do_work_pending+0x15c/0x568)
[Mon Oct  7 18:59:49 2024]  r10:fffffe30 r9:5ac3c35a r8:c481ffb0 r7:00000001 r6:c481ff44 r5:c481ff28
[Mon Oct  7 18:59:49 2024]  r4:ffffe000
[Mon Oct  7 18:59:49 2024] [<c020b578>] (do_work_pending) from [<c0200074>] (slow_work_pending+0xc/0x20)
[Mon Oct  7 18:59:49 2024] Exception stack(0xc481ffb0 to 0xc481fff8)
[Mon Oct  7 18:59:49 2024] ffa0:                                     00000000 00000000 00000000 00000000
[Mon Oct  7 18:59:49 2024] ffc0: b412c248 b4102dd0 0192b3e0 01945848 beea3474 b69d8a0c 00000000 00000000
[Mon Oct  7 18:59:49 2024] ffe0: b6b1e2ac beea33b0 b6ae5698 b6afd898 60000010 ffffffff
[Mon Oct  7 18:59:49 2024]  r10:00000000 r9:c481e000 r8:00000000 r7:30c5383d r6:ffffffff r5:60000010
[Mon Oct  7 18:59:49 2024]  r4:b6afd898

Amount of times this is happening:
[Mon Oct  7 16:33:52 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 16:44:40 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 17:21:26 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 17:44:54 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:43:24 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:48:46 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 18:59:49 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 19:03:50 2024] potentially unexpected fatal signal 11.
[Mon Oct  7 19:34:44 2024] potentially unexpected fatal signal 11.

Any other information that may be helpful

No response

DenkBrettl commented 3 days ago

Ok, while the BLE setup has been running for a while and was mostly stable, it has always had some moments where it would just hang. I've been thinking about a watchdog process that would monitor that there are new values from the BMS within a certain time frame. Let me know if you think this is a good idea and I might find some time in the next weeks to implement that.

Due to the instability, I'll switch back to serial connection.

Honusnap commented 2 days ago

I had the same thing, gone back to 1.32

mr-manuel commented 2 days ago

Unfortunately I troubleshooted this over 100 hours and found no real solution to all this Bluetooth problems. Therefore I decided to not put any other effort into the Bluetooth part. Another reason is, that the users apparently do not appreciate the work I do and they cannot immagine how much time consuming this all is.

For that 1-2 donations, if at all, a month on over 9.000 dbus-serialbattery installations it is not worth it. If everyone would donate 1 €/year then the motivation would be another, but that is not the case.

Feel free to open a PR :-)

Honusnap commented 2 days ago

Unfortunately I troubleshooted this over 100 hours and found no real solution to all this Bluetooth problems. Therefore I decided to not put any other effort into the Bluetooth part. Another reason is, that the users apparently do not appreciate the work I do and they cannot immagine how much time consuming this all is.

For that 1-2 donations, if at all, a month on over 9.000 dbus-serialbattery installations it is not worth it. If everyone would donate 1 €/year then the motivation would be another, but that is not the case.

Feel free to open a PR :-)

Maybe do not participate in a free project if your motivation is money ? We are using a tool you work on in exchange we are helping you making it better by beta testing it, we are also using our time to improve this tool. Maybe your are using too much of your time on this project.... do something else in parallel ? It work in 1.32... it fail in 1.42 .. obviously the "segmentation fault" bug comes from those modifications.

mr-manuel commented 1 day ago

My motivation is not money, but a lot of people think they can have something like premium support for free or paying once 5 bucks. Then they expect that I have to help them spending hours to analyze their problem, which is not fair.

Obviously you never contributed by coding in such a project. Testing is a lot easier and less time consuming.

Anyway, that was only a statement from me, why I will not develop this part further. On 9000 users there should not only be one person to contribute code ;-)

Honusnap commented 19 hours ago

My motivation is not money, but a lot of people think they can have something like premium support for free or paying once 5 bucks. Then they expect that I have to help them spending hours to analyze their problem, which is not fair.

Obviously you never contributed by coding in such a project. Testing is a lot easier and less time consuming.

Anyway, that was only a statement from me, why I will not develop this part further. On 9000 users there should not only be one person to contribute code ;-)

In the case of this thread.. it's not a user asking for premium service, it's a user that detected a bug and took his time to report it. I also had this bug and did not reported it ... cause i did report bugs in the past and the experience was not fun. But yes .. got the exact same bugDenkBrett, Line 12 segmentation fault... and i suppose many people have it. About people that want a five start support, just send them to the discussion forum and focus on what you like about this project.

About reducing user questions or interactions, cause i know it can be annoying, i suggest keeping beta in the beta state for longer.... cause average people do not install beta, that reduce the number of people that could react to a potential problem. When a beta is stable on 500 installs (or whatever is a good number in this case) for a month (meaning all bugs sovled)... time to get it to stable and expose it to more people. But i suppose you know that.

DenkBrettl commented 16 hours ago

Just to give my perspective: I didn't mean to ask for support. I wanted to achieve two things: 1) raise awareness for the issue and gauge whether others were affected too 2) start looking into debugging this problem.

I understand things can be sometimes frustrating and I appreciate your openness about this. I'm sorry if this bug report made you feel like you should have given support, it was not the intention.

Looking forward, I think it would be good to mark the BLE support as unstable in the meanwhile in order for people to know what they're getting themselves into. I'm happy to send a PR for a docs update, if you think this would be a sensible thing to do. I'm also still interested in your opinion about the watchdog functionality that I proposed above.

mr-manuel commented 7 hours ago

No problem, all good. That with the premium support was not related to this issue.

The BLE and CAN driver are already marked as not stable in the docs on the install page...