john30 / ebusd-esp32

Firmware for ESP32-C3 allowing eBUS communication for ebusd (https://github.com/john30/ebusd)
https://adapter.ebusd.eu/v5

Communication errors #53

Open mintgroen opened 4 months ago

mintgroen commented 4 months ago

My installation:

Vaillant AroTherm+ 105/6, VWZ AI (only 1 temp sensor + ebus), Thermostat VRC 720f, VR912 (sensonet gateway). Ebusd connects to the ebus interface over WiFi; the interface is powered by an iPad USB plug, with 5 V connected to the header.

A while ago I noticed strange behaviour of the heat pump: it stopped at regular intervals although there was no reason to stop (such as a defrost or the target room temperature being reached), and it started well before the energy integral reached 0.

[images]

The p1 data also reflected what the flow temperature was showing:

[image]

I initially thought it was a buggy or still-hanging automation, or something that I had removed after testing. The symptom could be stopped by switching ebusd to read-only or by stopping ebusd. Since I would like to work on some automations in the future, I tried to work around the (HA?) issue by removing ebusd from HA, migrating it to a second Raspberry Pi (running Raspberry Pi OS), and changing the CSVs to the JonesPD repo.

After it had been running for some days I noticed that the heat pump had stopped working completely, so I checked the VWZ AI; error F9998 was on the display:

PXL_20240316_182948039

I disconnected the ebus interface and restarted the VWZ AI, but I needed to power the Arotherm down and up again to get the system back online. I then let the system run for a week, and no comm errors occurred.

After some advice on Tweakers I tried switching the power supply: I tried another 2A USB plug and switched to the USB cable to power the interface. After some hours:

Screenshot_20240325-063127

One thing that I noticed is that a relay in the VWZ AI is clicking once every 10 s, maybe 3-4 times, and then again after about 10 minutes (similar to the compressor stops above). That is strange, because I don't have anything connected or configured at the 230 V relay control. Could these be two related or separate (CSV misconfiguration?) issues?

And again, I disconnected the ebus interface and the system is running again without any comm errors or clicking relay.

Do you have a clue what to do to resolve (both) issue(s)?

JavanXD commented 4 months ago

PXL_20240323_152430217.jpg

With the latest firmware on my eBUS adapter I experienced a similar issue, though I'm not sure it's 100% related to your problem. I had to connect the eBUS cable to the adapter before powering on the adapter. If I did not do it in that order, my heat pump showed this communication error and all heat pump devices stopped working.

mintgroen commented 4 months ago

I dug into the logs (/var/log/ebusd.log) and noticed [bus error] entries at the same time as the clicking in the VWZ AI happened.

Among the regular log entries, these things are logged:

2024-03-28 14:49:01.233 [bus error] poll vwzio TotalEnergyUsageImmersionHeater failed: ERR: read timeout
2024-03-28 14:49:07.376 [bus error] poll vwzio TotalRunningHours failed: ERR: read timeout
2024-03-28 14:49:13.198 [bus error] poll vwzio YieldTotal failed: ERR: read timeout
2024-03-28 14:48:31.223 [bus error] poll vwzio Status16 failed: ERR: read timeout
2024-03-28 14:49:01.233 [bus error] poll vwzio TotalEnergyUsageImmersionHeater failed: ERR: read timeout
2024-03-28 14:49:07.376 [bus error] poll vwzio TotalRunningHours failed: ERR: read timeout
2024-03-28 14:49:13.198 [bus error] poll vwzio YieldTotal failed: ERR: read timeout
a301c0002d00000000
2024-03-28 14:50:01.253 [bus error] poll vwzio ImmersionHeaterStarts failed: ERR: read timeout
2024-03-28 14:50:07.449 [bus error] poll vwzio ImmersionHeaterTemp failed: ERR: read timeout
2024-03-28 14:50:13.218 [bus error] poll vwzio OutdoorTemp failed: ERR: read timeout

I'm commenting out every entry in the CSV files for everything my system does not have (like the immersion heater and zones 2 and 3) or that I am not interested in. It has been running for an hour now and the [bus error] entries have not reappeared. Let's run it overnight and see if the comm error reappears.

mintgroen commented 4 months ago

Running over night still yields [bus errors]:

2024-03-29 00:04:43.181 [bus error] poll basv Hc1ActualFlowTempDesired failed: ERR: read timeout
2024-03-29 00:21:13.162 [bus error] poll basv OpMode failed: ERR: read timeout
2024-03-29 00:42:43.121 [bus error] poll hmu SupplyTempWeighted failed: ERR: SYN received
2024-03-29 01:05:43.150 [bus error] poll hmu BuildingPumpHours failed: ERR: read timeout
2024-03-29 01:07:25.132 [bus error] poll hmu Fan1Hours failed: ERR: SYN received
2024-03-29 01:42:13.264 [bus error] poll hmu YieldThisYear8 failed: ERR: read timeout
2024-03-29 01:52:25.188 [bus error] poll hmu RunStatsCompressorHours failed: ERR: read timeout
2024-03-29 01:54:25.231 [bus error] poll hmu TotalHours failed: ERR: SYN received
2024-03-29 01:54:25.298 [bus notice] arbitration won in invalid state ready
2024-03-29 01:56:25.120 [bus error] poll hmu YieldThisYear8 failed: ERR: SYN received
2024-03-29 02:00:25.176 [bus error] poll basv HwcTempDesired failed: ERR: read timeout
2024-03-29 02:02:25.184 [bus error] poll hmu BuildingCircuitPumpPower failed: ERR: SYN received
2024-03-29 02:04:25.178 [bus error] poll hmu Fan2Hours failed: ERR: read timeout
2024-03-29 02:06:25.329 [bus error] poll hmu RunStats4PortValveHours failed: ERR: read timeout
2024-03-29 02:08:25.247 [bus error] poll hmu TemperatureSwitchElectricHeater failed: ERR: read timeout
2024-03-29 02:12:19.222 [bus error] poll basv AdaptHeatCurve failed: ERR: read timeout
2024-03-29 02:12:25.262 [bus error] poll basv DisplayedOutsideTemp failed: ERR: read timeout
2024-03-29 02:20:25.397 [bus error] poll hmu RunDataEEVOutletTemp failed: ERR: read timeout
2024-03-29 02:33:19.201 [bus error] poll hmu HeatCurve failed: ERR: read timeout
2024-03-29 03:01:07.099 [bus error] poll hmu Fan1Starts failed: ERR: SYN received
2024-03-29 03:08:49.228 [bus error] poll vwzio SupplyTemp failed: ERR: read timeout
2024-03-29 04:03:19.112 [bus error] poll hmu YieldThisYear11 failed: ERR: read timeout
2024-03-29 04:13:19.171 [bus error] poll hmu MinFlowTemp failed: ERR: read timeout
2024-03-29 04:26:49.486 [bus error] poll hmu HcModeActive failed: ERR: read timeout
2024-03-29 04:54:01.237 [bus error] poll hmu CurrentYieldPower failed: ERR: read timeout
2024-03-29 05:21:13.131 [bus error] poll hmu BuildingCircuitPumpPower failed: ERR: SYN received
2024-03-29 05:41:43.467 [bus error] poll hmu YieldCooling failed: ERR: arbitration lost
2024-03-29 05:45:07.197 [bus error] poll vwzio TotalRunningHours failed: ERR: read timeout

Bus arbitration errors concern me a little bit.
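To see how the failed polls split between read timeouts, SYN received and arbitration losses, the log can be tallied with a few lines of Python. This is only a minimal sketch: the regular expression is written against the line format visible in the excerpts in this thread, and the names `POLL_ERROR` and `count_poll_errors` are hypothetical helpers, not part of ebusd.

```python
import re
from collections import Counter

# Matches lines such as:
# 2024-03-29 00:42:43.121 [bus error] poll hmu SupplyTempWeighted failed: ERR: SYN received
POLL_ERROR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[bus error\] poll (?P<circuit>\S+) (?P<name>\S+) failed: ERR: (?P<reason>.+)$"
)


def count_poll_errors(lines):
    """Return (errors per reason, errors per circuit) for failed polls."""
    by_reason, by_circuit = Counter(), Counter()
    for line in lines:
        match = POLL_ERROR.match(line.strip())
        if match:
            by_reason[match["reason"]] += 1
            by_circuit[match["circuit"]] += 1
    return by_reason, by_circuit


sample = [
    "2024-03-29 00:04:43.181 [bus error] poll basv Hc1ActualFlowTempDesired failed: ERR: read timeout",
    "2024-03-29 00:42:43.121 [bus error] poll hmu SupplyTempWeighted failed: ERR: SYN received",
    "2024-03-29 01:54:25.298 [bus notice] arbitration won in invalid state ready",
    "2024-03-29 05:41:43.467 [bus error] poll hmu YieldCooling failed: ERR: arbitration lost",
]
reasons, circuits = count_poll_errors(sample)
print(dict(reasons))
print(dict(circuits))
```

Feeding it the lines of /var/log/ebusd.log instead of `sample` would give the totals for a whole night.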

john30 commented 3 months ago

hard to tell without knowing which firmware you're using actually. anyway, as you mention the other config repo this could be just a similar issue as this one: https://github.com/john30/ebusd/issues/1205 does the same occur without using local csvs?

mintgroen commented 3 months ago

Currently on Build: [20240317]

I saw the symptoms about a month ago and then switched to the JonesPD repo.

I've switched to --configpath=https://cfg.ebusd.eu and ran the ebusd again, bus errors still occur:

2024-03-30 11:12:32.805 [mqtt error] decode basv z1Timer.Saturday: ERR: invalid position
2024-03-30 11:12:34.191 [bus error] poll basv z1Timer.Sunday failed: ERR: read timeout

2024-03-30 11:56:27.313 [mqtt error] decode basv z3CoolingTimer.Wednesday: ERR: invalid position
2024-03-30 11:56:28.139 [bus error] poll basv z3DayTemp failed: ERR: read timeout

2024-03-30 11:53:16.150 [update error] unable to parse poll-read basv z2CoolingTimer.Friday from 3115b524050303010104 / 00: ERR: invalid position
2024-03-30 11:53:18.057 [mqtt error] decode basv z2CoolingTimer.Friday: ERR: invalid position
2024-03-30 11:53:18.717 [update notice] received read hmu State QQ=71: 0;57;192;0
2024-03-30 11:53:20.651 [update notice] received unknown MS cmd: f108b5160114 / 09000000604100000000
2024-03-30 11:53:21.626 [update notice] received read basv currenterror QQ=f1: -;-;-;-;-
2024-03-30 11:53:21.810 [update notice] received unknown MS cmd: f115b503020002 / 0affffffffffffffffffff
2024-03-30 11:53:21.999 [update notice] received read hmu currenterror QQ=f1: -;-;-;-;-
2024-03-30 11:53:22.136 [update notice] received read hmu State QQ=71: 0;57;192;0
2024-03-30 11:53:22.551 [bus error] poll basv z2CoolingTimer.Monday failed: ERR: arbitration lost
2024-03-30 11:53:23.302 [update notice] received read hmu Status01 QQ=10: 19.5;17.5;-;-;-;off

The errors occur less than before but that may also be the result of the VWZ AI csv file missing:

2024-03-30 10:20:26.549 [main error] unable to load scan config 76: no file from vaillant with prefix 76 found
2024-03-30 10:20:26.550 [main error] scan config 76: ERR: element not found

mintgroen commented 3 months ago

After running for a while I noticed a bus error "signal lost":

024-03-30 15:08:48.280 [mqtt error] decode basv SFMode: ERR: invalid position
2024-03-30 15:08:52.017 [bus error] signal lost
2024-03-30 15:08:52.325 [bus error] poll basv SolarYieldTotal failed: ERR: read timeout

john30 commented 3 months ago

a read timeout and some arbitration losses are nothing unusual for such a bus, the rest is just messages noting that the response does not match the definition, so someone would need to fix those.
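To make the "response does not match the definition" case concrete, a logged telegram can be split into its parts. The sketch below assumes the usual eBUS layout for the master part (QQ, ZZ, PB, SB, NN, then NN data bytes) and a slave part that starts with its own length byte NN, with CRC and ACK bytes not shown in the log; `split_telegram` is a hypothetical helper name.

```python
def split_telegram(logged):
    """Split 'master / slave' hex as logged by ebusd into named parts."""
    master_hex, slave_hex = (part.strip() for part in logged.split("/"))
    master = bytes.fromhex(master_hex)
    slave = bytes.fromhex(slave_hex)
    master_len = master[4]  # NN of the master part
    slave_len = slave[0]    # NN of the slave part
    return {
        "QQ": master[0],                                  # source address
        "ZZ": master[1],                                  # destination address
        "PBSB": master[2:4].hex(),                        # primary + secondary command
        "master_data": master[5:5 + master_len].hex(),
        "slave_len": slave_len,
        "slave_data": slave[1:1 + slave_len].hex(),
    }


# Telegram from the "unable to parse poll-read basv z2CoolingTimer.Friday" entry.
parts = split_telegram("3115b524050303010104 / 00")
print(parts)
```

For that entry the slave part carries a length of 0, so there are no data bytes for the definition to decode, which would be consistent with the "invalid position" message.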

is the signal loss permanent or does it come back shortly after that? if using wifi, switching to another transport (ethernet or usb/rpi) might help

mintgroen commented 3 months ago

Ok, those read timeouts are nothing to worry about then.

I updated to 20240330 yesterday. Could the following change in that update have any effect on the symptoms:

avoid unnecessary wait when sending to eBUS after successful arbitration

The signal loss happened twice today:

024-03-31 06:16:10.719 [update notice] received unknown MS cmd: 1076b51009000000ffffff050000 / 0101
2024-03-31 06:16:13.046 [bus error] signal lost
2024-03-31 06:16:20.656 [bus error] device status: transport closed
2024-03-31 06:16:20.656 [bus notice] device invalid
2024-03-31 06:16:30.661 [bus error] device status: transport closed
2024-03-31 06:16:30.661 [bus notice] re-opened 192.168.0.23:9999
2024-03-31 06:16:30.661 [bus notice] device invalid
2024-03-31 06:16:35.696 [bus notice] device status: transport opened
2024-03-31 06:16:35.697 [bus notice] re-opened 192.168.0.23:9999
2024-03-31 06:16:35.703 [bus notice] device status: reset, supports info
2024-03-31 06:16:35.711 [bus notice] device status: extra info: firmware 1.1[431e].1[431e], jumpers 0x0b
2024-03-31 06:16:35.711 [bus notice] signal acquired
2024-03-31 06:16:35.919 [update notice] sent poll-read basv Hc2FlowTemp QQ=31: -

[update notice] received read hmu State QQ=71: 0;213;192;0
2024-03-31 18:34:04.051 [bus error] signal lost
2024-03-31 18:34:12.015 [bus error] device status: transport closed
2024-03-31 18:34:12.016 [bus notice] device invalid
2024-03-31 18:34:20.093 [bus notice] device status: transport opened
2024-03-31 18:34:20.093 [bus notice] re-opened 192.168.0.23:9999
2024-03-31 18:34:20.095 [bus notice] signal acquired
2024-03-31 18:34:20.099 [bus notice] device status: reset, supports info
2024-03-31 18:34:20.108 [bus notice] device status: extra info: firmware 1.1[431e].1[431e], jumpers 0x0b
2024-03-31 18:34:20.139 [bus error] arbitration start error
2024-03-31 18:34:20.277 [update error] unable to parse poll-read basv OpModeEffect from 3115b52406020000006900 / 00: ERR: invalid position
2024-03-31 18:34:20.287 [bus notice] arbitration won in invalid state ready
2024-03-31 18:34:21.654 [update notice] received read basv currenterror QQ=f1: -;-;-;-;-

Any advice on these events?
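Whether the signal loss is permanent or comes back shortly can be read off the log by pairing each "signal lost" with the next "signal acquired". A minimal sketch, assuming the timestamp format seen in these excerpts; `outage_durations` is a hypothetical helper name.

```python
from datetime import datetime


def outage_durations(lines):
    """Pair each 'signal lost' with the next 'signal acquired'; return seconds."""
    durations, lost_at = [], None
    for line in lines:
        try:
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # line does not start with a full timestamp
        if "[bus error] signal lost" in line:
            lost_at = ts
        elif "[bus notice] signal acquired" in line and lost_at is not None:
            durations.append((ts - lost_at).total_seconds())
            lost_at = None
    return durations


sample = [
    "2024-03-31 06:16:13.046 [bus error] signal lost",
    "2024-03-31 06:16:20.656 [bus error] device status: transport closed",
    "2024-03-31 06:16:35.711 [bus notice] signal acquired",
    "2024-03-31 18:34:04.051 [bus error] signal lost",
    "2024-03-31 18:34:20.095 [bus notice] signal acquired",
]
durations = outage_durations(sample)
print(durations)
```

With the two events from today's log, both outages come out at well under half a minute.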

john30 commented 3 months ago

sounds a bit as if you had more than one ebusd instance running, please check that
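One way to check this on the machine itself is to look at the process table. The sketch below parses the output of `ps -eo pid,comm`; the helper names are hypothetical, and it assumes the daemon's command name is exactly `ebusd`.

```python
import subprocess


def ebusd_pids(ps_output):
    """Return PIDs whose command name is exactly 'ebusd' in `ps -eo pid,comm` output."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[1].strip() == "ebusd":
            pids.append(int(fields[0]))
    return pids


def running_ebusd_pids():
    """Query the local process table (needs a system that provides `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,comm"], capture_output=True, text=True, check=True
    )
    return ebusd_pids(result.stdout)


# Made-up sample output, for illustration only.
sample = """  PID COMMAND
    1 systemd
  612 ebusd
  987 mosquitto
 1432 ebusd
"""
print(ebusd_pids(sample))
```

More than one PID would mean two local instances. Note that this only sees the local host: an instance on another machine that still points at the same adapter would not show up here.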

john30 commented 3 months ago

you might want to try the newest version just released as there were tons of commits related to wifi in ESP-IDF again

JavanXD commented 2 months ago

With the latest ebusd adapter firmware, the "Kommunikationsfehler" error message that stopped my heat pump from working has disappeared. I have been running it for 2 weeks without this error message. Before, the connected ebusd adapter caused this error message multiple times a day, which blocked my heat pump.

@mintgroen can you confirm this behavior too?

john30 commented 2 months ago

please check with the new version 20240505 just published if this is still the case

avanmourik commented 1 month ago

Last week I had the same issue a few times, after ebusd had been running for 5 months on my Vaillant VWL 75/6 with VWZ AI and sensocomfort 720. As the heat pump was not running, I was in no hurry to do something about it. To get rid of the error I switched off the power to the VWZ and the heat pump outdoor unit and switched it on again. This worked. On another occasion a few days later I just reset the VWZIO, and that worked as well. It has now been working again for about 3 days.

However, looking at my own data collection (every 2 minutes), I do not see large interruptions in the data. For example, both the propane evaporation temperature (in the outdoor unit) and the room temperature happily followed the day/night outside temperature fluctuations. I had expected to see large holes in the data, but they were not there. So I am not accusing ebusd (yet), but it is a bit scary. On at least one occasion I also heard the clicking.

Further, my ebusd adapter is hard-wired to the VWZIO and connected via USB to my Raspberry Pi. A self-written Python program collects the relevant data every 2 minutes and writes it to a monthly CSV, which I copy to my PC and process further in Excel. In there I have not (yet) seen abnormalities.
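Looking for holes in data sampled every 2 minutes can also be automated. This is a minimal sketch, not the collection program mentioned above: `find_gaps` is a hypothetical helper, and the 1.5x tolerance is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=2), tolerance=1.5):
    """Return (start, end) pairs where consecutive samples are further apart
    than tolerance * expected."""
    limit = expected * tolerance
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


# Made-up sample times with one 10-minute hole between 08:04 and 08:14.
samples = [
    datetime(2024, 6, 12, 8, 0),
    datetime(2024, 6, 12, 8, 2),
    datetime(2024, 6, 12, 8, 4),
    datetime(2024, 6, 12, 8, 14),
    datetime(2024, 6, 12, 8, 16),
]
gaps = find_gaps(samples)
for start, end in gaps:
    print(start, "->", end, "missing for", end - start)
```

An empty result over the period of the F9998 errors would support the observation that data kept flowing while the heat pump reported the fault.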

avanmourik commented 1 month ago

This morning (12/6/2024) I had another F9998 failure of the heat pump. Looking for signs in the var log, I found the following:

2024-06-12 08:47:11.352 [update notice] received read hmu currenterror QQ=71: 9998;-;-;-;-
2024-06-12 08:47:19.212 [update notice] received read hmu State QQ=71: 0;0;160;2
2024-06-12 08:47:21.178 [update notice] received read hmu Status01 QQ=10: 18.5;18.0;-;-;-;8

However, data is still being received via ebusd:

2024-06-12 08:49:03.329 [main info] read ctlv2 z1RoomTemp cached: 19.975
2024-06-12 08:49:03.363 [main info] read hmu BuildingCircuitFlow cached: 0
2024-06-12 08:49:03.386 [main info] read hmu AirInletTemp cached: 11.7

and the error is received again:

2024-06-12 08:49:12.624 [update info] received MS cmd: 7108b503020001 / 0a0e27ffffffffffffffff
2024-06-12 08:49:12.624 [update notice] received read hmu currenterror QQ=71: 9998;-;-;-;-

I am wondering if this error might be connected to a certain batch of ebusd adapters. Additional info: I am using version 23.2.23.2. I can't retrieve the firmware number, but it is from September 2023.
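The raw slave bytes in the "received MS cmd" line carry the same 9998 that ebusd prints as currenterror. The sketch below assumes the payload is a length byte followed by five unsigned 16-bit little-endian words, with 0xFFFF meaning "no value"; that reading is inferred from the five fields in the decoded output, and `decode_error_words` is a hypothetical helper name.

```python
import struct


def decode_error_words(slave_hex):
    """Decode the slave part (length byte + data) into 16-bit little-endian
    words; 0xFFFF is treated as 'no value' and returned as None."""
    raw = bytes.fromhex(slave_hex)
    length, data = raw[0], raw[1:]
    if length != len(data) or length % 2:
        raise ValueError("unexpected payload length")
    words = struct.unpack("<%dH" % (length // 2), data)
    return [None if w == 0xFFFF else w for w in words]


# Slave part of the logged telegram: length 0x0a, then 0e27, then four times ffff.
payload = "0a" + "0e27" + "ffff" * 4
errors = decode_error_words(payload)
print(errors)
```

The first word, 0x270E, is 9998 in decimal, which matches the F9998 shown on the display.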

Hope it helps tackling the problem

JavanXD commented 1 month ago
image

F.820 Verbindungsfehler: Pumpe Gebäudekreis (F.820 connection error: building circuit pump)

For a while I had a great many communication errors, but these were clearly traceable to the ESP32: after I disabled the Wi-Fi and the access point there and switched to Ethernet only, the errors were gone. At the same time I also disabled the unneeded LED signalling, so practically everything that runs asynchronously on the ESP32 in some way, so that the communication process is not disturbed.

Yesterday, however, the error "F.820 Verbindungsfehler Pumpe Gebäudekreis" happened to come back. This could also be due to a bug in the Vaillant board, as described here.

avanmourik commented 1 month ago

Yesterday I upgraded the firmware of ebusd to the latest version (1451a) as advised by John; however, a few hours later the F9998 error was back. As I still needed the heat pump for heating, I disconnected ebusd completely from the system.

This morning after reconnecting (starting the Pi with ebusd connected), the relay in the VWZ was switching my CV-circuit heating pump on and off irregularly, for no reason! Disconnecting ebusd stopped the pump from switching, and the relay stopped chattering as well. I reconnected a few hours later, this time connecting ebusd only after startup, and it seems OK for now.

As far as I know, the only thing I changed before the problem started was trying to integrate various LiveMonitor(Main) definitions into my CSV files, without any success. So all LiveMonitor entries were taken out of the CSV files and the system is now running again.

However, when I ran the shell script to read all the port values, the CV circulation pump got activated (ticking of the relay) and it seemed to cycle through all the pump options that are also visible on the digital display of the pump. Just before the end of the port scan I also got another communication fault on the VWZ, but this seems to be coincidental, as another port scan did not result in an F9998 error.

I still cannot get correct live monitor entries to read the exact status of the heat pump (running, idle, de-icing, ...). Any help with that?

john30 commented 1 month ago

the live monitoring stuff is known to be highly problematic. I had updates to those merged into the csv webservice, and as several users reported instability, I've reverted them. so please don't use those, and please also make sure you're not accidentally using any of these messages at all. I'm pretty sure they are for temporary use only and not for permanent use. other than that, you might give the new firmware a try that I just released

avanmourik commented 1 month ago

Thanks. I still have the following LiveMonitor sentences running, as I got continuous read errors on the original descriptions (CurrentYieldPower and CurrentConsumedPower; I will retry now that I have got rid of all the other read errors, as these have worked before):

r,,State,,,,,07,energy,,UCH,,,,,,energy,,kWh,,onoff,,UCH,0=off;1=on,,,state,,UCH,0x01=ready;0x0b=error;0x09=heating;0x11=cooling;0x81=heating_water,,

Values from Live Monitor,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

r,,,,,,B51A,05,,,,,,,,,,,,,,,,,,,,,,,,
w,,,,,,B51A,05,,,,,,,,,,,,,,,,,,,,,,,,
w,,ReadLiveMonitor,Payload needed sent: FF321f=desired supply (D2C);FF3220=current supply (D2C);FF3224=current power consumption (UIN);FF3223=power generated (UIN);FF3225=Modulation (D1B);FF3226=Air intake (D2C),,,,,,m,HEX:3,,,,,,,,,,,,,,,,,,,,,
r,,LiveMonitorPowerConsumption,UIN,,,,FF3224,,,IGN:3,,,,,,energy,10,kW,,,,,,,,,,,,,
r,,LiveMonitorPowerGenerated,UIN,,,,FF3223,,,IGN:3,,,,,,energy,10,kW,,,,,,,,,,,,,
r,,LiveMonitorCompressorModulation,D1B,,,,FF3225,,,IGN:3,,,,percentage,,D1B,,,,,,,,,,,,,,,

(from: https://github.com/john30/ebusd-configuration/pull/316/files).

So far no reading errors and no F9998 failure. I am also looking at the hmu State variable as an alternative. It reports something like (0,0,160,0). For the third element I have found:

160 = day standby
128 = night standby
161 = running

Other "states" are still to be found.
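The values found so far can be kept in a small lookup while the remaining codes are collected. A minimal sketch: the table only contains the three codes observed above, and `describe_state` is a hypothetical helper that accepts the value as it appears in the ebusd log.

```python
# Codes observed in this thread only; other values are not yet identified.
OBSERVED_STATES = {160: "day standby", 128: "night standby", 161: "running"}


def describe_state(state_fields):
    """Take the hmu State value as logged (e.g. '0;0;160;0') and label field 3."""
    parts = state_fields.replace(",", ";").split(";")
    code = int(parts[2])
    return code, OBSERVED_STATES.get(code, "unknown")


print(describe_state("0;0;160;2"))
print(describe_state("0;57;192;0"))
```

Logging every "unknown" code together with what the heat pump is doing at that moment would be one way to fill in the rest of the table.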

avanmourik commented 1 month ago

Not sure where the bold formatting came from. That sentence is only a comment line (I forgot to copy the #).

avanmourik commented 3 weeks ago

After a few weeks of running fine, I again experienced a communication failure today, so I removed the last LiveMonitor sentences that were still active to get current power and current yield (see above). These had been kept in place because the original definitions as per https://github.com/john30/ebusd-configuration/blob/master/ebusd-2.1.x/en/vaillant/08.hmu.csv do not work because of errors in the definition. Searching the internet again, I found that instead of using:

r,,CurrentConsumedPower,,,,,24,,,D1B,10,kW,,,,,,,,,,,,,,,,,,,

I need to use:

r,,CurrentConsumedPower,,,,,FF3224,,,D1B,10,kW,,,,,,,,,,,,,,,,,,,

as per the JonesPD hmu file (https://github.com/jonesPD/ebusd-configuration/blob/master/ebusd-2.1.x/en/vaillant/08.hmu.csv). The same applies to CurrentYieldPower (23 becomes FF3223). This at least gives me values again without errors (VWL 75/6 plus heat pump, 2022). I will see how this works out in the next weeks.

That said, this is strange, as the original definition worked without errors up to about April this year.
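To confirm that the two CurrentConsumedPower rows differ only in the message ID, they can be compared column by column. This is a sketch under assumptions: the column names in `COLUMNS` follow my understanding of the leading columns of an ebusd message definition and are not taken from this thread, and `parse_definition` and `changed_columns` are hypothetical helpers.

```python
import csv

# Assumed names for the leading columns of an ebusd message definition row.
COLUMNS = ["type", "circuit", "name", "comment", "QQ", "ZZ", "PBSB", "ID",
           "field", "part", "datatype", "divider", "unit"]


def parse_definition(row_text):
    """Map the leading comma-separated values of one definition row to column names."""
    values = next(csv.reader([row_text]))
    return dict(zip(COLUMNS, values))


def changed_columns(old_row, new_row):
    """Return {column: (old, new)} for every named column that differs."""
    old, new = parse_definition(old_row), parse_definition(new_row)
    return {col: (old[col], new[col]) for col in COLUMNS if old[col] != new[col]}


original = "r,,CurrentConsumedPower,,,,,24,,,D1B,10,kW,,,,,,,,,,,,,,,,,,,"
jonespd = "r,,CurrentConsumedPower,,,,,FF3224,,,D1B,10,kW,,,,,,,,,,,,,,,,,,,"
diff = changed_columns(original, jonespd)
print(diff)
```

Data type, divider and unit are identical in both rows, so only the register being addressed changes.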