Requested temp set from automation was not updating

johny-mnemonic commented 1 year ago

I had an issue with the HC Thermostat used from automation today.

I have an automation that triggers on the thermostat current temperature change and sets requested temperature = (current temp + 0,5°C). As you can see in the chart below the target temp was always set by the automation but immediately returned to the original value. Almost like if the integration thought the change was not accepted by the API. I have seen this happening today and according to the chart below it was between 10:38 and 14:32. Since then the requested temp started tracking the current temp again.

MT_HA_2

When it happened the requested temp was visible in the MT app as well as on the pAD.

It's strange that the requested temp was not updated from the API to HA after a while no matter whether the integration considered the update of the value successful or failed. Especially as I haven't found any other values to be stuck/not updated in that time window. According to HA history all other values from HP were changing values just fine.

I have checked history few days back and have found similar pattern on 12.2. between 18:05 and 20:00, so being this random it really looks like related to some API communication issues. If that's the case it would be probably better to log some warnings into HA log.

It is great that the integration can survive API errors or even outage without user intervention. Great job! There are much more mature integrations that are failing badly when this happens, so kudos for this. But it would be nice if this was not completely silent so that one would be able to correlate warnings in log with weird behavior.

This might be a follow-up to #61

Version of the custom_component

1.0.0

Configuration

Scan interval is set to 30 seconds. Old Mastertherm API.

Add your logs here.

There is not a single line of log related to the Mastertherm extension since last boot on 13.2. at 20:29

sHedC commented 1 year ago

In the masterthermconnect module the logic assumes that if we get a return with no data then it just ignores it but I found the other day that if the HP disconnects from the API then it also returns no data, but it does return an error code in the JSON.

I am updating to pick this up and log that the HP is disconnected (offline) and the idea is to change the Operating Mode to offline or disconnected in case of these errors and log for those I don't have logic for. Working on it in the python-masterthermconnect module in "Add more Operating Modes #75"

However the update should have given an error if the API didn't accept an update, there is no logic to stop those failures so something should have shown up on the 12.2 if failed :(

Did you capture any logs from the 12.2? or not until the restart? I checked my logs and on 12.2 history is still there changing.

Forgot to ask what time zone you are on. I see your history is still updating so maybe not disconnecting but failing to push an update. There should be no issue with overlapping updates I put in some logic to make sure that two requests don't happen at the same time, will try updating one of my thermostats and see what happens over time.

johny-mnemonic commented 1 year ago

@sHedC I am not sure if there is a way to find logs prior last boot. But since the last boot on 2023-02-13 20:29 there is not a single line from Mastertherm integration even though the time window where it had issues on 14.2. was much longer, so I wouldn't expect anything logged on 12.2. either. Should I enable debug log for the integration to have better chance of catching something interesting?

sHedC commented 1 year ago

@johny-mnemonic - its ok seen it, I was testing my pump being offline when the API was online so I could capture it, and it did only threw exceptions when the HASS tried to do an update but not on read.

Will fix it to flag as offline if the pump gets disconnected for the Operation Mode, but will keep working, so Operation mode can be:

unavailable API Problem
offline - pump disconnected from API
idle - doing nothing
other states.

Its in progress.

johny-mnemonic commented 1 year ago

Seems like I have another such "accident". Yesterday the automation turned off HC1 and when it tried to turn it on today it turned off itself after exactly 30s (my scan interval). When I found it now, I tried to turn it on manually, but it immediately turned off itself.

MT_HA_3

Nothing in the log and HC1 seems to be ON in the HP control unit and in the MT App...

As expected, setting the requested temp has the same issue as before: MT_HA_4

SeBsZ commented 1 year ago

Looks like rate limiting maybe?

May I ask why you are setting your requested temp 0.5c higher than the measured temp? Is this to always keep heating on? How do you then switch it off? Do you close valves? Does your heat pump get high pressure warnings or this can't happen for you?

johny-mnemonic commented 1 year ago

@sHedC I have interesting findings. I tried to turn on the HC1 again without success as it again turned off itself. So I got an idea to try to execute ON-OFF-ON in a rapid succession which worked and now the HC1 stays ON. Which leads me to believe the integration relies on it's own stored value instead of verifying the state from the API (as on the API it is ON since 8am today).

@SeBsZ I am "blocking" the HP by controlling the HDO signal by wire. Which was the only control I had for the HP prior this HA integration. Setting the room requested temperature is an attempt to see how the HP behaves when the difference between the room requested temp and current temp is high and when it is low, whether it controls the water requested temp somehow accordingly. I have the pAD on a place where afternoon sun can reach it, so if I wanted it to always demand heating, I would need to set it to 28°C+ statically. Hence I am trying to find if that would be good idea or not.

In general I would like to have more fine-grained control over the HP, than just sending HDO signal over wire to block the heating.

sHedC commented 1 year ago

@johny-mnemonic - did you check the logs?

The write to update sends the command to the API to update, if the API response success then it sets the internal register to whatever you put.

It checks for the update to stick based on two requirements 1) The server response with a success 2) The return value matches the value I sent, as the API returns the value set

But, HASS does its own thing. So sometimes it re-sets the value and sometimes it does not so 30 seconds later when your system refreshes you get the value.

So the crux. I have put a control in to stop the system sending multiple requests in quick succession 0.1 seconds between each call. The reason is without it the second request always failed. But I notice that the APP does it at a rate of 0.5 seconds per call.

I also thread lock to stop a refresh happening at the same time as a set or multiple sets occuring as also this fails.

I am refreshing at a rate of 30 seconds per call and have automations happening one to set the water min/ max temp and one to turn on/ off the Hot water request from about 7pm until 2am (after showers and in high peak tariff).

Ok so next part as not sure I understand.

Are you trying to turn off the heating but keep the DHW working, which yes would be great with the HDO?
How many requests are you making to the update at one time and how many times do you initiate the request. i.e. what is your automation looking like timing and request wise?
Do you have any exceptions in the logs, if the update fails you should see it there before a restart?
Do you think that maybe allowing a request every 0.1 seconds is causing the issue?

Here is what I have done for automations:

Turn off the Hot Water at 7pm, turn it back on at 2am. My Hot water is based on HC1 and I have the min/ max temp set to 48.
Set the HP main heating curve to 20-30 from 8.30am to 2am (peak time) set the house temp to 20 (separate system)
Set the HP main heating curve to 30-40 from 2am to 8.30am (off peak) set the house temp to 22 to over heat the floor

I notice that my energy usage during the day significantly reduced and the pumps come on a lot less frequently and the house stays at 20.

Also with my setup having HC1 and HC2 doing the hot water rather than DHW setting the Ambient temperature to anything other than 20 impacts the water requested temp. But in my setup these values should not be changed. If I set Ambient temp higher it increases the requested water temp and lower it descreases the requested water temp.

So other observations:

Whilst hot water is not being requested the HP keeps the water temperature around the requested temp based on the Heating Curve.
When HW is requested it increases to the HW requested temp

phew

sHedC commented 1 year ago

This is what happens in over 24 hours:

sHedC commented 1 year ago

Adding one more point I just remembered, requested temp does impact the requested water temp, bascially if your house is at 14 and then you want to heat up to 20, it causes the HP to increase the water temp higher until the ambient and requested match the closer it matches the more it goes back to the control curve setting.

johny-mnemonic commented 1 year ago

Sure I checked, but I haven't seen a single line of log from your integration. That's why I asked whether it wouldn't make sense to increase the logging level. Or is there some beta version with new logging?

Are you trying to turn off the heating but keep the DHW working, which yes would be great with the HDO?

Yep, exactly. I am trying to just control heating without affecting DHW. Just like I always did with the HDO. If this works well, I plan to reconfigure HDO to block DHW as well and use it to completely block HP from consuming electricity.

How many requests are you making to the update at one time and how many times do you initiate the request. i.e. what is your automation looking like timing and request wise?

I am doing only single request at a time. For requested temp it's multiple times per hour (triggered by current temp change) and for HC1 control usually few times a day.

Do you have any exceptions in the logs, if the update fails you should see it there before a restart?

I haven't seen any exception for any such "failed update".

Do you think that maybe allowing a request every 0.1 seconds is causing the issue?

I am not doing frequent updates so apart from two automations triggered simultaneously by chance it shouldn't happen that my system would want to do two updates at once. And serializing such two updates with even seconds apart from each other would be fine from my POV, so no need to stress the API.

My automations have different target than yours. I am trying to harness the most of the sun energy from my PV system and also use the fact that it is warmer during the day so HP is more effective. So I started ignoring the requests from rooms during the night (22-08) which translates to turning HC1 on at 08:00 and turning it of at 22:00. Also keeping the room requested temp close to the current room temp is aiming to lower the requested water temp as much as possible to increase the effectivity of the HP and lower it's power consumption. Checking the HP behavior it looks like messing with HC1 is redundant though as it reacts very well on the requested room temp. When I set it below current room temp it correctly stops heating (who would have thought 😄).

Adding one more point I just remembered, requested temp does impact the requested water temp, basically if your house is at 14 and then you want to heat up to 20, it causes the HP to increase the water temp higher until the ambient and requested match the closer it matches the more it goes back to the control curve setting.

I expected it would behave like you describe but so far it looks like the requested water temp strictly reacts only to the outside temp and ignores the difference between current and requested room temp. I tried only raising the difference to 5°C though, so maybe that's not enough to trigger the requested water temp increase...

I also thread lock to stop a refresh happening at the same time as a set or multiple sets occuring as also this fails.

I am not sure whether this couldn't be contributing to the issue when the API is slow to update. Imagine the set of the value is called to change HC1 from ON to OFF. API responds with success, accepting the requested change. But as the server most probably then communicates with HP and confirms HP accepted the change it might take some time before the updated value is actually visible on the API. If the integration sets the changed value in HA after successful call then the next regular API scan could find old value on the API and basically set it back to the original value.

Or does it behave the way that once the server accepts the value it is visible on the API no matter whether HP accepted it or not? Essentially I am asking where is the source of truth. Whether it's HP or the Mastertherm server. In case it's HP and MT server just tries to follow then we might need to change the procedure to following:

Sent change request to API
Check the response for success
Start polling the API to verify value changed
Update HA only after API value has changed

It would be very slow to update any value (or batch of values) but might work reliably.

sHedC commented 1 year ago

No Beta yet was just working on some updates to the connect module and also trying to work out what logging to add. I can put some Info logging on the set command to capture what I send and what the API is doing but if no logging is happening warning or error then the API is responding with a success.

Ok so Domestic Hot Water and PV. This is what the Solar settings are for, they are on every heat pump but usually disabled, there is:

Status - On or Off - currently I use this to disable the solar section but its whether the solar feature is enabled or not
Panel Temperature - The real temp of the solar panel
Tank 1 Temp - The water tank 1 temp
Tank 2 temp - if you have a second tank
Output 1/2 - This is No or Yes and is the Relay Output No is off and Yes is on

I don;t know exactly how it works but my installer said if you have PV it tells the HP when to turn off DHW function to make use of the PV better.

I suppose the DHW could also just be disabled/ enabled there is also an on command which is D_29 not sure what that does exactly, force on or off? this is on for me.

Is your HP an air to water? if so I get the making max use of the warm weather but not sure what the benefit of asking for more heat into the house helps with energy saving. I have ground source so warmer weather for short term makes no difference.

I will add some more logging to the set command to help but I suspect it might be the API failing not the code but who knows. I will check the logging. I will also slow down the commands a bit as I am not sure waiting 0.5 vs 0.1 makes much of a difference to any use case.

DHW should be totally unaffected by the HC1 being on or off I hope at least that is how I understand it.

sHedC commented 1 year ago

Might have something, in the set call, if the post returns the token is invalid (expired or they restarted something to make it invalid) I catch it then do a get data and return.

That is not correct should re-try a post to get a new token. I think because you do frequent updates you might be hitting that so no error just wrong functionality.

PS can you check this issue please, I have a question there. :) https://github.com/sHedC/python-masterthermconnect/issues/53

johny-mnemonic commented 1 year ago

Are you sure "Solar" settings are for anything related to PV? I would have thought they are for controlling when the solar panels should be used to heat the DHW boiler tank. I am afraid you are messing solar panels and photovoltaic panels which are completely different things ;-)

Seems like D_29 works for enabling/disabling of the DHW. At least it switches ON/OFF on the main control panel. Will test if it really disables heating of DHW. But in HA the domestic_hot_water_enabled stays active no matter the state of D_29, which is weird.

Yes, I have air/water HP and outside temperature is pretty big factor on it. It is clearly visible on the power consumption. So it makes sense preheating the floors (which are essentially large concrete blocks with water lines in them) during the day with free electricity from the sun and when the HP is most effective. Then disabling HP during nigh when outside temp plummets and with it the HP effectivity...

I have seen first errors in the log yesterday:

Logger: homeassistant.components.automation.mastertherm_ovladani_hc1
Source: custom_components/mastertherm/coordinator.py:216
Integration: Automatizace (documentation, issues)
First occurred: 17. února 2023 v 17:26:50 (3 occurrences)
Last logged: 17. února 2023 v 17:26:50

Mastertherm - Ovládání HC1: Choose at step 1: default: Error executing script. Unexpected error for call_service at pos 1: connection_error
Mastertherm - Ovládání HC1: Error executing script. Unexpected error for choose at pos 1: connection_error
While executing automation automation.mastertherm_ovladani_hc1
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/masterthermconnect/api.py", line 116, in __post
    response = await self.__session.post(
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/usr/local/lib/python3.10/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/config/custom_components/mastertherm/coordinator.py", line 211, in update_state
    return_value = await self.mt_controller.set_device_data_item(
  File "/usr/local/lib/python3.10/site-packages/masterthermconnect/controller.py", line 566, in set_device_data_item
    return await self.__api.set_device_data(
  File "/usr/local/lib/python3.10/site-packages/masterthermconnect/api.py", line 463, in set_device_data
    response_json = await self.__post(
  File "/usr/local/lib/python3.10/site-packages/masterthermconnect/api.py", line 136, in __post
    raise MasterthermConnectionError("3", "Client Connection Error") from ex
masterthermconnect.exceptions.MasterthermConnectionError: 3

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 451, in _async_step
    await getattr(self, handler)()
  File "/usr/src/homeassistant/homeassistant/helpers/script.py", line 684, in _async_call_service_step
    await service_task
  File "/usr/src/homeassistant/homeassistant/core.py", line 1787, in async_call
    task.result()
  File "/usr/src/homeassistant/homeassistant/core.py", line 1824, in _execute_service
    await cast(Callable[[ServiceCall], Awaitable[None]], handler.job.target)(
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 213, in handle_service
    await service.entity_service_call(
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 680, in entity_service_call
    future.result()  # pop exception if have
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 968, in async_request_call
    await coro
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 720, in _handle_entity_call
    await result
  File "/config/custom_components/mastertherm/switch.py", line 75, in async_turn_on
    await self.coordinator.update_state(
  File "/config/custom_components/mastertherm/coordinator.py", line 216, in update_state
    raise UpdateFailed("connection_error") from ex
homeassistant.helpers.update_coordinator.UpdateFailed: connection_error

Not sure if there was issue with the server or my internet connection had a blip, but looks like network connectivity issue to me.

Also playing with the D_29 and pAD values it is clear my theory regarding polling the API after update is dead. As "once the server accepts the value it is visible on the API" seems to be the case at least when the communication between HP and MT server works fine.

sHedC commented 1 year ago

Are you sure "Solar" settings are for anything related to PV? I would have thought they are for controlling when the solar panels should be used to heat the DHW boiler tank. I am afraid you are messing solar panels and photovoltaic panels which are completely different things ;-)

Yes you are right I max mixing up terms. I asked my installer about the solar option on the heat pump, he said it was already on it and is related to heating water using the solar thermal collector (not PV), he said it just turns off heating the DHW when the panels are over a given temperature.

It seems that other than turning on/ off everything else is OEM set.

Seems like D_29 works for enabling/disabling of the DHW. At least it switches ON/OFF on the main control panel. Will test if it really disables heating of DHW. But in HA the domestic_hot_water_enabled stays active no matter the state of D_29, which is weird.

The demo system for the old servers has the Solar enabled, at the moment the solar panel temp is 59.4 degrees and D_29 is off. D_275 enabled is on. will check it later see what happens.

Also playing with the D_29 and pAD values it is clear my theory regarding polling the API after update is dead. As "once the server accepts the value it is visible on the API" seems to be the case at least when the communication between HP and MT server works fine.

Yes I think I found it, if the token becomes invalid before the expiry time the set fails and attempts to re-connect, two things wrong one was I was raising the exception at the wrong place (inside a try) and I was doing a get not a post for the set value, also no delay so likely it failed.

Fixing it and checking your logs match my findings.

sHedC commented 1 year ago

Released 1.1.0 Beta 0

Added HVAC mode but can't test it myself, also added idle and aux-heater to operating mode and hopfully fixed the issues you are having.

johny-mnemonic commented 1 year ago

I have updated yesterday and now I noticed that DHW control is in state off while HP is happily heating the DHW ;-) Will let you know in case I spot any other issue.

sHedC commented 1 year ago

Off is the only option if there is no control enabled, DHW no control as yet, pending our investigation of D_29 in the MasterthermConnect.

D_29 looks like the control for On/ Off for hot water.

sHedC commented 1 year ago

I only enabled the HVAC Off/ Auto on the HC1 to HC6 circuits so you should see it on the room thermostat for HC1

sHedC commented 1 year ago

Added HVAC support for DHW and changed default for no HVAC mode to Auto

johny-mnemonic commented 1 year ago

I can confirm that DHW control works as expected. Thanks

johny-mnemonic commented 1 year ago

Unfortunately I still see this pattern of setting the value and the value being reverted back after the refresh interval (30s). MT_HA_6

There is only single log entry from MT integration, which is from different automation:

Logger: homeassistant.components.automation.mastertherm_ovladani_hc1
Source: helpers/script.py:1453
Integration: Automation ([documentation](https://www.home-assistant.io/integrations/automation), [issues](https://github.com/home-assistant/home-assistant/issues?q=is%3Aissue+is%3Aopen+label%3A%22integration%3A+automation%22))
First occurred: February 22, 2023 at 8:00:59 AM (2 occurrences)
Last logged: 8:00:59 AM

Mastertherm - Ovládání HC1: Already running

I have two triggers there and both triggered at the same time... Will need to handle that.

sHedC commented 1 year ago

Um thats not an error I generate, what triggers are you running so I can test. I already have code that protects against multiple calls at the same time but not sure what that message means.

johny-mnemonic commented 1 year ago

@sHedC Sorry for the confusion. The error is definitely not caused by your integration. I just didn't notice it is not even generated by your integration. It was a bug in my automation. I didn't realize both those triggers can trigger at the same time. I have already fixed it.

So everything is still the same as before. There are updates that are getting reverted after scan interval and no logs for any issues.

sHedC commented 1 year ago

Can you share your automations, if I can replicate it I can probably find a fix for it. I have no issues all updates work but my automations are not frequent or complex.

I tried to do some regular updates on something but again no issues so not sure if your automations maybe overlapping and my code is not handling it properly, or it thinks it is but the API is not. The only thing I can think of right now is that my gap between requests is too small for what you are doing :)

PS I don't actually know a lot about automations, I tried to work out how you set yours up but couldn't so details instructions would be good :)

johny-mnemonic commented 1 year ago

Again, sorry for the confusion. There is no issue you can fix with my automation error Already running. That was solely the issue with two triggers that both triggered on the same external event, so the automation tried to run twice at the same time. Which was prevented by HASS without it even going into your code. You can forget about it.

The only issue with the MT integration are those values which are reverting back after being set as shown on the graph. The automation issue is totally unrelated to this and I added it here by mistake as I didn't notice the error is not from MT integration.

sHedC commented 1 year ago

Yes thats ok I understand that part.

Was just wanting to replicate your automations i my system to see if I can get that issue with the values reverting back after being set, I don't get that when I try automations but I see you have an automation running very frequently so just wanted to replicate it and get the same issue so I can debug it.

johny-mnemonic commented 1 year ago

@sHedC I see, here you are:

alias: Mastertherm - Ovládání teploty
description: ""
trigger:
  - platform: state
    entity_id:
      - climate.mt_xxxx_1_heating_circuits_hc1_thermostat
    id: Change
    attribute: current_temperature
    for:
      hours: 0
      minutes: 0
      seconds: 5
condition: []
action:
    - service: climate.set_temperature
      data_template:
        temperature: >-
          {{
          state_attr('climate.mt_xxxx_1_heating_circuits_hc1_thermostat',
          'current_temperature') | float(0) + 0.5 }}
      target:
        entity_id: climate.mt_xxxx_1_heating_circuits_hc1_thermostat
mode: single

Nothing special...

sHedC commented 1 year ago

Thanks will try something similar didn't now you could do this I was only using the UI for automations.

I have updated to 1.1.0 Beta 2 where I added some more logs to the set command and also found a school boy error where I forgot to await the asyncio.sleep() so could have been that but not sure.

sHedC commented 1 year ago

@johny-mnemonic - how long did it take before you saw the system no updating the temp? I have mine running for nearly 24 hours no issues.

johny-mnemonic commented 1 year ago

@sHedC not sure when I updated to beta 2 (most probably late in the night) but I have seen the issue throughout whole day yesterday while today I don't have even a single occurrence. So whatever you did seems to have fixed it 🎉

sHedC commented 1 year ago

@johny-mnemonic - Great

I did a few things, 1 corrected the post command on re-try if connection returns invalid token, 2 added await to the asyncio.sleep so it actually waiting for a period before re-trying and increased the wait time to 0.2s

this means that if you trigger 10 actions at the same time it will take 2 seconds to complete all the actions you can see this visually by turning on and off a switch 10 times then waiting to watch it catch up.

also if the token gets an invalid response early then it sleeps for 0.2s before retrying to connect.

The API returns a server disconnected if you send two commands one after the other without a pause.

I will leave this open whilst I finish up the main release and just update later to see if things are still ok before closing.

sHedC commented 1 year ago

@johny-mnemonic - Still ok can I close this?

johny-mnemonic commented 1 year ago

@sHedC I am afraid I was too eager to report success:-( Good news is it is definitely way better and happens only rarely since updating to beta 2.

Yesterday there were 6 events like that and today only 4. Previously there were times where it was 4 out of 6 api calls. Being it so random I guess it's their loosy API 🤷‍♂️

Yesterday I have enabled the debug log for MT integration but there is no logged event for any of those 4 events that happened today. Any idea where to add some logging so we can catch it?

There were only two exceptions throughout a whole day, both at different time than the issue with reverted value:

2023-02-27 12:34:49.094 ERROR (MainThread) [custom_components.mastertherm.coordinator] Timeout fetching mastertherm data
2023-02-27 12:34:49.095 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 10.631 seconds (success: False)
2023-02-27 12:35:19.713 INFO (MainThread) [custom_components.mastertherm.coordinator] Fetching mastertherm data recovered

BTW: If you have more interesting things to do than to spent time on this issue than go ahead. This is low-prio now I would guess since it became so rare...

sHedC commented 1 year ago

Yeh just going to finish up to release a 2.1.0 and push to HACS.

There is a Beta 5 which has an increased timeout. I reduced the timeout to 10 seconds but saw a few of these too so increased it to a couple of min in beta 5.

johny-mnemonic commented 1 year ago

I have updated to beta 5 in the night so will let you know.

BTW: I noticed my Fan sensor vanished and instead I now have Thermal Pump. That's a bit confusing as the thermal pump is a fancy name for a compressor which is already there. I probably see where it is coming from. You don't have Air-Water HP so for your HP the "Fan" doesn't make sense. Correct? In the documentation they say this sensor is either a "Fan" for a Air-Water HP or it is a circulation pump of primary circuit for Water-Water HPs. As you are now detecting what kind of HP it is (I see mine as Air-Water with cooling in the Integration), you can select proper name for this sensor. WDYT?

sHedC commented 1 year ago

Good news is it is definitely way better and happens only rarely since updating to beta 2.

Yesterday there were 6 events like that and today only 4. Previously there were times where it was 4 out of 6 api calls. Being it so random I guess it's their loosy API man_shrugging

The weird thing is I check both for a return of an error for the API, for exceptions and for the return value matching the value I sent. So yes I think it must be a problem between the API and the heatpump.

For the Timeout issue I will add a check and re-try on timeout same as I did for the invalid token. I might increase the wait time between the errors and a re-try, can't think of anything else, I will also add a warning message if it occurs so we know.

I will leave this open after release so we can keep an eye. funny I didn't get any problems when I ran the same automation as you for 24 hours, the only thing I saw was a short period where the API stopped responding (about 1 min).

sHedC commented 1 year ago

BTW: I noticed my Fan sensor vanished and instead I now have Thermal Pump. That's a bit confusing as the thermal pump is a fancy name for a compressor which is already there.

Thermal Pump is teh best name I could come up with, its Fan for Air Source, Brine Pump for Ground Source and Water Pump for Water Source :(

do you think wdyt would be a better name? I did think of setting it based on the HP_type? Could do that as I can pick up that detail.

Also do you have any idea what D/W is? I have A/W W/W B/W and there is a D/W

sHedC commented 1 year ago

Changes to 1.1.0

Now shows fan_running for Air Source and brine_pump_running for others
D/W is Direct Exchange Ground Source should be DX/W
Added re-try on Timeout before failing
Added Debug Logging of the Set for the return not sure what else, if it doesn't raise and exception and responds with no error no idea why the settings don't take.

Going to deploy this to HACS default

sHedC commented 1 year ago

@johny-mnemonic - Just had a thought, I see in the graphs its re-setting to 20, but following the settings you have I am not sure why it would reset to 20 rather than stay at the previous value?

johny-mnemonic commented 1 year ago

@sHedC it is not re-setting to any particular value. It is always reverting to the last successfully set value. And these two days are like a night and day. Today I don't have any single such event while when I now checked yesterdays chart it is as bad at it can get: MT_HA_7

Unfortunately it is not with the same version as I upgraded the integration to the one you released yesterday evening. No clue whether you did some major change for this in the release but we have seen this before that on some days it could be perfectly fine... I hate these "chasing a ghost" like issues which are purely random :-)

sHedC commented 1 year ago

Last Night I updated to 1.1.0 and connect module to 2.2.0.

For this issue and yes chasing Ghosts:

I added Timeout catch and re-try so if the Token Expires early or the Server just times out it re-trys
Increased the Delay between Failure and Retry to 0.5s from 0.1s, so when failing the API waits 0.5s then re-tries.
Increased the Session timeout to 15 seconds from 10 seconds
Increased the delay between allowed calls from 0.1s to 0.2s
Added Debug logging to the Set Command to log the response from the API.

I put the same integration you have running on mine (on a thermostat I don't use) after updating to 1.1.0 and so far its worked flawlessly.

sHedC commented 1 year ago

@johny-mnemonic I have been running the same logic as you now since the 1st March and have experienced 0 issues, we are both on the same API version so if you are still seeing them then possible its something else, maybe your system overriding the request?

Still seeing the issue?

johny-mnemonic commented 1 year ago

@sHedC Unfortunately except from that 2nd of March where it was spotless whole day (apparently due to some miracle), it is mostly not working :-( Do you have an idea where I could add some logging to see each request sent to MT API and it's response when the debug is enabled? I have the debug enabled, but it only prints info about fetching the data. That's all. I have found a few warnings regarding token expiry, but these are really rare and don't correlate with the issue at all.

2023-03-07 10:12:59.871 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 0.708 seconds (success: True)
2023-03-07 10:13:29.203 WARNING (MainThread) [masterthermconnect.api] Token Expired Early Retry: 200:User not logged in
2023-03-07 10:13:30.179 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 1.016 seconds (success: True)

It is entirely possible there is some issue in my HA but without the logs it's hard to even start digging deeper 🤷‍♂️ I have seen similar issue in the past with unrelated stuff and the culprit was a custom lovelace card which was changing the changed value. Could be the thermostat card is the cause. I will try to remove the climate entity from the UI completely and we will see.

It's clear the issue have to be somewhere in the HA as I am 100% sure that every call to set the requested temp is properly delivered to the HP. Just the feedback loop back to HA is misbehaving or something else messing with the value in HA. I would understand if it behaved this way:

requested temp in HA is 24
call service to set requested temp to 25
MT API starts returning 25
HA showing requested temp 25
after 30s HA returns back to 24
HP changes to 25
after some time HA gets the correct 25 value from the MT API

But regardless of the source of the trouble I am puzzled why the integration is able to run for extended periods of time without noticing that the value on MT API and in HA are out of sync 🤯

sHedC commented 1 year ago

Do you have an idea where I could add some logging to see each request sent to MT API and it's response when the debug is enabled?

In the api.py there are two methods: async def __post and async def _get these do all the communications, the get reads and the post sets. There is also connect and re-connect but that's can be ignored.

Problem here is if the Web API returns that its successful I accept that within my code, if it returns an error or the return value does not match what I sent then I log and reject the update so if we are not seeing that adding logging I suspect will only show that the problem is somewhere between the MT Web and your HP.

I would understand if it behaved this way:

requested temp in HA is 24

call service to set requested temp to 25

MT API starts returning 25

HA showing requested temp 25

after 30s HA returns back to 24

HP changes to 25

after some time HA gets the correct 25 value from the MT API

But regardless of the source of the trouble I am puzzled why the integration is able to run for extended periods of time without noticing that the value on MT API and in HA are out of sync 🤯

For step 2/3 the API returns no error and that the temp has been set so the integration accepts the new temp and sets it internally, showing in Step 4. Even if I reject the temp it takes a bit for HA to respond however if there was an issue you would see a log as I log all warnings and errors as you can see form the Warning you got about the token expiring.

Step 5 the API is returning 24 showing that in-fact it didn't stick, or something set it back?

I see you had a token expire warning, this happens if the token becomes invalid before its time but then you see the call after this is it recovering so this is only a Warning.

Code for Get is quite long, post is very similar I can add logging at the start and end of this function and the post one?

    async def __get(self, url: str, params: str) -> dict:
        """Get updates from the API, for old this mostly uses Post."""
        if await self.__token_expired():
            await self.__connect_refresh()

        try:
            if self.__api_version == "v1":
                # Original uses post, with Cookie Token
                response = await self.__session.post(
                    urljoin(URL_BASE, url),
                    data=params,
                    headers={"content-type": "application/x-www-form-urlencoded"},
                    cookies={"PHPSESSID": self.__token, "$version": "1"},
                )
            else:
                # New uses get, with Authorization Bearer
                response = await self.__session.get(
                    urljoin(URL_BASE_NEW, url),
                    params=params,
                    headers={
                        "Authorization": f"Bearer {self.__token}",
                        "Host": "mastertherm.online",
                        "Connection": "close",
                    },
                )

            response_json = await response.json()
        except ClientConnectionError as ex:
            _LOGGER.error("Client Connection Error: %s", ex)
            raise MasterthermConnectionError("3", "Client Connection Error") from ex
        except JSONDecodeError as ex:
            response_text = await response.text()
            _LOGGER.error(
                "JSON Decode Error: %s:%s",
                response.status,
                response_text,
            )
            raise MasterthermResponseFormatError(response.status, response_text) from ex
        except ContentTypeError as ex:
            if response.status == 504:
                raise MasterthermServerTimeoutError(
                    response.status, response.reason
                ) from ex

            response_text = await response.text()
            if response_text == "User not logged in":
                raise MasterthermTokenInvalid(response.status, response_text) from ex
            else:
                _LOGGER.error(
                    "Mastertherm API some other error: %s:%s",
                    response.status,
                    response_text,
                )
                raise MasterthermConnectionError(response.status, response_text) from ex

        # Version 2 responds with an error and json.
        if response.status != 200:
            # Deal with the v2 error if we have a response
            if response_json["status"]["id"] == 401:
                raise MasterthermTokenInvalid(response.status, response_json)
            else:
                _LOGGER.error("Mastertherm API some other error: %s", response_json)
                raise MasterthermResponseFormatError(response.status, response_json)

        return response_json

johny-mnemonic commented 1 year ago

I think we can rule the API calls out completely. Seems like I was correct when suspecting internal issue in HA. Look at following picture. I think it explains it all: MT_HA_8 I am calling the climate.set_temperature on the climate entity climate.mt_XXXX_1_heating_circuits_hc1_thermostat which is shortly changed after the service call and then reverted back to the original value, while the sensor.mt_XXXX_1_heating_circuits_hc1_ambient_requested is not even changed. All that while the HP receives and set's the new requested temperature.

I managed to catch it right now in this state. Automation called the service to set the climate to 25.6 which passed successfully and set the temp on the MT API (+ in HP), but in the HA it reverted back to 25.5 and stayed like that: MT_HA_9

And while in HA it returned back to the original state of 25.5, in HP the change was successful and is reported by the API 😕

~/git/python-masterthermconnect ‹2.1.0-b0*› » python masterthermconnect/__main__.py get --api-ver v1 --user my@email --password my_passowrd --pretty reg XXXX 1 A_219
A_219 = 25.6
~/git/python-masterthermconnect ‹2.1.0-b0*› » date
Thu Mar  9 02:48:11 PM CET 2023

Unfortunately my theory with the UI reverting the value is not confirmed. I have removed the climate entity from the UI and it still happens...

It would be great if you can add the debug logging of what values you are getting from the API so we have the comparison with what is in HA as it seems the value is set correctly on the API and stays there, but something prevents the value to change in HA. It must be getting the new value with each call to MT API but not updating the value in HA which is really weird.

sHedC commented 1 year ago

I will add debug logging to the API and the Integration to capture what we can. There is already debug logging in the API for the set command.

Do you see this logging if you switch it on?

    async def set_device_data(
        self, module_id: str, unit_id: str, register: str, value: any
    ) -> bool:
        """Set device data a specific register to a specific value,
        updating any registry setting can cause the system to stop working
        the controller only allows tested updates this API has no protection.

        Parameters:
            module_id (str): This is the module_id for the unit
            unit_id (str): This is the unit id for the unit
            register (str): The Register to update
            value (str|float): The value to set.

        Return:
           success (bool): return true if succes and false if not.

        Raises:
            MasterthermConnectionError - General Connection Issue
            MasterthermTokenInvalid - Token has expired or is invalid
            MasterthermResponseFormatError - Some other issue, probably temporary
            MasterthermServerTimeoutError - Server Timed Out more than once."""
        retry = False
        params = (
            f"moduleId={module_id}&deviceId={unit_id}&"
            + "configFile=varfile_mt1_config&messageId=1&errorResponse=true&"
            + f"variableId={register}&variableValue={value}&application=android"
        )

        try:
            response_json = await self.__post(
                url=URL_POSTUPDATE
                if self.__api_version == "v1"
                else URL_POSTUPDATE_NEW,
                params=params,
            )
        except MasterthermTokenInvalid as ex:
            _LOGGER.warning(
                "Token Expired Early Retrying: %s:%s", ex.status, ex.message
            )
            retry = True
            self.__expires = None
        except MasterthermServerTimeoutError as ex:
            _LOGGER.warning("API Timed Out Retry: %s:%s", ex.status, ex.message)
            retry = True

        if retry:
            await asyncio.sleep(0.5)
            response_json = await self.__post(
                url=URL_POSTUPDATE
                if self.__api_version == "v1"
                else URL_POSTUPDATE_NEW,
                params=params,
            )

        _LOGGER.debug("Set Response Received: %s", response_json)

        # Check the response for any errors:
        if response_json["error"]["errorId"] != 0:
            _LOGGER.error(
                "Client Connection Error: %s, Message: %s",
                response_json["error"]["errorId"],
                response_json["error"]["errorMessage"],
            )
            return False

        # Get the Value that is returned and set.
        if self.__api_version == "v1":
            data_file = ""
            if "varfile_mt1_config1" in response_json["data"]:
                data_file = "varfile_mt1_config1"
            elif "varfile_mt1_config2" in response_json["data"]:
                data_file = "varfile_mt1_config2"

            set_value = response_json["data"][data_file][str(unit_id).zfill(3)][
                register
            ]
        else:
            set_value = response_json["data"]["data"][str(unit_id).zfill(3)][register]

        return set_value == value

sHedC commented 1 year ago

seems my debug logging is not getting activated when you ask for debugging in the custom component. Will have to work out how that works :)

sHedC commented 1 year ago

Ok got it you need to specify the logging for the API, for now it only shows debug logs for the set command but updating to add other logging.

logger:
  default: warning
  logs:
    # Log for Mastertherm
    custom_components.mastertherm: debug
    masterthermconnect: debug

sHedC commented 1 year ago

@johny-mnemonic
Added 1.1.1 - Logging added, enable with:

logger:
  default: warning
  logs:
    # Log for Mastertherm
    custom_components.mastertherm: debug
    masterthermconnect: debug

johny-mnemonic commented 1 year ago

@sHedC installed and logging enabled. Will report any interesting findings. Thanks

And here we are. At 01:39:38 automation set requested temp to 24.7 (from 24.1):

2023-03-10 01:39:38.373 INFO (MainThread) [masterthermconnect] Set module XXXX:1 property: heating_circuits.hc1.ambient_requested, register: A_219, value: 24.7
2023-03-10 01:39:38.373 INFO (MainThread) [masterthermconnect.api] Set Device Reg XXXX:1:A_219:24.7
2023-03-10 01:39:38.373 DEBUG (MainThread) [masterthermconnect.api] __post: data to: /mt/ActiveVizualizationServlet, params: moduleId=XXXX&deviceId=1&configFile=varfile_mt1_config&messageId=1&errorResponse=true&variableId=A_219&variableValue=24.7&application=android
2023-03-10 01:39:38.633 DEBUG (MainThread) [masterthermconnect.api] __post: response status: 200, content: {"timestamp":1678408778,"error":{"errorMessage":"","errorId":0},"messageId":1,"data":{"varfile_mt1_config1":{"001":{"A_219":"24.7"}}},"type":"write"}
2023-03-10 01:39:38.633 INFO (MainThread) [masterthermconnect.api] Set Device Reg response: {'timestamp': 1678408778, 'error': {'errorMessage': '', 'errorId': 0}, 'messageId': 1, 'data': {'varfile_mt1_config1': {'001': {'A_219': '24.7'}}}, 'type': 'write'}
2023-03-10 01:39:38.633 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished setting mastertherm data in 0.2599 seconds

At 01:40:08 (exactly after 30s) the requested temp returned back to 24.1. Represented by this blank get from API:

2023-03-10 01:40:08.110 INFO (MainThread) [masterthermconnect.api] Get Device Data XXXX:1
2023-03-10 01:40:08.111 DEBUG (MainThread) [masterthermconnect.api] __get: data from: /mt/PassiveVizualizationServlet
2023-03-10 01:40:08.305 DEBUG (MainThread) [masterthermconnect.api] __get: response status: 200, content: {"timestamp":1678408808,"error":{"errorMessage":"","errorId":0},"messageId":2,"data":{}}
2023-03-10 01:40:08.507 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 0.397 seconds (success: True)

Since then, there is not a single Get Device Data call containing A_219 reg. It looks to me like you are only asking the API for changed values (which totally makes sense). But it finally explains why the integration doesn't find there is discrepancy between A_219 in HA and on the MT API.

Off-course on MT API the value is changed as requested:

~/git/python-masterthermconnect ‹2.1.0-b0*› » python masterthermconnect/__main__.py get --api-ver v1 --user my@email --password my_passowrd --pretty reg XXXX 1 A_219
A_219 = 24.7
~/git/python-masterthermconnect ‹2.1.0-b0*› » date                                                                                                                           
Fri Mar 10 02:04:01 AM CET 2023

Here is how it looks when the change is successful in HA:

2023-03-09 23:39:55.416 INFO (MainThread) [masterthermconnect] Set module XXXX:1 property: heating_circuits.hc1.ambient_requested, register: A_219, value: 24.2
2023-03-09 23:39:55.416 INFO (MainThread) [masterthermconnect.api] Set Device Reg XXXX:1:A_219:24.2
2023-03-09 23:39:55.416 DEBUG (MainThread) [masterthermconnect.api] __post: data to: /mt/ActiveVizualizationServlet, params: moduleId=XXXX&deviceId=1&configFile=varfile_mt1_config&messageId=1&errorResponse=true&variableId=A_219&variableValue=24.2&application=android
2023-03-09 23:39:56.547 DEBUG (MainThread) [masterthermconnect.api] __post: response status: 200, content: {"timestamp":1678401596,"error":{"errorMessage":"","errorId":0},"messageId":1,"data":{"varfile_mt1_config1":{"001":{"A_219":"24.2"}}},"type":"write"}
2023-03-09 23:39:56.547 INFO (MainThread) [masterthermconnect.api] Set Device Reg response: {'timestamp': 1678401596, 'error': {'errorMessage': '', 'errorId': 0}, 'messageId': 1, 'data': {'varfile_mt1_config1': {'001': {'A_219': '24.2'}}}, 'type': 'write'}
2023-03-09 23:39:56.547 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished setting mastertherm data in 1.1317 seconds
2023-03-09 23:39:56.750 INFO (MainThread) [masterthermconnect.api] Get Device Data XXXX:1
2023-03-09 23:39:56.750 DEBUG (MainThread) [masterthermconnect.api] __get: data from: /mt/PassiveVizualizationServlet
2023-03-09 23:39:57.654 DEBUG (MainThread) [masterthermconnect.api] __get: response status: 200, content: {"timestamp":1678401597,"error":{"errorMessage":"","errorId":0},"messageId":2,"data":{"varfile_mt1_config1":{"001":{"A_219":"24.2"}}},"type":"read"}
2023-03-09 23:39:57.856 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 1.745 seconds (success: True)

The difference is the Get Device Data call right after the change is not blank but contains the change you just did to the A_219. Could this be some timing issue? Maybe those standard 30s are sometimes not enough for the slow MT API and when that happens and there is not enough space between the Set module property and Get Device Data calls it will not give you the changed data in the next Get Device Data call after the Set module property 🤷‍♂️ We do not know how it works internally, maybe it only returns the A_219 in the list of changed values after it get's the confirmation from the HP. It could also be the other way around. Maybe the change is lost if there is too big of a window between two Get Device Data calls. As we see in the successful update, it returned the change not even a full 1 second later after it happened...

Also, sure enough. When I changed the value manually on the pAD it got updated in HA immediately:

2023-03-10 02:43:21.111 INFO (MainThread) [masterthermconnect.api] Get Device Data XXXX:1
2023-03-10 02:43:21.111 DEBUG (MainThread) [masterthermconnect.api] __get: data from: /mt/PassiveVizualizationServlet
2023-03-10 02:43:21.161 DEBUG (MainThread) [masterthermconnect.api] __get: response status: 200, content: {"timestamp":1678412601,"error":{"errorMessage":"","errorId":0},"messageId":2,"data":{"varfile_mt1_config1":{"001":{"A_7":"11.5","I_202":"42","A_78":"8.0","A_219":"24.5","A_2":"19.2"}}},"type":"read"}

So the issue is solely with loosing the change in the updates from API. Everything else seems to work perfectly.

I don't know whether it makes sense, but what about explicitly asking for the state of the changed register in the next Get Device Data call after the Set module property call? Not relying on it being returned as changed automatically.

sHedC / homeassistant-mastertherm