sHedC / homeassistant-mastertherm

Home Assistant Mastertherm Component, to communicate and control heat pumps from Mastertherm
MIT License

HP keeps showing as unavailable #177

Open discodancerstu opened 7 months ago

discodancerstu commented 7 months ago

System Health details

System Information

version core-2024.1.6
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.11.6
os_name Linux
os_version 6.1.74-haos
arch aarch64
timezone Europe/London
config_dir /config
Home Assistant Community Store

GitHub API ok
GitHub Content ok
GitHub Web ok
GitHub API Calls Remaining 5000
Installed Version 1.34.0
Stage running
Available Repositories 1387
Downloaded Repositories 30

Home Assistant Cloud

logged_in true
subscription_expiration 13 February 2024 at 00:00
relayer_connected true
relayer_region eu-central-1
remote_enabled true
remote_connected true
alexa_enabled false
google_enabled true
remote_server eu-central-1-14.ui.nabu.casa
certificate_status ready
instance_id 06da5d9fc16149ee94e75958ea0db39b
can_reach_cert_server ok
can_reach_cloud_auth ok
can_reach_cloud ok

Home Assistant Supervisor

host_os Home Assistant OS 11.5
update_channel stable
supervisor_version supervisor-2024.01.1
agent_version 1.6.0
docker_version 24.0.7
disk_total 56.6 GB
disk_used 13.3 GB
healthy true
supported true
board odroid-n2
supervisor_api ok
version_api ok
installed_addons Samba share (12.2.0), Terminal & SSH (9.8.1), Mosquitto broker (6.4.0), GivTCP (2.4.3), File editor (5.7.0), eWeLink Smart Home (1.4.3), appdaemon-predbat (1.0.6)

Dashboards

dashboards 1
resources 18
views 18
mode storage

Recorder

oldest_recorder_run 31 January 2024 at 09:42
current_recorder_run 8 February 2024 at 20:07
estimated_db_size 1815.05 MiB
database_engine sqlite
database_version 3.41.2

Solcast PV Forecast

can_reach_server ok
used_requests 1
rooftop_site_count 1

Checklist

Describe the issue

My Mastertherm HP becomes unavailable in HA at random points. I can't find out why: the network is rock solid (HP connected via LAN) and no other devices have this issue.

I wonder if anyone else has this?

Reproduction steps

In the logbook, the HP shows as unavailable for around 2-3 minutes at a time.

Logs

N/A

Diagnostics dump, if available

No response

SeBsZ commented 7 months ago

Yes! I believe I have the same issue. Usually it happens about once or twice a day, but lately I've had periods where sometimes even for hours it will be unavailable nearly all the time. I feel like these are temporary outages or issues on MasterTherm's end, but I don't know for sure. Maybe we can see if these line up between all of us? I'm in Europe, Belgium.

image

Here you can see when my 'Unavailable' occurs in the past 24 hours or so.

sHedC commented 7 months ago

Also, which version are you on? There are two different APIs. I am on the older one and, other than 8 Feb for 2 hours and one other event this year, I have not had many outages.

Note that the Operating Mode has two distinct states. Offline means the API is working but can't talk to the heat pump; Unavailable means the API itself is not responding. I did this to separate the case where your pump is not talking to the API from the case where the API is shut down.

It is easier to show the outside temperature, as it shows a break when the pump is offline or unavailable.
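The distinction between the two states can be pictured with a minimal sketch. This is not the integration's actual code; the enum, the function, and the two boolean flags are hypothetical names chosen for illustration only.

```python
from enum import Enum


class Availability(Enum):
    ONLINE = "online"
    OFFLINE = "offline"          # API answered, but the pump is not reporting to it
    UNAVAILABLE = "unavailable"  # the API itself did not answer


def classify(api_reachable: bool, pump_reporting: bool) -> Availability:
    """Map two hypothetical health flags onto the states described above."""
    if not api_reachable:
        # Without the API we cannot know anything about the pump.
        return Availability.UNAVAILABLE
    if not pump_reporting:
        return Availability.OFFLINE
    return Availability.ONLINE
```

Under these assumptions, an outage on the cloud side shows up as Unavailable regardless of what the pump is doing, while a pump that has lost its own connection shows up as Offline.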

discodancerstu commented 7 months ago

Mine seems a bit less than yours.

Screenshot 2024-02-09 100928

In the last 24 hours (times are GMT):

8 Feb: 1602-1606 - duration 3:59
8 Feb: 1921-1927 - duration 6:00
9 Feb: 0202-0204 - duration 2:00

Looking back a bit further, the downtime almost always seems to be either 2:00, 3:59 or 6:00. The times are not consistent and there is no pattern.

discodancerstu commented 7 months ago

Checking my outside temp, it shows no temperature during the exact times when my HP is unavailable.

sHedC commented 7 months ago

image

The white at the beginning is unavailable. Mine is the v1 API.

sHedC commented 7 months ago

One more thing I noticed: during an outage I either could not log in via the mobile app, or could log in but got no pump available. At one point it said my password was invalid.

However, at that time I could log in to the demo server, so I am not sure their system is totally robust.

sHedC commented 7 months ago

Don't suppose anyone has the errors in the log for this? It should be a Connection Error, Timeout Error or FormatError.

discodancerstu commented 7 months ago

Where would we find this?

sHedC commented 7 months ago

In Settings -> System -> Logs. When it opens it shows a summary of warnings and errors; search for MasterTherm. Don't share a log containing the module ID (e.g. mt_1234_1); remove that bit first.

There will be errors, as I report both API and integration errors when connections fail and the failure is not temporary.
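For anyone who prefers to filter the raw log file rather than use the UI, here is a rough sketch that keeps only the lines mentioning mastertherm and masks the module ID before sharing. The ID pattern is an assumption based on the mt_1234_1 example above, the moduleId= query parameter is assumed to appear in request URLs, and the function name is made up for illustration.

```python
import re

# Assumed shape of the module ID, based on the example "mt_1234_1".
MODULE_ID = re.compile(r"mt_\d+_\d+")
# Assumed query parameter carrying the module ID in request URLs.
MODULE_PARAM = re.compile(r"(moduleId=)[^&\s]+")


def redact_mastertherm_lines(log_text: str) -> list[str]:
    """Return log lines mentioning mastertherm, with module IDs masked."""
    kept = []
    for line in log_text.splitlines():
        if "mastertherm" not in line.lower():
            continue  # unrelated log line
        line = MODULE_ID.sub("mt_xxxx_x", line)
        line = MODULE_PARAM.sub(r"\1xxxxx", line)
        kept.append(line)
    return kept
```

Pass it the text of the Home Assistant log file and check the result by eye before posting, since the patterns are guesses and may miss other identifying values.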

discodancerstu commented 7 months ago

Home Assistant Core

2024-02-08 20:07:50.950 WARNING (SyncWorker_1) [homeassistant.loader] We found a custom integration mastertherm which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant

2024-02-09 10:29:26.610 WARNING (SyncWorker_5) [homeassistant.loader] We found a custom integration mastertherm which has not been tested by Home Assistant. This component might cause stability problems, be sure to disable it if you experience issues with Home Assistant

sHedC commented 7 months ago

This is standard, as it's not integrated with HASS (thinking about doing that so it's a core integration).

You will see an ERROR with mastertherm. The logs are reset after a restart, so maybe check after you see an issue in the history and before restarting.

sHedC commented 7 months ago

Example: you either restarted or have a very clean system. I would expect two warnings after the 2024.2.0 update, as I have just raised an issue for those.

image

SeBsZ commented 7 months ago

I also have nothing useful in the logs right now, as my home assistant restarts many times a day with no trace in the logs as to what it is doing. I suspect maybe my Raspberry Pi 3 is not handling everything, so I'm looking to upgrade to a more powerful home assistant system. I'll share the log when MasterTherm goes offline again.

sHedC commented 7 months ago

That is weird. Some issue with the Pi 3? I am running HASS OS in a VM under Unraid; it stays up until I restart it or do an update, about once a month.

I set up a HASS Yellow for my friend that has been up since 1 Feb (when I updated it) with no issues. Actually quite impressed with that.

Maybe out of memory issues?

If you ssh in, older logs are under the homeassistant folder, e.g. home-assistant.log.1, which is the previous one.

You can also use top over ssh to check memory usage.

sHedC commented 6 months ago

Does anyone have further updates, or has the system stabilised?

Local access should be possible and works the same way as the cloud API. However, I am not able to get past the initial authorization step, as I believe I need a valid client certificate (not user/password). My installer is reaching out to Mastertherm to see if I can have the certificate, but I am not holding my breath.

You can access the heat pump locally using ssh (with some really weird parameters) or telnet with a password, and it has HTTPS using non-standard crypto and a client certificate. So unless Mastertherm give me a valid certificate or password, I am not able to make progress.

SeBsZ commented 6 months ago

This morning I got one of the worst offline/online situations ever:

image

All the orange is where the operating mode of the pump is set to 'Offline'. The longest of these are about 7 minutes, sometimes it's just a minute.

Without debug logging, the logs show nothing important. With debug logging, I don't see much either. Just:

2024-03-11 09:37:01.395 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 11.332 seconds (success: True)
2024-03-11 09:38:01.422 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 0.360 seconds (success: True)
2024-03-11 09:39:03.737 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 2.674 seconds (success: True)
2024-03-11 09:41:03.059 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 59.996 seconds (success: True)

I believe the middle one was actually where the entities turned 'unavailable', though it seems to log everything was successful. Strange.

sHedC commented 6 months ago

You might need to turn on the masterthermconnect debug logs; they are more detailed about what goes in and out. Use masterthermconnect: debug.

Although it's going to give a fair amount of logs. I could change these types of message to a warning if that is better.

SeBsZ commented 6 months ago

Ah, how do I enable that one?

sHedC commented 6 months ago

Just add it below the custom_components.mastertherm: debug line:

logger:
  logs:
    custom_components.mastertherm: debug
    masterthermconnect: debug

SeBsZ commented 6 months ago

Hmm, even before enabling masterthermconnect debug I now see a few errors:

2024-03-11 09:44:20.486 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 0.424 seconds (success: True)
2024-03-11 09:45:38.048 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished setting mastertherm data in 37.6641 seconds
2024-03-11 09:45:38.910 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 18.847 seconds (success: True)
2024-03-11 09:47:39.885 ERROR (MainThread) [custom_components.mastertherm.coordinator] Unexpected error fetching mastertherm data: 504
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/api.py", line 221, in __get
    response_json = await response.json()
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/aiohttp_client.py", line 71, in json
    return await super().json(*args, loads=loads, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 1166, in json
    raise ContentTypeError(
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: text/html', url=URL('https://mastertherm.online/api/v1/hp_data?moduleId=xxxxx&deviceId=1&application=android&messageId=2&lastUpdateTime=0&errorResponse=true&fullRange=true')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 318, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/mastertherm/coordinator.py", line 154, in _async_update_data
    refreshed = await self.mt_controller.refresh()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/controller.py", line 376, in refresh
    await self.__get_hp_updates(full_load=full_load)
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/controller.py", line 244, in __get_hp_updates
    device_data = await self.__api.get_device_data(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/api.py", line 457, in get_device_data
    response_json = await self.__get(
                    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/api.py", line 235, in __get
    raise MasterthermServerTimeoutError(
masterthermconnect.exceptions.MasterthermServerTimeoutError: 504
2024-03-11 09:47:39.891 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 60.828 seconds (success: False)

and

2024-03-11 09:59:09.873 ERROR (MainThread) [custom_components.mastertherm.coordinator] Unexpected error fetching mastertherm data: 504
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/api.py", line 221, in __get
    response_json = await response.json()
                    ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/aiohttp_client.py", line 71, in json
    return await super().json(*args, loads=loads, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 1166, in json
    raise ContentTypeError(
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: text/html', url=URL('https://mastertherm.online/api/v1/hp_data?moduleId=xxxxx&deviceId=1&application=android&messageId=2&lastUpdateTime=1710147424&errorResponse=true&fullRange=true')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 318, in _async_refresh
    self.data = await self._async_update_data()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/config/custom_components/mastertherm/coordinator.py", line 154, in _async_update_data
    refreshed = await self.mt_controller.refresh()
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/controller.py", line 376, in refresh
    await self.__get_hp_updates(full_load=full_load)
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/controller.py", line 244, in __get_hp_updates
    device_data = await self.__api.get_device_data(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/api.py", line 457, in get_device_data
    response_json = await self.__get(
                    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/masterthermconnect/api.py", line 235, in __get
    raise MasterthermServerTimeoutError(
masterthermconnect.exceptions.MasterthermServerTimeoutError: 504
2024-03-11 09:59:09.881 DEBUG (MainThread) [custom_components.mastertherm.coordinator] Finished fetching mastertherm data in 60.819 seconds (success: False)

SeBsZ commented 6 months ago

It looks like it's giving me a text/html page instead of your expected JSON. I can try to access the URL through postman and see what it gives me. Maybe it's rate control or something.
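When testing the URL by hand, a check along these lines separates a gateway error page from a real JSON answer before trying to decode it. It is only a sketch: GatewayTimeout and decode_body are hypothetical names, and the traceback above suggests the library already maps this case to MasterthermServerTimeoutError on its own.

```python
import json


class GatewayTimeout(Exception):
    """Hypothetical error for an HTML 504 page returned instead of JSON."""


def decode_body(status: int, content_type: str, body: str) -> dict:
    """Decode a response body only if it really is JSON."""
    if status == 504:
        raise GatewayTimeout(f"server returned {status}")
    if "json" not in content_type.lower():
        # e.g. text/html from a proxy error page
        raise ValueError(f"expected JSON, got {content_type!r}")
    return json.loads(body)
```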

sHedC commented 6 months ago

Yes, more likely the server is not responding in time.

SeBsZ commented 6 months ago

So it looks like the hp_data request sometimes takes a long time. I just ran the request using postman and it took 32s to return data. The next request takes just 57ms...

Actually, now it seems the heat pump is offline, as mastertherm.online now returns a data out-of-date error:

"error": {
        "errorId": 9,
        "errorMessage": "Data out-of-date (last update 2024-03-11 09:38:47 UTC)"
    },

I saw a Gateway timeout page once, but I didn't save it. That explains the json parsing error.
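To line outages up between users, the "last update" time can be pulled out of that error message. The sketch below assumes the error object sits at the top level of the response under the key "error" and that the message always uses the format quoted above; both are guesses from this one sample, and the function name is made up.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Timestamp format taken from the errorMessage quoted above.
LAST_UPDATE = re.compile(r"last update (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC")


def last_update_from_error(payload: dict) -> Optional[datetime]:
    """Return the 'last update' time from an out-of-date error, else None."""
    message = payload.get("error", {}).get("errorMessage", "")
    match = LAST_UPDATE.search(message)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    # The message states UTC explicitly, so attach that timezone.
    return parsed.replace(tzinfo=timezone.utc)
```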

sHedC commented 6 months ago

Is it worth just restarting your HP? Maybe it's disconnecting from their API.

SeBsZ commented 6 months ago

I will turn off and on the heat pump, see if that helps.

SeBsZ commented 6 months ago

Power cycling the heat pump did not help. After it came back up, I also got a 504 gateway time out just now:

This one came after 41.22s:

<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

Retrying the request gives me correct data after 11.9s.

Is this something you want to handle better maybe? Perhaps some automatic retrying when a gateway timeout occurs, with exponential backoff? Maybe 3 retries max before failing?
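Roughly what I have in mind, as a sketch only: fetch stands for whatever coroutine performs the request, fetch_with_retry is a made-up name, and the built-in TimeoutError is a stand-in for the library's own timeout exception. None of this is existing masterthermconnect code.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[T]],
    retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
) -> T:
    """Call fetch(); on a retryable error wait base_delay * 2**n and retry."""
    for attempt in range(retries + 1):  # one initial try plus `retries` retries
        try:
            return await fetch()
        except retry_on:
            if attempt == retries:
                raise  # retries exhausted, surface the last error
            # Exponential backoff: base_delay, 2x, 4x, ...
            await asyncio.sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable")
```

With the defaults this waits 1, 2 and 4 seconds between attempts. Given that a single request can already take 30 to 60 seconds, the total time would need to be weighed against the polling interval.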

sHedC commented 6 months ago

It's not you, it's their service. Go to mastertherm.online and log in with the demo account (demo/mt-demo); you will see the out-of-date error there too.

SeBsZ commented 5 months ago

Oh even with their demo account? Let's go local access!

sHedC commented 5 months ago

Yeah, I think I need a client certificate to connect, and there has been no response as yet. There is a user/password that the installer should have, and probably a certificate, but they will only give me that if Mastertherm say it's OK.

They said they would check; that was a couple of weeks ago.

SeBsZ commented 5 months ago

@sHedC Sent you an important message on Discord!

sHedC commented 5 months ago

@discodancerstu , @johny-mnemonic

We are working on local access using Modbus, which seems to be available in the MT servers. We are tracking the issue here and could do with support in analysing data, as different HPs have different mappings.

https://github.com/sHedC/python-masterthermconnect/issues/113