jeatheak / Mitsubishi-WF-RAC-Integration

WF-RAC homeassistant integration
MIT License

Units becoming unavailable #106

Open Pel1can111 opened 4 months ago

Pel1can111 commented 4 months ago

I am not sure whether this is a problem with my setup, a bug in HA, or a bug in this plugin. I set up an automation to alert me to any change in the error code reported by the indoor units. I'm not sure whether it started after the latest update of this plugin or the latest version of HA, but I am randomly getting notifications because the units become unavailable. Looking at the logs, they seem to become unavailable only briefly and then return to normal. This would not be noticeable if it were not for the notifications I had set up.

Please could other users just check their logbook and see if their units are doing the same so I know if this is a problem with HA or something with my units?

Thanks

drawgas commented 4 months ago

I can confirm: after the latest update, units started frequently becoming unavailable. Restarting HA does not always help.

jeatheak commented 4 months ago

Hi,

This is indeed because of the latest update. There were some feature requests about the availability of the device when you power it off. See #98.

I implemented this change. But it also means that if a call from Home Assistant to the AC unit fails, the unit becomes unavailable. Calls failing is a known issue and unfortunately cannot be fixed at the moment.

So you can change your automation to only report after X tries or I will implement a disable or X count before the unit becomes unavailable. I'm not sure yet.

Sorry for the inconvenience but I think the ability to make the unit unavailable after some time should be there.
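
The "X count before the unit becomes unavailable" idea from the comment above could look roughly like the sketch below. It is not the integration's actual code; the class name `AvailabilityTracker` and the `max_failures` parameter are made up for illustration. The unit is only marked unavailable after that many consecutive failed polls, and a single successful poll resets the count.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityTracker:
    """Marks a unit unavailable only after several consecutive failed polls."""

    max_failures: int = 3   # hypothetical threshold (the "X" in the comment above)
    failures: int = 0       # consecutive failed polls seen so far
    available: bool = True

    def record(self, poll_ok: bool) -> bool:
        """Record one poll result and return the resulting availability."""
        if poll_ok:
            # Any successful poll resets the counter and restores availability.
            self.failures = 0
            self.available = True
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.available = False
        return self.available


tracker = AvailabilityTracker(max_failures=3)
# Two isolated failures are tolerated; three in a row flip the state.
results = [tracker.record(ok) for ok in (True, False, False, True, False, False, False)]
print(results)  # [True, True, True, True, True, True, False]
```

With a threshold of 3, short hiccups never reach the automation, while a unit that really stops answering is still reported after the third failed poll.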

Pel1can111 commented 4 months ago

Thanks for the update. Aside from the notifications, it's not been causing any trouble operating the units, so I will modify the automation to prevent false notifications. Still nice to know the cause.

Thanks :)

drawgas commented 4 months ago

Thanks for the update. Yes, it would be nice to have the option "unavailable after x retries" as currently it's a bit too frequent :)

290989 commented 4 months ago

I'm having this problem quite often at the moment. Sometimes the device remains unavailable. I've tried turning off the fuse, but that often didn't help either.

Is it possible to find another solution? I have many automations that react to the state of room temperature etc. and thus send changes to the air conditioning. Now, since the update, my device is sometimes unavailable 5-6 times an hour. Therefore, my regulations do not work reliably because they trigger at certain times (half an hour, full hour, etc.).

I would be very happy about an idea or solution.

Is the version before 2024.6 the 2024.2.1? Can I install the older version again?

IMG_0345

Now, since half an hour one of the devices is unavailable again.

290989 commented 4 months ago

@jeatheak An indoor unit was not available again for two hours today. I then deleted it and wanted to add it again. I then got the following error:

IMG_0355

After I turned the fuse off and on again, it worked again. Does this help to solve the problem?

jeatheak commented 4 months ago

Hi,

As far as I can see this is not a problem I can solve. The unit is really unavailable at that moment. If max retries have been exceeded it means that it cannot connect to the unit (get no response).

Did you try the smart-m air app at the same time?

The only thing I can try to add is an X count before marking it unavailable. But I think the unit is still unavailable and won't respond to your commands.

Before the update this was also happening but it was never visible in homeassistant.

Do you have more instances (homeassistant, apps) polling the unit? This can also cause some connectivity issues. The wifi module really does not like multiple connections as far as I know.

Edit: I maybe can add an option to disable the availability option.

290989 commented 3 months ago

Edit: I maybe can add an option to disable the availability option.

That would be nice. Thank you 👌🏻

I have only the Smart M-Air App & Home Assistant. Could it be that the many pings to check whether it's available are too much, and the unit then becomes unavailable?

Rorrik404 commented 3 months ago

As far as I can see this is not a problem I can solve. The unit is really unavailable at that moment. If max retries have been exceeded it means that it cannot connect to the unit (get no response).

Edit: I maybe can add an option to disable the availability option.

Agreed the unit really is offline at those points. I've tested mine via a post request and get a failure to establish connection. I've found that putting a DNS block on iot.smartmair.com has decreased the timeouts. Ultimately I think it's badly coded firmware on the AC side.

Adding an option to disable the availability check would be very useful. Likewise, maybe add an extending delay (up to a point) for rechecks. E.g. GetStatus -> Fail -> Try again -> Fail -> Wait 30 seconds -> Fail -> Wait 1 minute... Cap at checking every 5 minutes until it's back?
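
The extending recheck delay suggested above can be sketched as a small generator. This is only an illustration of the schedule, not code from the integration; the function name `recheck_delays` and its parameters are assumptions. It retries once immediately, then waits 30 seconds, doubles the wait each time, and caps it at 5 minutes.

```python
from itertools import islice
from typing import Iterator


def recheck_delays(first: float = 30.0, factor: float = 2.0, cap: float = 300.0) -> Iterator[float]:
    """Yield the wait in seconds before each recheck after a failed poll."""
    yield 0.0  # "Try again" straight away after the first failure
    delay = first
    while True:
        yield min(delay, cap)
        # Grow the delay, but never beyond the cap (5 minutes by default).
        delay = min(delay * factor, cap)


print(list(islice(recheck_delays(), 7)))
# [0.0, 30.0, 60.0, 120.0, 240.0, 300.0, 300.0]
```

A caller would sleep for the next value after every failed GetStatus and start a fresh generator as soon as a poll succeeds.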

290989 commented 3 months ago

Do you have more instances (homeassistant, apps) polling the unit? This can also cause some connectivity issues. The wifi module really does not like multiple connections as far as I know.

Do you know what are the 2 Accounts?

IMG_0453

could this be multiple connections?

jeatheak commented 3 months ago

Do you know what are the 2 Accounts?

Yes, this is the number of devices (apps) registered with the airco unit. As far as I know you can have a maximum of 4 devices registered (for example: 3 Smart M-Air apps on different phones and 1 Home Assistant connection).

If it has reached 4, then you need to remove one.

could this be multiple connections?

In theory yes, but only when the Smart M-Air app is actively open on the phone. It sometimes even gives a message that another instance is connected. But this resolves itself after some time, so it should not be a problem.

jeatheak commented 3 months ago

@Rorrik404

Agreed the unit really is offline at those points. I've tested mine via a post request and get a failure to establish connection. I've found that putting a DNS block on iot.smartmair.com has decreased the timeouts. Ultimately I think it's badly coded firmware on the AC side.

Yeah, indeed, I think it is weirdly designed firmware. I have blocked all internet access for the ACs and still see it sometimes.

Adding an option to disable the availability check would be very useful. Likewise maybe adding an extending delay (to a point) for rechecks. E.g. GetStatus -> Fail -> Try again -> Fail -> Wait 30 seconds -> Fail -> Wait 1 Minute... Cap at checking every 5 minutes until its back?

This is a good suggestion. I will look into it.

Rorrik404 commented 3 months ago

@jeatheak Adding an extending delay getAirconStat check would likely solve 95% of the issues of the device becoming unavailable in HA. There's nothing you can do about it when the device is actually playing up (as discussed above).

That said a possibly "quick and dirty" fix would be add a "Force Recheck" button to the device page that calls the async_update function.

I found when testing that the device can become available again (constant posts going through without timing out) however HA doesn't update for quite a while. The only quick solution to get it available again is to restart HA so it does a getAirconStat when HA comes back up. Adding a "Force Recheck" button wouldn't solve the issue of it going offline, but would provide a very quick way to run an "has it started behaving again" check and avoid restarting HA.

jeatheak commented 3 months ago

It takes a bit more time and testing. Sorry for the delay. Have a very busy period at the moment. You can follow the progress at branch 2024.7

rogermeijer commented 3 months ago

I think I am facing similar issues with one unit in particular that becomes unavailable and stays like that. While waiting on 2024.7 (although I am already running 2024.7-beta.2), is there a way to get the unit available again? Restarting HA, redownloading integration, disabling/enabling device, nothing works. Any tips? Thanks a lot!

Pel1can111 commented 3 months ago

I think I am facing similar issues with one unit in particular that becomes unavailable and stays like that. While waiting on 2024.7 (although I am already running 2024.7-beta.2), is there a way to get the unit available again? Restarting HA, redownloading integration, disabling/enabling device, nothing works. Any tips? Thanks a lot!

This sounds like an actual problem with the unit, perhaps a poor wifi signal in that area? I would suggest trying to access the unit through the Smart M-Air app to see if it's still available there when not available in HA. If it's not working via the app, then that points towards an issue with the unit and not HA.

2112b commented 3 months ago

Roger, if by unavailable you mean that the unit no longer responds in HA: this happened here after operating the unit with the remote (the temp gauge changed colour to green, I think).

Eventually, switching the unit off with the Remote returned the control in HA (after a short while)

rogermeijer commented 3 months ago

Thanks for the quick responses, your comments made me look (again) into the given IP addresses and for some reason the IP address of this specific device changed. After reconfiguring the IP address and restarting HA it is available again 🫣

The previous time it went "unavailable", while still working in the Smart M-Air app, it was not an IP issue. It did go back online after redownloading the integrations.

The second time it stayed unavailable, so I skipped the IP check. Sorry for the confusion! Now I need to find a way to fix the IP-address to the device with my current set-up 🧐

perplexityjeff commented 3 months ago

Hi,

I have noticed, mostly with 1 unit, that HA shows it as 'unavailable' while it's perfectly controllable using Smart M-Air, which seems weird (that unit is the most used one for sure). Also, I see no drop in wifi connections on the router.

I only just installed the integration, as I just got the aircos, but it seems that the availability checks are a bit too aggressive?

rfx77 commented 2 months ago

Same problem here.

The Unit is working in the M-Air App without any issues but it refuses the local connection. I tried with curl

curl http://172.27.102.103:51443
curl: (7) Failed to connect to 172.27.102.103 port 51443: Connection refused

The IP is correct. The problem seems to be that if you make too many connections to the unit via port 51443, it stops responding for some time. After a while it starts responding again.

Maybe your availability check is too aggressive. Is it possible to stop the availability check or to set the rate?
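
For anyone who wants to repeat the curl check above from a script, here is a minimal Python sketch of the same test. It only checks whether a TCP connection to port 51443 can be opened and does not speak the unit's protocol; the helper name `port_open` is made up for this example.

```python
import socket


def port_open(host: str, port: int = 51443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (as in the curl output), timeouts,
        # and unreachable hosts.
        return False


# Example call, using the address from the comment above:
# port_open("172.27.102.103")
```

A result of False while ping still works would match the "Connection refused" behaviour described in this comment.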

rfx77 commented 2 months ago

According to my logs all my 4 WF-RAC reconnect every hour to my Wifi. It can happen that after that they are not accessible over http until the next reconnect after one hour. Ping always works. Smart M-Air App always works.

Any clues?

dom404 commented 1 month ago

I see this on all 7 of my units

1 or 2 will stop responding in HA; however, the Smart M-Air App always works even when the port appears to be down.

neworld commented 1 month ago

Sorry if this is the wrong thread to continue. I recently encountered the same problem. The unit had been working perfectly for 1.5 years with no single failure. However, it suddenly became unavailable in HA, and restarting HA and A/C did not help.

I banged my head against the wall for two days until I tried to launch automation, which I had disabled a few days ago, and bam, it works again.

So, status updates work only if the same device requests a set state.

It is not enough to set the state via mobile. I am considering making an automation to set the state each day. It would be nice to find a way to set same state

dom404 commented 1 month ago

Sorry if this is the wrong thread to continue. I recently encountered the same problem. The unit had been working perfectly for 1.5 years with no single failure. However, it suddenly became unavailable in HA, and restarting HA and A/C did not help.

I banged my head against the wall for two days until I tried to launch automation, which I had disabled a few days ago, and bam, it works again.

So, status updates work only if the same device requests a set state.

It is not enough to set the state via mobile. I am considering making an automation to set the state each day. It would be nice to find a way to set same state

How can you set the state if the device is unresponsive in HA? Does HA not complain the device is off?

neworld commented 1 month ago

How can you set the state if the device is unresponsive in HA? Does HA not complain the device is off?

I have automation to enable the air conditioner based on time each day, but I disable it during cold weather because I don't need it.

I will do extra tests on this

Pel1can111 commented 1 month ago

I have been running the latest beta for a few days now with notifications re-enabled and I'm yet to get one error, so from my standpoint this issue can be closed. Thanks

rfx77 commented 1 month ago

I also had these issues, but I was able to solve them. As I said above, the units sometimes become unavailable when they do a DHCP renew.

I did the following things: 1) disabled cloud connection in the Mitsubishi app and switched to local connect 2) denied any internet access for the HVAC devices 3) kicked the devices out of wifi every 30min (automated with my ubiquiti)

I think 1 and 2 did the trick. 3 seems to be optional

Greetings, Franz

dom404 commented 1 month ago
  • disabled cloud connection in the Mitsubishi app and switched to local connect
  • denied any internet access for the HVAC devices

I suspect most people are using the Mitsubishi's that don't have a Cloud connection, so no internet requirement.

Booting the device off the wifi will make no difference.

Why don't you fix the IP of the device? seems like (in your case) the easier option?

The beta seems to be working well.

rfx77 commented 1 month ago

The devices have static DHCP assigned, but they renew every hour anyway, no matter how long you set the lease. So this seems to be in their firmware. Even if you configure the WF-RAC for local access (App), they still connect to the internet. I had to block them.

dom404 commented 1 month ago

The devices have static DHCP assigned, but they renew every hour anyway, no matter how long you set the lease. So this seems to be in their firmware. Even if you configure the WF-RAC for local access (App), they still connect to the internet. I had to block them.

Different devices connect and perform in different ways.

M-Air is the app quite a few will use - No internet access or cloud function/access. Yours do connect, but many do not. Mine certainly do not renew their lease every hour.

rfx77 commented 1 month ago

The devices have static DHCP assigned, but they renew every hour anyway, no matter how long you set the lease. So this seems to be in their firmware. Even if you configure the WF-RAC for local access (App), they still connect to the internet. I had to block them.

Different devices connect and perform in different ways.

M-Air is the app quite a few will use - No internet access or cloud function/access. Yours do connect, but many do not. Mine certainly do not renew their lease every hour.

Do you have connectivity problems now or did you solve them?

dom404 commented 1 month ago

Trialling the latest beta. Too soon to say.

rfx77 commented 1 month ago

Maybe you could check on your firewall whether the HVAC units really don't try to connect to the internet. In my situation they always did, and when I denied them the problems went away.

dom404 commented 1 month ago

I can see that even with 2024.7 I have got one of my units being unavailable at the moment.

51443/tcp closed unknown

M-Air app works as expected

jeatheak commented 1 month ago

Please don't create a new issue for the same problem, @dom404.

It's getting a bit frustrating for me at the moment because I have 2 units at home and they never disconnected in 2 weeks or even more. So it is really difficult to debug/fix this.

You sure it is not your network? The big difference between the smart m air app and home assistant is, that the app only connects when you open the airco in the app. And home assistant polls (all) the airco(s) constantly.

I even put 1 airco on a remote socket I can turn on/off. it now always reconnects nicely, even after being offline for 1 day.

So to conclude: I will try to look into it (again) but it will take time and for now I don't know if I can find a solution for you. So sorry for that.

(My aircos are blocked from internet access and are on an isolated 2.4 GHz network)

Edit: ok I need to change my comment a bit, I see in the logging that the airco cannot update the values around 8 times in 1 day. But because the 3 times retry is implemented it does not set it to unavailable for me. Did you try to set the retry limit higher?

dom404 commented 1 month ago

I am happy to do any testing you require/need. But it's catching the units when they are offline/unavailable. I usually only see if HA reports that it was unavailable in the past while looking at the log book.

I have 7 units used for main house heating so I might notice more than most.

I will increase the retry limit from 3. Any recommendations?

I can see it was reported as unavailable for 20 minutes. During this time it was available and responding to M-Air (I believe there is a 60-second timeout between devices talking to the units).

I don't see my units talk to the internet at all. I did notice that there is an option for remote access, which I don't use (Set to local), so this might be why.

Sorry to be the part cause of any frustration :-) . I appreciate the work that you have put into this and continue to do so.

Pel1can111 commented 1 month ago

I really struggle to see there are any issues with the plugin at this point causing this. Prior to the update (with the aircon's on my list of devices I get notified if they go offline) I was getting between 2-6 notifications every day that the aircon units were going offline.

Since I installed the update (a little over 2 weeks ago, I think) I have received 1 notification, and that was caused by me turning off the breaker for that circuit to install a new plug. I have 3 internal units, so it's a pretty good test, and I have the retry limit set to 3. I have no doubt the units still fail to respond occasionally, as before, but this is getting ironed out by the retries.

I just looked and my app is set to remote access (not that I use it). I have not bothered to block their internet access; I just have them on a separate VLAN from other devices.

Things I would suggest you try:

Turn the power off to the aircon, leave it off a few minutes, turn it back on and see if that helps. I have in the past seen a unit stop responding for a long period of time and this was the easiest fix.

If that does not help, I would try and place a wifi access point next to the unit and see if that rules out a signal issue.

dom404 commented 1 month ago

My units are a multi. I would need to turn the main breaker off outside to all the units to turn off one.

I cannot rule out a wireless issue and it was the first thing I tested. However, the M-Air App always works over the wireless. even when the port is showing as closed. Assuming that the M-Air App and the Plugin are using the same port for communication.

I thought it was the service stopping/crashing but the App rules that out.

Pel1can111 commented 1 month ago

might also be worth checking they are on the latest firmware. I think 131 is the latest

dom404 commented 1 month ago

I have just checked.

All units are running 131 - In 24 hours this is the only reported unavailable time across all 7 units.

My units are currently set up for winter. They will turn on for an hour if the temperature of the room drops below x plus a schedule for when people are home. I will watch and see if this pops up again.

I should note that this is not a common trait and is sporadically annoying when caught. I am tempted to proxy the M-Air app and see why it works when nothing else does.

dom404 commented 1 month ago

So, I have been logging the devices on the wifi.

I can see that the AirCon Units disconnect from the wifi once an hour every hour, they are the only devices that do. They instantly reconnect and then are good to go another hour. It's very strange. They also are not all at the same time

i.e. one unit will disconnect and reconnect at 12:30, 13:30, 14:30, etc another will be at 12:16, 13:16 etc all 7 units do this.

I don't have any evidence to confirm this is a contributing factor for the unavailable status. But I will continue to monitor.

I also increased the retry and I get 1 device unavailable per day at this point - I am trying to catch one when it's unavailable and not historic.

They are also the only WiFi 3 devices I own.

Pel1can111 commented 1 month ago

I'm almost certain this is the reason for the errors. Before the updated version if the integration tried to talk to the AC while it was reconnecting to WiFi it would time out and fail, the new version with retries "solves" this.

The issue with the hourly disconnects appears to be an issue with the firmware/hardware and I would suggest everyone bugs the manufacturer to solve this.

Other users with the issue are moaning about it here as well - https://community.home-assistant.io/t/mitsubishi-wifi-module-wf-rac-smart-m-air/411025/192

The firmware update a while back (131) seems to have been released to solve this and some users report it did for a while but the issue returned. All my units reconnect every hour looking at my unifi dashboard.

dom404 commented 1 month ago

I am sure you are correct but I cannot get the times to match. I can see that HA reported the device as unavailable at 9:53:56 AM, but the reconnect time was 9:44 AM, 10:44 AM etc.

It's possible what I am now seeing is the delay between the two (I have the retry set to 10)

What I need to try is scanning the port. When the plugin reports the unit as unavailable, the port is always closed. It is like the service is restarting; I don't know. Just because the unit is connected does not mean the port is up.

At least I know when the device is going to reconnect.

I still need to check what the app is doing so it is able to work but nothing else can.

rfx77 commented 1 month ago

As I said above, I had the same issues and the same behaviour, and these steps solved it 100%:

https://github.com/jeatheak/Mitsubishi-WF-RAC-Integration/issues/106#issuecomment-2393023520

I have a new multisplit with 4 indoor units and built-in wifi. I am on the newest firmware, but this did not solve the issues.

I cannot speak for other unit types, but mine always tried to connect to the Mitsubishi cloud server, no matter whether they were in local or cloud mode. After making sure they are no longer able to do this, the problems went away.

I also monitor the units with a recurring curl call to be sure they don't go away.

dom404 commented 1 month ago

@rfx77

I am similar except I have 7 units over 2 Multi Splits - What I am seeing is at most, 1 unit dropping per day for 20-60 minutes (according to HA)

I think I have two questions that I need to answer. How does the App work when the plugin cannot? When does the port close?

Mine don't talk to the internet - But that's more my setup, if yours does, or try, even in local mode.

I would also like to say that this is not causing me an issue - I have not had a unit (since increasing the retry) refuse to switch on or off when it was supposed to according to the schedule I set. This is more a case that it could cause an issue and I would like to resolve it fully.

Do you see the units reconnect on their own outside of your 30-minute kick - Do you still do this even after blocking internet access for them?

EDIT: I have moved a couple of mine on the network into a monitor VLAN so I can view all the traffic in and out, the Internet is still blocked.

scooter306 commented 4 weeks ago

I have the same issue with 2 devices. Blocking internet access via UniFi doesn't help; in local mode they are even less available, so I switched back to remote. As long as I don't add them to HA via this add-on, they are reachable in the Smart M-Air app. The trouble starts when I activate the integration in Home Assistant.

What I have noticed so far while often resetting WiFi to get rid of the problem: the devices seem to have internal protection against getting polled too often. Maybe you could create a beta with an option to poll data only every 10 minutes, or make the polling interval selectable?

Maybe the device has something like DDoS protection with newer firmware and closes the connection after X polls within Y minutes. It could be worth a try, if it's not too much work to implement a poll limiter/timer?

Thanks in advance. I appreciate your work and love the plugin; I used it last winter to keep the house warm and in the summer to keep it cool. Now it's hard to get reliable heating, as the devices are constantly unreachable and have to be switched on and off with the remote.

Thanks for your hard work! @jeatheak

rfx77 commented 3 weeks ago

Hi!

I am executing this command against all 4 of my devices every second so that I can check the availability: curl -s http://[IP of device]:51443. When I get "Not supported this command", everything is fine. I had no stability issues while executing this many requests. I also changed the code of this integration to test various intervals, but had no success.

Maybe you can execute this command periodically for some days to get a picture of how often the devices become unavailable and for how long they stay unavailable. My devices stayed unreachable until they reconnected to the wifi after one hour.
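
If the periodic check suggested above is logged as (timestamp, success) pairs, the "how often and for how long" question can be answered with a few lines of Python. This is a sketch under assumptions: the log format, the name `outages`, and the sample data are invented for illustration and are not measurements from this thread.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Sample = Tuple[datetime, bool]  # (time of probe, probe succeeded?)


def outages(samples: Iterable[Sample]) -> List[Tuple[datetime, timedelta]]:
    """Return (start, duration) for each run of failed probes.

    Duration runs from the first failed probe to the next successful one.
    An outage still in progress at the end of the log is not reported.
    """
    result = []
    down_since = None
    for when, ok in samples:
        if not ok and down_since is None:
            down_since = when  # outage begins
        elif ok and down_since is not None:
            result.append((down_since, when - down_since))  # outage ends
            down_since = None
    return result


# Invented sample log: one probe every 10 minutes, down from 09:10 to 10:00.
t0 = datetime(2024, 1, 1, 9, 0)
log = [(t0 + timedelta(minutes=10 * i), ok)
       for i, ok in enumerate([True, False, False, False, False, False, True, True])]
for start, length in outages(log):
    print(start.time(), length)  # 09:10:00 0:50:00
```

Comparing the outage start times with the hourly wifi reconnect times from the router log would show whether the two actually line up.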

rfx77 commented 3 weeks ago

@dom404

Before I had my solution where I kick the devices periodically, they reconnected almost exactly every hour. I watched the process, and it didn't matter whether they were cloud or locally connected.

Every hour: 1) Network drops and the device tries a DHCP renew and wifi reconnect (sometimes in under 1 s) 2) Device tries to reach some cloud addresses 3) Device is available at port 51443 (or not) 4) 60 min wait 5) Restart at 1

If you don't have connectivity after point 3, you will normally not get it back for 60 min.

In my case I am not sure whether the local connection with the M-Air app is working when the devices are not available on port 51443, as the TCP port is closed. Cloud always works if I enable it and allow them an internet connection. It is also not a timeout issue: I tried a lot of settings with my curl monitoring. If they are not reachable, it stays that way until they reconnect themselves or are forced to do so.

There seems to be a big bug in the FW of those devices.

scooter306 commented 3 weeks ago

For me the devices are always available in the app now since I deactivated the HA integration. As soon as I restart it I get the ACs to be disconnected soon and be unreliable. Maybe there are different bugs, concerning different kind of AC generations?

dom404 commented 3 weeks ago

@rfx77

I have played with various settings and configurations over the last week or so.

In local mode in the M-Air App, I see only 1 device out of 7 drop for 1 hour in any 36-hour period, as reported in HA. It is usually the same unit. I see them reach out to the internet at https://iot.smartmair.com/ every hour, so I allow-listed the address in my filtering and allowed them internet access to that domain.

I see the same as you:

The client disconnects and reconnects and reaches out to https://iot.smartmair.com/ every hour

The times for the client disconnect and reconnect do not coincide with the times that HA has them as being off-line

They will always come back when they next disconnect/reconnect. This is typically 50 minutes.

I am at one unit down, in HA for 50 minutes every other day.

I do believe this is an issue in the device and not an environmental issue. I can, however, make it an environmental issue by impacting the access to the URL iot.smartmair.com or allowing roaming of the device on the wifi.

This is rare enough not to be an issue, although I can see that it could be as it starts to get colder.

I am going to test remote access for a week and see if there is any difference in the data. After that, I will look at what the Smart M app is doing to bypass the issue.