home-assistant / addons

Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0

Mosquitto 5.1.1 is broken. #1887

Closed adamf663 closed 3 years ago

adamf663 commented 3 years ago

The problem

Environment

Problem-relevant configuration

Traceback/Error logs

Additional information

LifeBandit666 commented 3 years ago

I know OP has left zero information here and it will probably be closed out, but on a surface level I have to agree. Something has gone wrong with this point update.

My whole Home Assistant instance goes down overnight. When I try to access it over the network, I can't connect.

After a week of troubleshooting I have narrowed it down to my MQTT server. What seems to be happening is that it loses its connection to HA and throws the whole thing off. HA is still running but inaccessible. I have to power cycle to get it up and running again, and when I do, I get errors in the HA logs about "timed out waiting for mid 2" for MQTT, and errors in Mosquitto about being unable to find HA.

I also find my Zigbee2mqtt devices are found at startup then immediately lost again (presumably when the time out occurs) and have to restart Zigbee2mqtt a couple of times to get it running again.

I wish I could prove that the HA hang is due to Mosquitto, but since it's inaccessible (though still running) I can't. What I can say is that in Recorder the only devices that are unavailable are the Zigbee2mqtt ones, and they drop because of Mosquitto (there are zero errors in the Zigbee2mqtt logs in debug mode).

I've restored an old backup of the previous version of Mosquitto to try and combat the issue. I'll try to remember to report back.

Edit: I think the same issue is #1817 and #1814

Also this one https://github.com/home-assistant/core/issues/45036

LifeBandit666 commented 3 years ago

OK, so after rolling back to version 5.1 of Mosquitto I no longer get the timeout errors on startup in the core log, and Zigbee2mqtt seems to boot from the get-go again (instead of requiring multiple reboots to find Mosquitto). Hopefully this will also fix the issue of my whole HA server periodically becoming unresponsive.

This has been a week long hunt to try and find out why my system is dying which I was initially blaming on a new cable for my SSD. I've also tried looking at my Powerline system thinking it was cutting connection to HA.

hoopsta1423 commented 3 years ago

Yup, I updated last night and the add-on CPU usage spiked up to 150%. All my zwave2mqtt devices stopped working. Rolled back to 5.1 this morning and it's back to normal

grantalewis commented 3 years ago

Confirming same. Very strange, sluggish system recently with hundreds of lines similar to

2021-02-27 07:18:29 ERROR (MainThread) [homeassistant.components.mqtt] Timed out waiting for mid 98

in home-assistant.log. After rolling Mosquitto back to 5.1, no more errors in the log, performance back to normal.
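To gauge how many of these timeouts a log contains, here is a minimal sketch that counts the matching lines. The helper name `count_mqtt_timeouts` and the example path are assumptions for illustration, so adjust the path to wherever your install keeps home-assistant.log.

```shell
# Count "Timed out waiting for mid" entries in a Home Assistant log file.
# count_mqtt_timeouts is a hypothetical helper name, not part of Home Assistant.
count_mqtt_timeouts() {
    # grep -c prints the number of matching lines (0 when there are none).
    # "|| true" hides grep's non-zero exit status on zero matches so the
    # helper is safe to call from scripts that use "set -e".
    grep -c 'Timed out waiting for mid' "$1" || true
}

# Example (the path is an assumption):
# count_mqtt_timeouts /config/home-assistant.log
```

Comparing the count before and after a rollback gives a rough check on whether the add-on version makes a difference.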

peter-vanpoucke commented 3 years ago

I also needed to rollback to 5.1. I had the feeling Hassio couldn't log in any longer, but didn't put time in investigating. 😔

andersmoldin commented 3 years ago

I too seem to have this problem after installing Mosquitto 5.1.1. I did not make a snapshot, so I wonder if it is possible to install a specific version of Mosquitto, or in any other way downgrade to 5.1?

Edit: Attaching a screenshot. Connecting locally works fine, but connecting through my duckdns address suddenly doesn't work after upgrading Mosquitto from 5.1 to 5.1.1 yesterday.

[Screenshot 2021-02-27 at 14 59 03]

willysoft80 commented 3 years ago

> Confirming same. Very strange, sluggish system recently with hundreds of lines similar to
>
> 2021-02-27 07:18:29 ERROR (MainThread) [homeassistant.components.mqtt] Timed out waiting for mid 98
>
> in home-assistant.log. After rolling Mosquitto back to 5.1, no more errors in the log, performance back to normal.

I have the same problem and the home assistant has become slow to read the topics

realthk commented 3 years ago

> I have the same problem and the home assistant has become slow to read the topics

Yes, that was what I've noticed first: most of my stuff is MQTT-based, and it suddenly took a few seconds to switch on a light through MQTT, controlled by a door- or motion-sensor, also connected through MQTT. I tried restart, then hundreds of "Timed out waiting for mid xxx" messages in the log.

Found this topic, rolled back MQTT to 5.1 and no messages any more, and light is switched on well under a second again.

Dinth commented 3 years ago

Yep, same here: none of the things using MQTT in my home work anymore. Openzwave, Ebus, Zigbee2MQTT, Valetudo: all the devices went "Unavailable". I have tried restarting every component but still can't get any of my devices to work. Strangely, the MQTT add-on log shows that the add-ons are connected:

1614440933: Client zwave already connected, closing old connection.
1614440933: New bridge connected from 172.30.33.2 as zwave (p2, c1, k60, u'addons').
1614440933: Client blueIris already connected, closing old connection.
1614440933: New client connected from 10.10.1.14 as blueIris (p1, c1, k20, u'xxx').
1614440933: Client mqttjs_c5c88712 already connected, closing old connection.
1614440933: New client connected from 172.30.33.5 as mqttjs_c5c88712 (p2, c1, k60, u'xxx').
1614440933: Client ebusd_21.1_770 already connected, closing old connection.
1614440933: New client connected from 10.10.25.2 as ebusd_21.1_770 (p1, c1, k60, u'xxx').
1614440933: Client blueIris already connected, closing old connection.
1614440933: New client connected from 10.10.1.14 as blueIris (p1, c1, k20, u'xxx').

But the devices those components are responsible for are unavailable in HA. Also, when I try to connect to MQTT using an MQTT client, it doesn't connect anymore:

1614440958: Socket error on client auto-FBB4A2D6-C2EF-EC2F-C7C9-04379354796B, disconnecting.
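The leading number on each Mosquitto log line above is a Unix timestamp in seconds. Here is a small sketch for turning it into a readable UTC time so it can be lined up with home-assistant.log. The helper name `epoch_to_utc` is hypothetical, and the sketch assumes a `date` that accepts `-d @N` (GNU coreutils and BusyBox do, BSD/macOS `date` does not).

```shell
# Convert a Mosquitto log timestamp (Unix epoch seconds) to UTC.
# epoch_to_utc is a hypothetical helper name.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Example with a timestamp from the log above:
# epoch_to_utc 1614440933    # prints 2021-02-27 15:48:53
```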

JoJa1101 commented 3 years ago

Same here, MQTT not working well anymore. Getting a ton of [homeassistant.components.mqtt] Timed out waiting for mid messages

davidms12 commented 3 years ago

I have one instance running 5.1 and it's working fine. My main machine I upgraded to 5.1.1 - big mistake - all devices through mqtt are now showing as unavailable and tons of "timeouts waiting for mid..."

JoJa1101 commented 3 years ago

Exactly: looks like this since 5.1.1

image

G3rry71 commented 3 years ago

Same issue here! How do I force 5.1.0 installation from Supervisor? Thanks

adamf663 commented 3 years ago

Don't quote me, as I'm guessing. I had used an old snapshot. I think a 'docker pull homeassistant/aarch64-addon-mosquitto:5.1' might work. The only ways I know to access the Docker containers are through the Portainer add-on, or by ssh'ing to -p 22222 root@. I think enabling 22222 also requires an add-on.

Dinth commented 3 years ago

I think that restoring a snapshot (it is possible to restore JUST the addon) is the only option

sir106 commented 3 years ago

after restart of home assistant and upgrading i have the same problem.

JoJa1101 commented 3 years ago

this is srsly annoying right now! I have to switch out lights manually and close shutters now ^^

maxlyth commented 3 years ago

Same symptoms here with 5.1.1 update

In the log I was getting a lot of 'Socket error on client DVES..' errors, so I thought this was caused by my recent changes to the inter-VLAN routing in my UniFi Wi-Fi config.

Wasted too many hours on this before I saw this thread and reverted just the Mosquitto part of a recent snapshot which instantly fixed the problem.

ahknight commented 3 years ago

Exact same issue. Everything's broken with 5.1.1. Luckily had an old snapshot and I'll be trying to restore from that.

JoJa1101 commented 3 years ago

I did the same. Rolled back only Mosquitto to 5.1 and it works well again.

alex-savin commented 3 years ago

@JoJa1101 How did you roll back?

blacknacoustic commented 3 years ago

> I did the same. Rolled back only Mosquitto to 5.1 and it works well again.

Hey can anyone help me roll back? How would i do that?

bsmeding commented 3 years ago

Same here only rollback from old snapshot restored the mqtt function

Would be a nice feature to get a version dropdown by the addons so a rollback would be easier for everyone

JoJa1101 commented 3 years ago

> @JoJa1101 How did you roll back?

I made a snapshot on my HASS Testsystem (was still running on Mosquitto 5.1) and did the partial rollback on my Prod System.

alex-savin commented 3 years ago

The issue is that socat (which serves as auth point) is being continuously executed again and again!

/data # ps axu
PID   USER     TIME  COMMAND
    1 root      0:03 /sbin/docker-init -- /init /run.sh
    8 root      0:00 s6-svscan -t0 /var/run/s6/services
   36 root      0:00 foreground  if   /etc/s6/init/init-stage2-redirfd   foreground    if     if      s6-echo      -n      --      [s6-init] making user provided files available at /var/run/s6/etc...          foreground      backtick      -n      S6_RUNTIME_
   37 root      0:00 s6-supervise s6-fdholderd
   48 root      0:00 foreground  s6-setsid  -gq  --  with-contenv  backtick  -D  0  -n  S6_LOGGING   printcontenv   S6_LOGGING    importas  S6_LOGGING  S6_LOGGING  ifelse   s6-test   ${S6_LOGGING}   -eq   2     redirfd   -w   1   /var/run/s6/uncaught-logs-fi
  189 root      0:00 bash /usr/bin/bashio /run.sh
  246 root      6:14 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
  247 root     22:02 mosquitto -c /etc/mosquitto.conf
 5453 root      0:00 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
 5454 root      0:00 socat TCP-LISTEN:8080,fork,reuseaddr SYSTEM:/bin/auth_srv.sh
 5455 root      0:00 bash /usr/bin/bashio /bin/auth_srv.sh
 5459 root      0:00 ps axu
32379 root      0:00 /bin/sh
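To watch whether the number of socat processes keeps changing, a sketch like the following could be run repeatedly inside the add-on container. The helper name `count_socat` is hypothetical. With the `fork` option socat hands each accepted connection to a child process, so a few short-lived extra entries are expected; a count or set of PIDs that never settles would match the behaviour described above.

```shell
# Count socat auth-listener processes in "ps" output read from stdin.
# count_socat is a hypothetical helper name.
count_socat() {
    # The [s] bracket keeps grep from matching its own command line
    # when it shows up in the ps listing.
    grep -c '[s]ocat TCP-LISTEN:8080' || true
}

# Example, run inside the add-on container:
# ps axu | count_socat
```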

rpitera commented 3 years ago

Glad I found this; I was just about to install and try to migrate from CloudMQTT to a local instance. I only used CM because I needed an external server to use with an older mobile location method that I don't use anymore since the HA App came out. Guess I'll subscribe to this thread and wait it out.

Mariusthvdb commented 3 years ago

very much an issue here, especially since the mqtt binary_sensors keep triggering the automations on each reload (don't know how to call it but Mqtt gets re-triggered quite often now...)

do we have an add-on dev listening in on this?

poudenes commented 3 years ago

Same issue here. It seems the speed of MQTT has also changed. I get those errors as well when restarting HA, and my zwave2mqtt reaction time now has a short delay.

I don't know if this is because of the bug that's now in the latest version.

davidms12 commented 3 years ago

Any Devs looking into this? thx.

juslex commented 3 years ago

Same issue here

davidms12 commented 3 years ago

No snapshot here. Is there any way to use command line to downgrade to 5.1?

ha addons install ???????

thx.

blacknacoustic commented 3 years ago

This is my suggestion to anyone who is using Mosquitto inside their HA instance: set up an MQTT broker outside of HA and link it into HA manually using YAML and the IP address.

https://www.home-assistant.io/integrations/mqtt/

poudenes commented 3 years ago

> This is my suggestion to anyone who is using Mosquitto inside of their HA instance. I would suggest using or setting up a MQTT broker outside of the HA and linking it inside HA manually using yaml and the ip address.
>
> https://www.home-assistant.io/integrations/mqtt/

Three days ago I moved everything from inside to outside. I'm using an RPi 3 as the MQTT broker (Mosquitto); that's too much power, so I can move to an RPi Zero I guess. And everything is blazing fast again! Even faster than before the brick!

adamf663 commented 3 years ago

Enough is enough. Either take 5.1.1 down or fix it! At the very least provide a back out or workaround method that doesn't require going back to a snapshot.

nickrout commented 3 years ago

What an angry demanding fellow you turned out to be. First you post an error report without logs or anything useful. Then you post a demanding grumpy post.

FWIW 5.1.1 is working fine here, amd64 and home assistant supervised.

davidms12 commented 3 years ago

nickrout you're lucky. It is not working under the arm/Pi installs...

ahknight commented 3 years ago

I'm on Proxmox on a Skylake NUC and had the same issue. It isn't the microarchitecture.

Dinth commented 3 years ago

Yep, I'm on R720/ESXi, definitely not related to platform

jsb5151 commented 3 years ago

Same issue here; 5.1.1 is very unresponsive. Accepts a connection from MQTT Explorer but takes a loooong while to display the topics & values. HA Supervised on a x86 compute module. Never had any issues before 5.1.1.

Referenced #1897 too - seems like the same issue...

michaelarnauts commented 3 years ago

> Same issue here; 5.1.1 is very unresponsive. Accepts a connection from MQTT Explorer but takes a loooong while to display the topics & values. HA Supervised on a x86 compute module. Never had any issues before 5.1.1.
>
> Referenced #1897 too - seems like the same issue...

I'm not really sure that #1897 is the same issue. Besides the memory leak, I don't get any timeouts.

One of the changes between 5.1 and 5.1.1 is this: https://github.com/home-assistant/addons/commit/d291f564ced1c19d0cfaa349b9f3b4b91e615151 So it is probably related to that, although I don't know what bashio does.

michaelarnauts commented 3 years ago

What do you see when you subscribe to the topic $SYS/broker/messages/#? You can do this from the MQTT integration page, or through the cli with mosquitto_sub:

mosquitto_sub -v -t '$SYS/broker/messages/#' -u <username> -P <password>

I'm seeing stuff like this:

$SYS/broker/messages/stored 4974794
$SYS/broker/messages/received 5020091
$SYS/broker/messages/sent 45801

And I suspect that I shouldn't be seeing a high value for $SYS/broker/messages/stored. This might explain my memory leak, but maybe it can also explain some of the other issues described here.
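To compare these counters between installs, the `mosquitto_sub -v` output can be condensed into one line. This is only a sketch: `sys_summary` is a hypothetical helper name, and the 15-second `-W` timeout in the example is an assumed value chosen to give the broker time to publish its counters.

```shell
# Summarise "$SYS/broker/messages/*" lines (as printed by mosquitto_sub -v) from stdin.
# sys_summary is a hypothetical helper name.
sys_summary() {
    awk '
        $1 == "$SYS/broker/messages/stored"   { stored = $2 }
        $1 == "$SYS/broker/messages/received" { received = $2 }
        $1 == "$SYS/broker/messages/sent"     { sent = $2 }
        END { printf "stored=%s received=%s sent=%s\n", stored, received, sent }
    '
}

# Example (-W makes mosquitto_sub disconnect after the given number of seconds):
# mosquitto_sub -v -t '$SYS/broker/messages/#' -u <username> -P <password> -W 15 | sys_summary
```

If a counter arrives more than once, the last value wins, so the summary reflects the most recent reading.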

nickrout commented 3 years ago

I am getting 205 stored against roughly 2,000,000 sent and received. This is with 17 MQTT devices with 102 entities.

nickrout commented 3 years ago

> although I don't know what bashio does.

Bashio is a bash function library for use with Home Assistant add-ons. It contains a set of commonly used operations and can be used to be included in add-ons to reduce code duplication across add-ons and therefore making it easier to develop and maintain add-ons.

ozbob01 commented 3 years ago

I also see the same issues, initially I had both zwave and zigbee on mqtt using the respective add-ons. It got really bad to a stage where nothing worked, mosquitto taking 99% cpu all the time. After a lot of struggling I managed to have 5.1.1 working with only zigbee2mqtt but I still see many errors accumulating just not enough to break the system.

No ACK from MQTT server in 10 seconds (mid: 7)
2:04:42 AM – MQTT (WARNING) - message first occurred at 2:04:38 AM and shows up 11 times

Setup of switch platform mqtt is taking over 10 seconds.
2:04:33 AM – Switch (WARNING)

1614953178: Socket error on client , disconnecting.
1614953242: New connection from 172.30.32.1 on port 1883.
1614953242: Socket error on client , disconnecting.
1614953362: New connection from 172.30.32.1 on port 1883.
1614953362: Socket error on client , disconnecting.
1614953482: New connection from 172.30.32.1 on port 1883.
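In the excerpt above the "New connection" timestamps are exactly 120 seconds apart, which suggests a client retrying on a fixed interval. Here is a sketch for pulling those gaps out of a longer log. It assumes one log entry per line, as the add-on log prints them, and `connection_gaps` is a hypothetical helper name.

```shell
# Print the gap in seconds between successive "New connection" log entries on stdin.
# connection_gaps is a hypothetical helper name.
connection_gaps() {
    awk '
        /New connection from/ {
            ts = $1
            sub(/:$/, "", ts)              # strip the trailing colon from "1614953242:"
            if (prev != "") print ts - prev
            prev = ts
        }
    '
}

# Example, with the add-on log saved to a file (the file name is an assumption):
# connection_gaps < mosquitto.log
```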

As said, the reconnecting goes on and eventually seems to kill the broker, as it no longer allows connections. At first I ran on IOTstack; to rule out any influence I am now also running on Home Assistant OS on a 4 GB RPi.

alex-savin commented 3 years ago

As one person mentioned, enough is enough! Agreed! I decided to stop using all the critical add-ons with "magic" from the HA dev team. It is a gamble every time whether things will still work after an update. I set up a cluster of three VerneMQ instances with HAProxy as a load balancer.

RichieFrame commented 3 years ago

@michaelarnauts

> One of the changes between 5.1 and 5.1.1 is this: d291f56 So it is probably related to that, although I don't know what bashio does.

I did an analysis of the docker hub images for 5.1 and 5.1.1 and found some interesting things, posted the details in the HA forum thread about this

https://community.home-assistant.io/t/mosquitto-5-1-1-is-broken/286979/10

michaelarnauts commented 3 years ago

I'm afraid my issue is solved. I've removed the /data/mosquitto.db file, and it stopped saving messages to memory. I assume there was once a client that had connected with the durable flag and subscribed to a heavy-traffic topic. I think Mosquitto will keep all messages in case that client reconnects.

It might be worthwhile to check if this can fix this issue in your case...
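For anyone trying this, moving the file aside is safer than deleting it, since it can then be put back. A minimal sketch follows; `set_aside_db` is a hypothetical helper name, and it assumes the broker has been stopped first and that the command runs where /data/mosquitto.db is visible (inside the add-on container).

```shell
# Move the Mosquitto persistence file aside instead of deleting it outright.
# set_aside_db is a hypothetical helper name; stop the broker before running it.
set_aside_db() {
    db="$1"
    [ -f "$db" ] || { echo "no such file: $db" >&2; return 1; }
    mv "$db" "$db.bak"
}

# Example, inside the add-on container:
# set_aside_db /data/mosquitto.db
```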

chris-ka1 commented 3 years ago

This does not solve the mentioned issues. I just checked this by deleting mosquitto.db in the container and then made the update to 5.1.1. MQTT broke instantly in HA. devices were not available.

aes-alienrip commented 3 years ago

For those who want to rollback 5.1:

Eeeeeediot commented 3 years ago

> For those who want to rollback 5.1:
>
>   • Backup and uninstall mosquitto 5.1.1
>   • Fork mosquitto repository, edit "version": "5.1.1" to "version": "5.1" in config.json
>   • Add this custom repository in the supervisor's add-on store and install
>
> I installed mosquitto 5.1 newly with this method, hope this help

hey i tried this but it didnt allow me to add the fork as a custom repo
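The version edit from the quoted steps could be sketched as below, run inside a clone of the fork. `set_addon_version` is a hypothetical helper name and the example path is an assumption. The sketch only rewrites the field; it does not check whether the Supervisor will accept the fork or find a matching 5.1 image.

```shell
# Rewrite the "version" field in an add-on config.json, e.g. from 5.1.1 to 5.1.
# set_addon_version is a hypothetical helper name.
set_addon_version() {
    file="$1"; from="$2"; to="$3"
    # Escape dots so that "5.1.1" is matched literally by sed.
    from_re=$(printf '%s' "$from" | sed 's/\./\\./g')
    # Write to a temporary file first, then replace the original.
    sed "s/\"version\": *\"$from_re\"/\"version\": \"$to\"/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example (the path inside the fork is an assumption):
# set_addon_version mosquitto/config.json 5.1.1 5.1
```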