home-assistant / addons

Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0

Mosquitto 5.1.1 is broken. #1887

Closed adamf663 closed 3 years ago

adamf663 commented 3 years ago

The problem

Environment

Problem-relevant configuration

Traceback/Error logs

Additional information

Mariusthvdb commented 3 years ago

I'm afraid my issue is solved. I've removed the /data/mosquitto.db file, and it stopped saving messages to memory. I assume there was once a client that had connected with the durable flag and subscribed to a heavy-traffic topic. I think Mosquitto will keep all messages in case that client reconnects.

glad you got it working, but this can't be a solution for those of us on Home Assistant OS unfortunately. Hope the update will get fixed anytime soon, did we see some response lately from the dev team? Just so we know it is under progress?

nickrout commented 3 years ago

did we see some response lately from the dev team?

The thread isn't that long!

How did removing mosquitto.db work for you?

Mariusthvdb commented 3 years ago

that's just the thing, Home Assistant OS users can't delete a mosquitto.db, because it isn't even there? That is to say, not in the samba-reachable folders

nickrout commented 3 years ago

I haven't checked via samba and probably can't until I am on my LAN at home.

Mariusthvdb commented 3 years ago

cool if you would, it is a most unpleasant state of affairs we're in right now. Needing the update apparently (why else would it be available), but not being able to...

michalk-k commented 3 years ago

Guys, it's been 2 weeks since the issue was identified and it was confirmed that v5.1 is working properly. Why hasn't v5.1.1 been rolled back yet? I assume more and more people are being affected by this issue.

realthk commented 3 years ago

I assume more and more people are being affected by this issue.

Probably still not enough people. I suspect this error occurs only under some conditions not in every setup, otherwise it'd be investigated by now.

chris-ka1 commented 3 years ago

I think I have a path to overcome the problem. The comment from realthk got me thinking and I used the following procedure to get 5.1.1 working for me.

  1. Take a snapshot (complete) from your current home assistant instance
  2. go to the mosquitto add-on page and copy your configuration into a text editor for later use.
  3. uninstall the mosquitto add-on
  4. install mosquitto 5.1.1 and copy the configuration back from the editor
  5. start the add-on

This solved the problem in my installation

Mariusthvdb commented 3 years ago

that could well be, but, before even trying this, why not wait for the devs to solve the issue? Or, put another way, why update in the first place at all? What do we gain by doing so? Does anyone have an idea of the new features/functionality/advantages of 5.1.1 over 5.1.0?

I mean, missing out on this seems hardly a reason for immediate panic:

Screenshot 2021-03-18 at 22 25 42

of course ymmv, but I wouldn't even know exactly what this does... or if I need it, guess not ;-)

chris-ka1 commented 3 years ago

Agreed. But I suspect that following releases will carry on the problem unless you cleanup your installation

quarcko commented 3 years ago

For me, reinstalling the addon did not help. I mean, initially, yes, there are no warnings, but after restarting the zigbee2mqtt and zwave addons so they both re-add autodiscovery topics, you are back to the same situation as before.

chris-ka1 commented 3 years ago

That's interesting. I had the exact same situation with zwaveJS2mqtt, and it has now been running for over 12h just as it did with 5.1

quarcko commented 3 years ago

I must add that I still use the old OpenZWave addon, as I had no time to migrate yet. So maybe that's the problem? Did you use OpenZWave in the past? Maybe it helped for you because the reinstall basically removed the problematic OpenZWave topics, and for you they did not return, as you now use the newer zwaveJS2mqtt. For me the problem is back because OpenZWave is still there...

just assuming...

chris-ka1 commented 3 years ago

I used zwave2mqtt before and did the migration to zwaveJS2mqtt 4 weeks ago. So your assumption might be correct.

chumbazoid commented 3 years ago

FWIW, I've also experienced this with a Debian 10 x64 (supervisor supported) installation. Have never used Z-Wave / Zigbee and broker delete+reinstall did not resolve. Also tried deleting and reinstalling the MQTT integration.

With reinstalls, HA either could not connect to the broker at all (per HA log) or repeatedly connected+disconnected due to "socket error" (per broker log). Reversion to a 5.1 snapshot always restored normal functionality.

I'm abandoning the addon broker and followed this guide to install MQTT directly on Debian. It was painless and quick with no functionality loss; only needed to rename some entities on my dashboard for cosmetic purposes. I suppose this setup is no longer supervisor compliant but the new broker seems snappier than ever.

Edit: neglected to mention--to get the new broker working, also had to "publish cmnd/tasmotas/so19 1" from one of my MQTT (Tasmota) device's web consoles to re-initiate HA discovery (might've happened on its own if I'd waited, not sure)

nickrout commented 3 years ago

@chumbazoid I believe you are the only person on debian+supervised to report being affected by this. (Although most people unhelpfully don't say)

alex-savin commented 3 years ago

@nickrout he is not the only one!

nickrout commented 3 years ago

What?

realthk commented 3 years ago

Probably the "only" is missing, as @chumbazoid is certainly not the only one: I'm also using HA Supervised on Debian 10, but have no time to experiment with removing-reinstalling MQTT (and also use a few topics to store information with retained messages), I'm fine with 5.1 for now.

jsb5151 commented 3 years ago

My system is also a supervised deployment.


jysaloma commented 3 years ago

Having this issue as well. High CPU usage

image

and the log shows this:

[08:08:44] INFO: [INFO] found tasmota3 on local database
/bin/auth_srv.sh: line 17: echo: write error: Broken pipe
1616479725: Socket error on client , disconnecting.
[08:08:59] INFO: [INFO] found tasmota3 on local database
/bin/auth_srv.sh: line 17: echo: write error: Broken pipe
[08:09:09] INFO: [INFO] found tasmota3 on local database
/bin/auth_srv.sh: line 17: echo: write error: Broken pipe
[08:09:24] INFO: [INFO] found tasmota3 on local database
/bin/auth_srv.sh: line 17: echo: write error: Broken pipe
1616479765: Socket error on client , disconnecting.
[08:09:35] INFO: [INFO] found zigbee2mqtt on local database
/bin/auth_srv.sh: line 17: echo: write error: Broken pipe
[08:09:42] INFO: [INFO] found zigbee2mqtt on local database
/bin/auth_srv.sh: line 17: echo: write error: Broken pipe
[08:09:56] INFO: [INFO] found zigbee2mqtt on local database
/bin/auth_srv.sh: line 17: echo: write error: Broken pipe
1616479805: Socket error on client , disconnecting.
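
To see which logins are involved most often, a log like the one above can be tallied with a few lines of Python. This is only a sketch: it assumes each "Broken pipe" line belongs to the most recent "found ... on local database" line before it, which the log suggests but does not prove. The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches add-on auth lines such as:
#   [08:08:44] INFO: [INFO] found tasmota3 on local database
FOUND_RE = re.compile(r"\[INFO\] found (\S+) on local database")


def tally_broken_pipes(log_lines):
    """Count 'Broken pipe' errors, attributed to the most recent login seen."""
    counts = Counter()
    last_user = None
    for line in log_lines:
        match = FOUND_RE.search(line)
        if match:
            last_user = match.group(1)
        elif "write error: Broken pipe" in line and last_user is not None:
            # Assumption: the error refers to the login logged just before it.
            counts[last_user] += 1
    return counts


sample = [
    "[08:08:44] INFO: [INFO] found tasmota3 on local database",
    "/bin/auth_srv.sh: line 17: echo: write error: Broken pipe",
    "1616479725: Socket error on client , disconnecting.",
    "[08:09:35] INFO: [INFO] found zigbee2mqtt on local database",
    "/bin/auth_srv.sh: line 17: echo: write error: Broken pipe",
]
print(dict(tally_broken_pipes(sample)))
```

Feeding it the full add-on log instead of `sample` would show whether the errors cluster on a few clients or are spread across all of them.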

jysaloma commented 3 years ago

Re-install didn't help. I have 18 local users in the config file:

logins: ... [18 local users]
anonymous: true
customize:
  active: true
  folder: mosquitto
certfile: fullchain.pem
keyfile: privkey.pem
require_certificate: false

The access files are set as:

mosquitto $ cat acl.conf
acl_file /share/mosquitto/accesscontrollist

and all users are set in accesscontrollist:

mosquitto $ cat accesscontrollist
user zigbee2mqtt
topic readwrite #
...
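
For checking a long access control list like this one, the `user` / `topic` lines can be read into a dictionary. The sketch below assumes the usual Mosquitto ACL layout (a `user <name>` line followed by `topic [read|write|readwrite|deny] <filter>` lines, with read/write access when the access type is omitted). It ignores `pattern` lines, and `parse_acl` is a hypothetical helper, not part of the add-on.

```python
def parse_acl(text):
    """Parse 'user <name>' / 'topic [access] <filter>' lines into a dict."""
    rules = {}
    current_user = None  # None collects rules placed before any 'user' line
    for raw in text.splitlines():
        line = raw.strip()
        # '#' starts a comment only at the beginning of a line; inside a
        # 'topic' line it is the MQTT multi-level wildcard.
        if not line or line.startswith("#"):
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "user":
            current_user = rest.strip()
            rules.setdefault(current_user, [])
        elif keyword == "topic":
            parts = rest.split(None, 1)
            if len(parts) == 2 and parts[0] in ("read", "write", "readwrite", "deny"):
                access, topic_filter = parts
            else:
                # Assumption: access defaults to readwrite when omitted.
                access, topic_filter = "readwrite", rest.strip()
            rules.setdefault(current_user, []).append((access, topic_filter))
    return rules


print(parse_acl("user zigbee2mqtt\ntopic readwrite #\n"))
```

A user that appears in `logins` but has no entry in the parsed result would be worth a second look.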

jysaloma commented 3 years ago

Not sure if these are connected but I removed the Zigbee2mqttAssistant addon and rebooted my PI. Now situation seems to be normal and CPU load decent

image

cogneato commented 3 years ago

that's just the thing, Home Assistant OS users can't delete a mosquitto.db, because it isn't even there? That is to say, not in the samba-reachable folders

Uninstalling the addon and reinstalling would remove the mosquitto.db

nickrout commented 3 years ago

Uninstalling the addon and reinstalling would remove the mosquitto.db

It should NOT be necessary to remove mosquitto.db. This is a bug. It needs to be fixed. Why is there no dev input on this?

maddoxjny commented 3 years ago

uninstalling and reinstalling does not fix the problem. High cpu and devices not functioning.

nickrout commented 3 years ago

If you want to remove mosquitto.db (at your own risk) then there are two ways to do it that I can see.

Log into the base operating system and remove the file

rm /usr/share/hassio/addons/data/core_mosquitto/mosquitto.db

Or

Enter the addon container and remove the file

docker exec -it addon_core_mosquitto /bin/bash
rm /data/mosquitto.db

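
A slightly safer variant of either approach is to rename the file instead of deleting it, so it can be put back if something goes wrong. The sketch below shows the idea in Python on a throwaway directory. On a real system the path would be one of the two given above, and it is an assumption that the add-on should be stopped first so the broker does not write the file again on shutdown. `move_aside` is a hypothetical helper.

```python
import shutil
import tempfile
import time
from pathlib import Path


def move_aside(db_path):
    """Rename the file to a timestamped backup; return the new path, or None if absent."""
    db = Path(db_path)
    if not db.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = db.with_name(f"{db.name}.{stamp}.bak")
    shutil.move(str(db), str(backup))
    return backup


# Demonstration on a temporary directory, so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_db = Path(tmp) / "mosquitto.db"
    demo_db.write_bytes(b"")
    moved = move_aside(demo_db)
    moved_name = moved.name
    still_there = demo_db.exists()

print(moved_name, still_there)
```
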
ghost commented 3 years ago

I think I have a path to overcome the problem. The comment from realthk got me thinking and I used the following procedure to get 5.1.1 working for me.

1. Take a snapshot (complete) from your current home assistant instance

In order to do this I re-installed it and re-started it, and then 5.1.1 worked fine. I don't know what's different from the situation before, but it works now....

chris-ka1 commented 3 years ago

Yes, I had the exact same experience. Also scratching my head why it works now.

CubieMedia commented 3 years ago

Thanks for the tips, i also reinstalled the addon but it did not help with the issue on my installation. Also deleted the database manually, still no luck!

I can reproduce the problem very easily with restarting the addon while connected with MQTT-Explorer and trying to send messages as soon as it reconnects. It takes quite some time until the messages get accepted.

Can anyone with the problem verify this?
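
For anyone who wants a number rather than an impression, the connection handshake can be timed without MQTT-Explorer. The sketch below uses only the Python standard library: it builds an MQTT 3.1.1 CONNECT packet by hand and measures the time until the broker answers with CONNACK. It covers plain TCP on port 1883 only (no TLS), it measures the handshake rather than publish latency, and whether the delay described above actually sits in the handshake is an assumption. The client id `connack-timer` and both function names are made up for illustration.

```python
import socket
import struct
import time


def _mqtt_str(value):
    """Encode a UTF-8 string with the 2-byte length prefix MQTT uses."""
    data = value.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def build_connect(client_id, username=None, password=None, keepalive=60):
    """Build an MQTT 3.1.1 CONNECT packet with the clean-session flag set."""
    flags = 0x02  # clean session
    payload = _mqtt_str(client_id)
    if username is not None:
        flags |= 0x80
        payload += _mqtt_str(username)
        if password is not None:
            flags |= 0x40
            payload += _mqtt_str(password)
    body = _mqtt_str("MQTT") + bytes([0x04, flags]) + struct.pack("!H", keepalive) + payload
    # Remaining length is a variable-length integer, 7 bits per byte.
    length, encoded = len(body), bytearray()
    while True:
        digit, length = length % 128, length // 128
        encoded.append(digit | (0x80 if length else 0))
        if not length:
            break
    return bytes([0x10]) + bytes(encoded) + body


def time_connack(host, port=1883, **kwargs):
    """Return (seconds from TCP connect to CONNACK, CONNACK return code)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=60) as sock:
        sock.sendall(build_connect("connack-timer", **kwargs))
        reply = b""
        while len(reply) < 4:
            chunk = sock.recv(4 - len(reply))
            if not chunk:
                raise ConnectionError("broker closed the connection")
            reply += chunk
    if reply[0] != 0x20:
        raise ValueError("unexpected packet type")
    return time.monotonic() - start, reply[3]
```

Calling `time_connack("homeassistant.local", username="...", password="...")` right after restarting the add-on, once on 5.1 and once on 5.1.1, would give two comparable figures. The host name is a placeholder. A return code of 0 means the connection was accepted.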

michalk-k commented 3 years ago

do you mean v5.1 doesn't work for you? or you are failing with v5.1.1?

CubieMedia commented 3 years ago

i tested with version 5.1.1, did i get this wrong?

were these tips all for version 5.1 ... i missed that in the thread.

testing addon with version 5.1 (just restored only this addon from an old backup) and it works again.

Thanks guys for the temporary solution!

va3jme commented 3 years ago

Sigh.... Rolled back to 5.1, and still can't get my Sonoff Zigbee bridge to work :(

raidnet-ms commented 3 years ago

Esphome things are offline while deconz things do work properly. Curious...

buzztiaan commented 3 years ago

For reference ; https://community.home-assistant.io/t/mosquitto-5-1-1add-on-is-broken/286979/11

It seems 5.1.1 changed a LOT of stuff instead of just the mosquitto broker

NetJaro commented 3 years ago

For those who want to rollback 5.1:

  • Backup and uninstall mosquitto 5.1.1
  • Fork mosquitto repository, edit "version": "5.1.1" to "version": "5.1" in config.json
  • Add this custom repository in the supervisor's add-on store and install

I installed mosquitto 5.1 newly with this method, hope this helps

Hi.

I try this but I cant add custom repo to supervisor

21-04-03 07:36:20 ERROR (MainThread) [supervisor.store.git] Can't clone https://github.com/NetJaro/addons/tree/master/mosquitto/ repository: Cmd('git') failed due to: exit code(128)
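
One likely reason for the exit code 128: the URL in the error is a GitHub web page path to a subdirectory (`/tree/master/mosquitto/`), and git can only clone a repository root. A minimal sketch of reducing such a URL to its root follows. `repo_root` is a hypothetical helper, and the assumption is that the Supervisor expects the root URL of the forked repository.

```python
from urllib.parse import urlparse


def repo_root(url):
    """Reduce a GitHub web URL to the repository root that git can clone."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.netloc != "github.com" or len(parts) < 2:
        raise ValueError(f"not a GitHub repository URL: {url}")
    owner, repo = parts[0], parts[1]
    return f"https://github.com/{owner}/{repo}"


print(repo_root("https://github.com/NetJaro/addons/tree/master/mosquitto/"))
# prints https://github.com/NetJaro/addons
```
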

Mariusthvdb commented 3 years ago

can confirm that, upon cogneato's suggestion in Discord, deleting the 5.1 add-on, (copying the config) and re-installing (the now new 5.1.1) Add-on with the copied config, everything is running smoothly. NO errors in the log, and all topics are live.

So, don't update, but re-install, which gets you the new version (and rewrites the mosquitto.db, which seems to be the issue). This is essentially what @christoph-luebbe said here: https://github.com/home-assistant/addons/issues/1887#issuecomment-802234693

bsmeding commented 3 years ago

I also can confirm that removing and re-adding the add-on works well. Please note the configuration first. Remove add-on, wait a minute so local storage is removed.

I also restarted Home Assistant. Then I installed the add-on and checked that the username and password are in the configuration of mosquitto and of the mqtt integration (see the mqtt configuration on the integrations page, or configuration.yaml if configured via .yaml)

jrhbcn commented 3 years ago

Hello,

@Mariusthvdb @bsmeding Unfortunately that is not my experience. I removed my add-on and installed it from scratch. The MQTT disconnections I was suffering (with my shelly devices) did go away compared with upgrading normally from 5.1 to 5.1.1. However, with add-on version 5.1.1, MQTT seems to be much slower: connecting to the broker with MQTTExplorer takes significantly more time with 5.1.1 than with 5.1. Also, core logs get flooded with "No ACK from MQTT server in 10 seconds" errors, as in https://github.com/bieniu/ha-shellies-discovery/issues/116. I have not seen any other log entries (in core or from the add-on) that differ between 5.1 and 5.1.1 and might point to an explanation, but if any more logs are needed to help track this problem I am happy to help.

I am using Home Assistant OS on a raspberry pi 4. This is my config info:

System Health

version core-2021.4.3
installation_type Home Assistant OS
dev false
hassio true
docker true
virtualenv false
python_version 3.8.7
os_name Linux
os_version 5.4.83-v8
arch aarch64
timezone Europe/Madrid

Home Assistant Supervisor

host_os Home Assistant OS 5.13
update_channel stable
supervisor_version supervisor-2021.03.9
docker_version 19.03.15
disk_total 457.7 GB
disk_used 23.7 GB
healthy true
supported true
board rpi4-64
supervisor_api ok
version_api ok
installed_addons Samba share (9.3.1), chrony (2.0.2), Home Assistant Google Drive Backup (0.103.1), Grafana (6.3.0), AdGuard Home (4.0.0), InfluxDB (4.0.4), ESPHome (1.16.2), TasmoAdmin (0.14.1), Zigbee2mqtt (1.18.1-1), motionEye (0.11.1), WireGuard (0.5.1), AppDaemon 4 (0.5.2), Visual Studio Code (3.3.0), Terminal & SSH (9.1.0), File editor (5.2.0), Check Home Assistant configuration (3.6.0), JupyterLab (0.5.0), Z-Wave JS (0.1.17), Mosquitto broker (5.1), Network UPS Tools (0.6.2)

bsmeding commented 3 years ago

Does the log from the mqtt addon give more information? Also, are all the devices in mqtt topics slower, or only specific ones?

I have almost all devices connected via mqtt (zigbee, zwave, milight). After your message I tried several in the homeassistant topic and cannot find any slower-responding devices.

Did you reconfigure the mqtt server in the mqtt integration?


jrhbcn commented 3 years ago

Does the log from the mqtt addon give more information? Also, are all the devices in mqtt topics slower, or only specific ones? I have almost all devices connected via mqtt (zigbee, zwave, milight). After your message I tried several in the homeassistant topic and cannot find any slower-responding devices.

Not that I can see, both logs seem normal indicating connections from clients normally. I use MQTT with both shelly, zigbee and others (system_sensors) with more than 50 devices. It seems to be more related to shelly devices than zigbee or others (but haven't checked 100% so it is more a feeling).

Did you reconfigure the mqtt server in the mqtt integration?

Yes, actually in one of my 5.1 <-> 5.1.1 installations (do not remember when) the default config stopped working so I had to create a new user in the "logins" section and re-configure the MQTT integration with that manual user.

Thanks for looking into this!!

Algirdyz commented 3 years ago

I'm having similar issues. But I cannot compare to an older installation because I just started using Home Assistant this weekend. I'm trying to connect arduino MQTT devices with the MQTT broker and initially they connect successfully with messages being sent in between.

However after a few minutes it disconnects and Arduino cannot connect to it anymore until I reset it. When trying to reconnect, Arduino is just waiting for a response, in the mean time MQTT broker logs say that the device connected successfully. After some time, depending on my timeout setting broker states the device exceeded timeout. At that point arduino also says that the server exceeded timeout and it just loops this way. So it seems to me that arduino expects some sort of response which doesn't come from the broker.

This is not an issue with the network because it works perfectly on first attempt after each reset and never succeeds when attempting to reconnect.

My arduino also did not disconnect when I tried setting up a public broker instead of using the add-on.

I've tried everything I could find online and right now this seems to be the only open end left, which i cannot test since I can't downgrade the addon.

EDIT: I've tried adding my own forked repository to my home assistant but it just doesn't add...

Mariusthvdb commented 3 years ago

Hello,

@Mariusthvdb @bsmeding However, with add-on version 5.1.1, MQTT seems to be much more slow, connecting to the broker with MQTTExplorer takes significantly more time with 5.1.1 than 5.1. Also core logs get flooded with "No ACK from MQTT server in 10 seconds" errors as in here.

I don't see these errors in the logs anymore, but I can confirm the very long connection time to MQTT explorer. This was almost immediate before, and now takes 45 seconds. I have only 3 clients for the broker, 1 of which is my Owntracks, which hasn't even been active yet. Another is a HA integration (on the same HA instance as the add-on broker) with some bluetooth trackers, and the third is a dedicated Zwave hub.

I'll try to downgrade once more to see if this helps.

update

yep, snappy as before, took less than 5 seconds. so definitely something going on.

grantalewis commented 3 years ago

Thanks to those of you who have suggested methods for getting things working on 5.1.1. I've tried three times and so far haven't had any luck. I think I've tracked the problem down to the HA MQTT Integration not seeing the new broker but am stuck beyond that. Hoping someone might be able to give me a nudge.

Here are the steps I've taken so far:

Supervisor | Dashboard | Mosquitto broker | Configuration | (save info there to .txt file)
Supervisor | Dashboard | Mosquitto broker | Info | Uninstall
(reboot system)
Supervisor | Add-on Store | Search for mosquitto | reinstall
(reboot system)

I seem to see favorable info in Supervisor | Dashboard | Mosquitto broker | Log, but after waiting at least 30 minutes, I've got MANY entities that are unavailable. I can use MQTTExplorer to log on to the broker and see that my (cached?) entities are there, but they just show as "unavailable" in HA.

Under 5.1 if I go to Configuration | Integrations | MQTT | Configure | Re-configure MQTT, leave the defaults, and click Submit, I'm able to set various options, click Submit, and see the "Success!" message.

But after upgrading to 5.1.1 if I go to Configuration | Integrations | MQTT | Configure | Re-configure MQTT, leave the defaults (which should be, I'm assuming, the same settings that work with 5.1), I see "Failed to connect."

So I think I'm seeing that the problem lies not with Mosquitto broker version 5.1.1 but perhaps with the Integration located in Configuration.

Ideas? Suggestions? Thanks.

johnjoemorgan commented 3 years ago

@grantalewis here's a few things that come to mind

You may have done this but on the 2 occasions that I've had a total breakdown of the Zigbee2mqtt (devices present but no MQTT messages reaching the HAS) I've found that re-flashing the coordinator works to 'reboot' the system. Maybe this is the 'nudge' your problem needs to be solved?

https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_Home_1.2/bin/default

Thinking about pushing other stuff at the Mosquitto broker and HAS:

Can you run any other Service2mqtt such as Z-Wave 2 MQTT? I'm guessing not, otherwise you'd have said. So moving on ...

Perhaps installing a MQTT service such as

https://gitlab.com/iotlink/iotlink

on a Windows PC and testing whether you can scrape the MQTT messages from that to the HAS successfully. That will help figure out if it's just Zigbee2mqtt or all MQTT messages.

jrhbcn commented 3 years ago

But after upgrading to 5.1.1 if I go to Configuration | Integrations | MQTT | Configure | Re-configure MQTT, leave the defaults (which should be, I'm assuming, the same settings that work with 5.1), I see "Failed to connect."

@grantalewis, Do you have a user defined in "logins" section of the configuration of the addon? and using that user in the HA mqtt integration. Something similar happened to me when testing 5.1 <-> 5.1.1 installation and at some point I had to manually set it up (I guess it was automatically done when I installed HA for the first time).

bsmeding commented 3 years ago

@grantalewis try to change the username and password in both the addon and the integration page

I found in the code that there was a default homeassistant user and password in the mqtt addon, and I think that has been removed or is broken. My setup prior to 5.1.1 had a homeassistant user in the integration, but that username and password were not visible in the addon configuration.

Resetting on both sides may solve the connection issue.

Edit: I now see a similar answer from @jrhbcn

srnoth commented 3 years ago

After reverting to 5.1 to resolve issues, decided to give 5.1.1 another try based on some of the suggestions here, and so far so good. Here's what I did:

- Backup backup backup (lol)
- Stopped zigbee2mqtt plugin and set it not to start on boot
- Uninstalled mosquitto plugin
- Deleted HASS local user which I was using for MQTT (mosquitto) auth
- Rebooted the entire HASSIO box
- Installed mosquitto plugin (5.1.1), adjusted configuration to use a login specified there instead of a HASS user.
- Started mosquitto plugin
- Started monitoring "#" on "listen to topic" so I could observe all messages coming in
- Restarted all Tasmota devices, waited for each of them to check in and finish re-adding retain messages
- Started Zigbee2mqtt plugin, set to start on boot, observed messages start coming in, waited until all retain messages were re-added (hundreds in my case, took few minutes)
- Once the flood of initial messages stopped, I tested my devices and observed none of the slowdown or latency that happened when I did the first in-place 5.1 --> 5.1.1 update.
- Only seeing a socket error from one device which I will need to investigate separately, but no other errors so far.
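
Subscribing to "#" works for watching everything because "#" is the MQTT multi-level wildcard. The sketch below shows how a topic filter is matched against a topic name under the standard rules: "+" matches exactly one level, "#" matches any remaining levels, and a leading wildcard does not match topics that start with "$", such as `$SYS/...`. The function name and the example topics are illustrative only.

```python
def topic_matches(topic_filter, topic):
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Wildcards at the first level do not match topics starting with '$'.
    if topic.startswith("$") and filter_levels[0] in ("#", "+"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # multi-level wildcard: matches whatever remains
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


print(topic_matches("#", "zigbee2mqtt/bridge/state"))
print(topic_matches("#", "$SYS/broker/uptime"))
```

So a listener on "#" sees the retained discovery messages being re-added, but not the broker's own `$SYS` statistics.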

jrhbcn commented 3 years ago

@srnoth have you tried connecting to the mqtt server using MQTT-Explorer? For me v5.1 works OK, but v5.1.1 just takes ages to connect and start showing messages (6-7 seconds vs 45 seconds). Furthermore, all my zigbee2mqtt devices also seem to work OK with v5.1.1, but not my shelly devices (through the mqtt shelly discovery script), which report ACK timeouts in the logs.

srnoth commented 3 years ago

@srnoth have you tried connecting to the mqtt server using MQTT-Explorer? For me v5.1 works OK, but v5.1.1 just takes ages to connect and start showing messages (6-7 seconds vs 45 seconds). Furthermore, all my zigbee2mqtt devices also seem to work OK with v5.1.1, but not my shelly devices (through the mqtt shelly discovery script), which report ACK timeouts in the logs.

Just tested and it connects and starts displaying messages within a couple of seconds. Takes a bit longer than that to load all.