home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.76k stars 30.87k forks source link

Matrix Hub: Hotword no longer working minutes after hass restart #22219

Closed schneekluth closed 4 years ago

schneekluth commented 5 years ago

Home Assistant release with the issue: 0.91.1

Last working Home Assistant release (if known): unknown, just started using matrix component

Operating environment (Hass.io/Docker/Windows/etc.): virtualenv

Component/platform: matrix hub

Description of problem: I'm using matrix as chatbot and as notifier. Notifier works reliable but the chatbot (hub component) only works a few minutes and later on does not respond any longer to any predefined keywords.

Problem-relevant configuration.yaml entries:

matrix:
  homeserver: !secret matrix_server
  username: !secret matrix_hassbot_user
  password: !secret matrix_hassbot_password
  rooms:
    - "!secret matrix_room"
  commands:
    - word: Mond
      name: Mondphase
      rooms:
        - "!secret matrix_room"

This is an example hotword I've configured. It works as expected, when hass is restarted.

Additional information: I have set logger to homeassistant.components.matrix: debug but there is no sign of an error. Component quietly stops working.

tophers commented 5 years ago

I have a similar setup, without issue, hotwords work without issue, slight change in config, I don't have the trailing rooms:

matrix:
  homeserver: !secret hassmatrixserver
  username: !secret hassmatrixuser
  password: !secret hassmatrixpass
  rooms:
    - !secret hassmatrixroom
  commands:
    - word: outside
      name: outside
    - word: temp
      name: temp

I also just upgraded to 0.90.1, still no issues with matrix and hotwords with the config.

schneekluth commented 5 years ago

Interesting. In my actual config I've set up 2 rooms where the bot should listen for hotwords. In the lower section I define the room where the hassbot should answer. That way it was described in the config. I'll try your solution later. I am just wondering why it quietly stops working because everything works fine after a reboot.

tophers commented 5 years ago

(edit: I re-read and see you said you did define the two rooms, so below maybe moot)

I'm actually going to try a multi-room setup as well on my instance and see if I can recreate the issue. I am out this weekend, but will test Monday.

In your example config I only see the one server. The component docs also say that any additional rooms need to be under default as well. In your actual config is that correct? The initial example showed only the one:

Component example:.

matrix:
  homeserver: https://matrix.org
  username: "@my_matrix_user:matrix.org"
  password: supersecurepassword
  rooms:
    - "#hasstest:matrix.org"
    - "#someothertest:matrix.org"
  commands:
    - word: testword
      name: testword
      rooms:
        - "#someothertest:matrix.org"
    - word: testotherword
      name: testotherword
      rooms:
        - "#hasstest:matrix.org"
schneekluth commented 5 years ago

I've now switched to the following config:

matrix:
  homeserver: !secret matrix_server
  username: !secret matrix_hassbot_user
  password: !secret matrix_hassbot_password
  rooms:
    - !secret matrix_hassbot_room
  commands:
    - word: Help
      name: Help
    - word: Mond
      name: Mondphase
    - word: Blog
      name: Blogstatistik
    - word: Wecker_Aus
      name: Wecker_Aus
    - word: Licht
      name: Licht
    - word: Adieu
      name: Adieu
    - word: Pflanzen
      name: Pflanzen

It also works as expected after a restart but after some hours the hotwords are no longer aswered by the bot. I've also changed the debug level to critical but still no erros appear. I've no idea how to debug this :/

tophers commented 5 years ago

So far I can't reproduce the issue :( I did add an additional room to my config

matrix:
  homeserver: !secret hassmatrixserver
  username: !secret hassmatrixuser
  password: !secret hassmatrixpass
  rooms:
    - !secret hassmatrixroom
    - !secret alt_hassmatrixroom
  commands:
    - word: outside
      name: outside
      rooms:
        - !secret hassmatrixroom
    - word: temp
      name: temp
      rooms:
        - !secret alt_hassmatrixroom

My last home assistant restart was over 8 hours ago and so far it is still responding. I will update if that changes.

tophers commented 5 years ago

I just wanted to update this, I'm restarting today to move to 0.90.2. Unfortunately I'm just not able to replicate this issue. HA/Matrix continued responding with the config above for 4 days. I got a response in both rooms, just prior to restarting for the update. Sorry I couldn't be more help.

github-actions[bot] commented 5 years ago

Hey there @tinloaf, mind taking a look at this issue as its been labeled with a integration (matrix) you are listed as a codeowner for? Thanks!

schneekluth commented 5 years ago

So right now I'm on 0.91.1 and this issue still persits. Really have no idea what is the reason because ist works flawlessly when i restart HASS but still stopf working after some time. @tinloaf What would be the correct way to debug this? As mentioned above I've used homeassistant.components.matrix: debug in logger.yaml. Any suggestion?

lordievader commented 5 years ago

Hi, I have the same issue. I've had it before 0.93.0 never really looked into it though. In the past couple of days I've been trying to debug the issue. It seems to me that the bot simply doesn't get any messages from the channel.

I've set the logger level of the matrix component to debug and added a statement to the _handle_room_message at the top of the function to log when a message was received. Yet, this line is never triggered.

Sometimes homeassistant responds right after a restart of the service, sometimes it doesn't. That said, it never responds after sometime. My feeling is that the chat bot gets stuck in some kind of loop causing it to ignore and further messages (no idea if this is true).

lordievader commented 5 years ago

I think I found the bug. It is in _join_or_get_room. When you specify a room by its alias (in my case #hassio:<domain> it might fail to find this room.

[homeassistant.components.matrix] Looking for room: #hassio:<domain>
[homeassistant.components.matrix] #hassio:<domain> in (#openintel_monitoring:<domain>) []: False
[homeassistant.components.matrix] #hassio:<domain> in (#openstack:<domain>) ['#openstack:<domain>']: False
[homeassistant.components.matrix] #hassio:<domain> in (#home:<domain>) []: False
[homeassistant.components.matrix] #hassio:<domain> in (#dacs-dns:<domain>) ['#dacs-dns:<domain>']: False
[homeassistant.components.matrix] #hassio:<domain> in (None) []: False
[homeassistant.components.matrix] #hassio:<domain> in (#sot-monitoring:<domain>) ['#sot-monitoring:<domain>']: False
[homeassistant.components.matrix] #hassio:<domain> in (#testing:<domain>) ['#testing:<domain>']: False
[homeassistant.components.matrix] Joined room !WmlsXzLzpkifzTUeaE:<domain> (known as #hassio:<domain>)

Modified that output a little. The important part is the third line where it tries to match against the #home channel, which is an alias of #hassio. The line responsible for logging this is:

_LOGGER.debug('%s in (%s) %s: %s', room_id_or_alias, room.canonical_alias, room.aliases, test)

The test variable holds the outcome of room_id_or_alias in room.aliases which is the test the code uses to determine if the room already has been joined. As you can see from the log, the list of aliases is empty for some reason so this evaluates to False and the room is joined again. This happens for every message sent, since it goes through the notification channel, which calls this function again. I think that joining the room again breaks the listener for the hotword.

A work around is to specify the room_id in your homeassistant config.

Also, I don't know why my room aliases are not showing up as room aliases.

stale[bot] commented 5 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

fti7 commented 5 years ago

Same issue here. It stops working on the next Day. I already thought its a reconnection issue, but i cannot reproduce this. :-(

lordievader commented 5 years ago

@fti7 Are you using the channel name or the id?

I think the bug lies with the matrix python sdk, in some cases it doesn't return the reference to the room when searching for the room name.

fti7 commented 5 years ago

@lordievader i've used an alias. But since yesterday i switched to the Real Room ID. Unfortunately it stopped working again. Need to check this evening if i got other Errors or if its the same issue

fti7 commented 5 years ago

So indeed, same issue. I've used Room IDs like ""!WGYAQdAtJXXErYtuSy:sampleserver.com

I've checked the used matrix-client Version and its really old. Changed it to the newest Version and i give it a try now:

Docker Container: /usr/src/homeassistant/requirements_all.txt and /usr/src/homeassistant/components/matrix/manifest.json Replaced "matrix-client==0.2.0" with "matrix-client==0.3.2"

If its working i will create an Pull-Request

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

fti7 commented 4 years ago

Still buggy... its not fixed with the newest module version.

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

fti7 commented 4 years ago

Still buggy

jmtemmos commented 4 years ago

I have the same problem with hassio 107.7. Works only once, when hassio is restarted. Then no message is got from the channel any longer.

jmtemmos commented 4 years ago

Still buggy in 108.4 hassio version.

benjmarshall commented 4 years ago

I also have this bug. Currently on Home Assistant 0.109.5 running on HassOS. I have tried moving to room IDs instead of alias and the behaviour doesn't change for me. Enabling debugging gives some hint as to where the problem might be and I think I am in agreement with @lordievader that this seems to be an issue with the bot re-joining rooms.

For reference I have a basic setup configured with a single room and a single command:

matrix:
  homeserver: !secret matrixdomain
  username: !secret matrixuser
  password: !secret matrixpass
  rooms:
    - !secret matrixroom
  commands:
    - word: temp
      name: temp
      rooms:
      - !secret matrixroom

notify:
  - name: matrix_notify
    platform: matrix
    default_room: !secret matrixroom

My particular use case shows this well as I have a matrix command and automation setup to respond to "!temp" with the current temperatures in the house. After restarting homeassistant I can succefully get a response to my command once, all subsequent commands fail. Below is the debug output which shows that the matrix component succesfully logs in and detects it is allready in a room straight after a restart. It detects and handles the first command but when issueing the reponse notification it rejoins the room a second time. This seems to break the bot's listener and subsequent commands fail.

2020-05-07 12:34:33 DEBUG (SyncWorker_17) [homeassistant.components.matrix] Logged in using stored token.
2020-05-07 12:34:42 DEBUG (SyncWorker_13) [homeassistant.components.matrix] Already in room <<removed>>
2020-05-07 12:35:24 DEBUG (Thread-2) [homeassistant.components.matrix] Handling message: ! temp
2020-05-07 12:35:39 DEBUG (Thread-2) [homeassistant.components.matrix] Handling message: !temp
2020-05-07 12:35:39 INFO (SyncWorker_8) [homeassistant.components.matrix] Joined room <<removed>> (known as <<removed>>)
2020-05-07 12:35:39 DEBUG (SyncWorker_8) [homeassistant.components.matrix] {'event_id': '$FPwL5YXVgN7d1AOimbkfc1DK0XWcLjLjgP0bqzHiLSQ'}

Is it possible that the others seeing this bug are affected in the same way, i.e. not the service stopping after some period of time, but after the first time that a notification is issued to the same room?

EDIT:

To add some further weight to this I have tested switching to a two room setup, where one room is used to monitor commands, and a second is used for notifications. Using this setup I can succesfully issue multiple commands in room 2 and they are all handled correctly, sending notifications to the first room. See log below (note first command was issued in the "notifications room" so was correctly ignored):

2020-05-07 12:51:43 DEBUG (SyncWorker_14) [homeassistant.components.matrix] Logged in using stored token.
2020-05-07 12:51:53 DEBUG (SyncWorker_15) [homeassistant.components.matrix] Already in room <<removed>>
2020-05-07 12:51:53 INFO (SyncWorker_15) [homeassistant.components.matrix] Joined room <<removed>> (known as<<removed>>)
2020-05-07 12:52:25 DEBUG (Thread-2) [homeassistant.components.matrix] Handling message: !temp
2020-05-07 12:52:37 DEBUG (Thread-2) [homeassistant.components.matrix] Handling message: !temp
2020-05-07 12:52:37 INFO (SyncWorker_10) [homeassistant.components.matrix] Joined room <<removed>> (known as <<removed>>)
2020-05-07 12:52:37 DEBUG (SyncWorker_10) [homeassistant.components.matrix] {'event_id': '$jtAIziVqrWdXyVAFZpR9DskEZYDw8K2DV0UrCY6sUD8'}
2020-05-07 12:52:53 DEBUG (Thread-2) [homeassistant.components.matrix] Handling message: !temp
2020-05-07 12:52:53 INFO (SyncWorker_11) [homeassistant.components.matrix] Joined room <<removed>> (known as <<removed>>)
2020-05-07 12:52:53 DEBUG (SyncWorker_11) [homeassistant.components.matrix] {'event_id': '$t86TLYDjbFuyttE6R94BtukQExwrrr6Ie6p4mhB_EhE'}
2020-05-07 12:53:37 DEBUG (Thread-2) [homeassistant.components.matrix] Handling message: !temp
2020-05-07 12:53:37 INFO (SyncWorker_19) [homeassistant.components.matrix] Joined room <<removed>> (known as <<removed>>)
2020-05-07 12:53:37 DEBUG (SyncWorker_19) [homeassistant.components.matrix] {'event_id': '$miPpu0Z1hYRAZvEvvhgoMKvh1-1gtF_l9oNPVawTWIs'}
2020-05-07 12:55:41 DEBUG (Thread-2) [homeassistant.components.matrix] Handling message: !temp
2020-05-07 12:55:41 INFO (SyncWorker_1) [homeassistant.components.matrix] Joined room <<removed>> (known as <<removed>>)
2020-05-07 12:55:42 DEBUG (SyncWorker_1) [homeassistant.components.matrix] {'event_id': '$zcWPRb75ixAGIZImWsWZCG9uLrSwMAWTzdtgX_Wu48k'}
benjmarshall commented 4 years ago

I have created a patched version of the Matrix component which fixes my issue. I've done this by removing the call to the matrix client to fetch a list of joined rooms and replaced it with a locally maintained list of rooms. This allows the component to correctly avoid re-joining a room in my testing today. There is an obvious weakness in this patch because it will not gracefully handle the case where the bot user gets kicked or removed from a room. In a private setup for control/status of homeassistant like mine this won't be an issue, but I don't think this is a good enough long term fix for all. Having said that if anyone else would like to try my patch for their own setup it can be found here. You will need to add these files to a folder within your "custom_components" directory. The matrix section of your configuration.yaml can stay the same.

stale[bot] commented 4 years ago

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

perryflynn commented 3 years ago

This is still a issue. When using a automation which answers to a command, every following command will be ignored.

Home Assistant 2020.12.0

- alias: Toggle Bell Mute Switch
  trigger:
    platform: event
    event_type: matrix_command
    event_data:
      command: toggle_bell
  action:
    - service: homeassistant.toggle
      data: { entity_id: 'switch.bell_muted' }
    - delay: '00:00:01'
    - service: notify.matrix_notify
      data_template:
        message: >
          {% set isbell = is_state('states.switch.bell_muted', 'on') %}
          {{ 'Bell is now muted' if isbell else 'Bell is now unmuted' }}
RosaEinhorn commented 3 years ago

I am having exactly the same issue. :(

kuhlmannmarkus commented 3 years ago

I have a slightly different behaviour. The component reacts to the first hotword I send into the Matrix channel and then stops working (no response to subsequent hotwords). Any solution yet? Please reopen this issue.

tophers commented 3 years ago

I have a slightly different behaviour. The component reacts to the first hotword I send into the Matrix channel and then stops working (no response to subsequent hotwords). Any solution yet?

I've seen this same behavior for 10 or more versions now, and I had less issues than most early on. This very likely won't or can't be fixed with the current integration and the current (mostly unmaintained) client library being used. Everytime it has been opened it doesn't get any activity and is closed again.

The integration probably needs to be re-written with matrix-nio as a client library. There was just a version bump in the matrix-client library being used #53508 , and while the maintainer did bump the client version to keep the integration working, they also state it is unmaintained (https://github.com/matrix-org/matrix-python-sdk/pull/321#pullrequestreview-711682483) and recommended rewrite w/ nio. The PR did to resolve the integration not loading at all when using newer instance versions of Synapse.