rhasspy / wyoming-satellite

Remote voice satellite using Wyoming protocol
MIT License

non-robust handling of HA server disconnection #26

Open KeithSBB opened 9 months ago

KeithSBB commented 9 months ago

My HA server occasionally performs automatic software updates in the early morning and then reboots. This breaks the connection to wyoming-satellite, which then goes nuts, producing many errors in its log files:

8:30 AM
ERROR:root:Unexpected error sending event to server
Traceback (most recent call last):
  File "/home/mycroft/wyoming-satellite/wyoming_satellite/satellite.py", line 128, in event_to_server
    await async_write_event(event, self._writer)
  File "/home/mycroft/wyoming-satellite/.venv/lib/python3.9/site-packages/wyoming/event.py", line 129, in async_write_event
    await writer.drain()
...

Sometimes it reconnects, other times it just hangs and I have to restart wyoming-satellite manually.

I'm running both wyoming-openwakeword and wyoming-satellite as services on a Raspberry Pi 4. I also noticed that when wyoming-satellite hangs because the server disconnected, I can't simply restart it. I must shut down wyoming-openwakeword first (which forces a shutdown of wyoming-satellite) and then start wyoming-satellite (which starts wyoming-openwakeword).

henne49 commented 9 months ago

Not sure this will help you, but the wakeword service was not being started for me. After issuing this command, it worked, as the wakeword service is now started before the satellite:

sudo systemctl enable --now wyoming-openwakeword.service

KeithSBB commented 9 months ago

henne49: When my HA server updates itself and reboots, the connection to my RPi 4 wyoming-satellite is broken and never reconnects. Both the wyoming-satellite and wyoming-openwakeword services are still running. I have to manually restart them to regain the connection to HA, so your suggestion doesn't apply in this case. A more robust solution would be code added to reconnect to HA if the connection is lost.
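
A generic sketch of the kind of reconnect logic asked for above, assuming a plain asyncio client. It is not the wyoming-satellite code: `run_with_reconnect`, `connect`, and `handle` are hypothetical names, and which side opens the connection in the real project may differ. The only point is that a lost connection is caught and retried with exponential backoff instead of leaving the process hung.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


async def run_with_reconnect(
    connect, handle, *, initial_delay=1.0, max_delay=60.0, max_attempts=None
):
    """Run sessions forever (or max_attempts times), reconnecting on errors.

    connect: hypothetical coroutine function returning (reader, writer)
    handle:  hypothetical coroutine function that runs one session
    """
    delay = initial_delay
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        writer = None
        try:
            reader, writer = await connect()
            delay = initial_delay  # connected, so reset the backoff
            await handle(reader, writer)
            _LOGGER.info("Session ended, reconnecting")
        except (OSError, EOFError) as err:
            # OSError covers ConnectionResetError and ConnectionRefusedError;
            # EOFError covers asyncio.IncompleteReadError.
            _LOGGER.warning("Connection lost (%s); retrying in %.1fs", err, delay)
        finally:
            if writer is not None:
                writer.close()
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff with a cap
```

With real sockets, `connect` could be something like `lambda: asyncio.open_connection(host, port)`, where `host` and `port` are placeholders for whatever endpoint is being supervised.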

henne49 commented 9 months ago

Fully agreed. I simply restart the raspi as a fix in the meantime, but that did not work for me, as openwakeword was not starting properly after a reboot.

KeithSBB commented 9 months ago

Hmm, rebooting my wyoming-satellite rpi 4 works for me...

On Jan 2, 2024 8:08 AM, henne49 wrote: Fully agreed, but I simply restart the raspi to fix in the meantime, but that did not work, as the openwakeword was not starting properly after a reboot.

KeithSBB commented 8 months ago

I just updated my installation to 1.1.1, wyoming 1.5.2, and I'm still seeing wyoming-satellite fail to reconnect after the HA server reboots (automatically upon software updates). This is a real pain, as I must manually shut down wyoming-openwakeword (which stops wyoming-satellite) and then start wyoming-satellite (which starts wyoming-openwakeword).

Here's the wyoming-satellite log:


9:34 AM
INFO:root:Disconnected from server
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='wyoming event handler' coro=<AsyncEventHandler.run() done, defined at /home/mycroft/wyoming-satellite/.venv/lib/python3.11/site-packages/wyoming/server.py:28> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
  File "/home/mycroft/wyoming-satellite/.venv/lib/python3.11/site-packages/wyoming/server.py", line 31, in run
    event = await async_read_event(self.reader)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mycroft/wyoming-satellite/.venv/lib/python3.11/site-packages/wyoming/event.py", line 79, in async_read_event
    json_line = await reader.readline()
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/streams.py", line 545, in readline
    line = await self.readuntil(sep)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/streams.py", line 637, in readuntil
    await self._wait_for_data('readuntil')
  File "/usr/lib/python3.11/asyncio/streams.py", line 522, in _wait_for_data
    await self._waiter
  File "/usr/lib/python3.11/asyncio/selector_events.py", line 995, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

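
The log above shows two separate things: a read loop that dies with `ConnectionResetError`, and asyncio complaining that nobody retrieved the task's exception. A minimal, generic asyncio sketch of handling both follows. It is an illustration only, not the wyoming library's implementation; `read_events` and `log_task_result` are hypothetical names.

```python
import asyncio
import logging

_LOGGER = logging.getLogger(__name__)


async def read_events(reader, on_line):
    """Read newline-delimited messages until the peer goes away."""
    try:
        while True:
            line = await reader.readline()
            if not line:  # empty bytes means a clean EOF
                break
            on_line(line)
    except ConnectionError as err:
        # ConnectionResetError (errno 104) is a subclass of ConnectionError.
        _LOGGER.info("Peer dropped the connection: %s", err)
    return "disconnected"


def log_task_result(task):
    """Done-callback that retrieves the task's exception, so asyncio does
    not report 'Task exception was never retrieved'."""
    if task.cancelled():
        return
    err = task.exception()
    if err is not None:
        _LOGGER.error("Handler task failed", exc_info=err)
```

A caller would attach the callback with `task.add_done_callback(log_task_result)` right after creating the handler task. Catching the reset inside the loop lets the handler finish normally, which gives the surrounding code a chance to clean up and accept a new connection.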
Rafaille commented 8 months ago

I just updated my installation to 1.1.1, wyoming 1.5.2, and I'm still seeing wyoming-satellite fail to reconnect after the HA server reboots (automatically upon software updates). This is a real pain, as I must manually shut down wyoming-openwakeword (which stops wyoming-satellite) and then start wyoming-satellite (which starts wyoming-openwakeword).

As a workaround you can create an automation in HA to remotely restart services on your satellite right after HA updates.

KeithSBB commented 8 months ago

[Rafaille]

As a workaround you can create an automation in HA to remotely restart services on your satellite right after HA updates.

That sounds like a good idea, but I'm not sure how to remotely restart a service from within an HA 'start' automation. Can you provide a little more detail to point me in the right direction? Thanks.

Update: added

shell_command:
   restart-satellite: sudo systemctl --host pi@192.168.7.16 restart wyoming-satellite

and I then called shell_command.restart-satellite from an HA startup automation. It doesn't work yet because of passwords and other security issues, but I think this is the right approach.

KeithSBB commented 8 months ago

No luck getting the restart of the wyoming-satellite service upon HA start-up to work, as suggested by [Rafaille].

I created key pairs for ssh. It works fine in an HA terminal, but gives "Host key verification failed." when run from the automation (or service call).
No success so far in figuring out why.
This is still a serious issue for me.

Rafaille commented 8 months ago

No luck getting the restart of the wyoming-satellite service upon HA start-up to work, as suggested by [Rafaille].

I created key pairs for ssh. It works fine in an HA terminal, but gives "Host key verification failed." when run from the automation (or service call). No success so far in figuring out why. This is still a serious issue for me.

You are indeed on the right track, I had the same issue. You just need to add a couple of arguments. This is the shell command I use:

ssh -o StrictHostKeyChecking=no -i /config/.ssh/id_xxxxxxx user@192.168.xxx.xxx -tt "sudo systemctl restart wyoming-satellite.service"

Just replace the x placeholders with your own values, and make sure that the user you log in as has sudo privileges.

KeithSBB commented 8 months ago

Adding -o StrictHostKeyChecking=no is usually considered a bad approach, but I tried it, and along with explicitly specifying the key with "-i /config/.ssh/id_rsa", it works. (I had to move id_rsa from under /root to /config.)

Since I created RSA key pairs, I tried it again without '-o StrictHostKeyChecking=no' and that works too!

I guess the problem was that the automation runs under a different user than root, while the terminal is root?

Anyway, thanks for your help.
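
One way to avoid depending on which user (and which home directory) runs the command is to pass the key and the known-hosts file explicitly rather than disabling host key checking. The sketch below builds such an ssh invocation in Python. The function names, the `/config/.ssh/known_hosts` path in the usage line, and the assumption of passwordless sudo on the satellite are all hypothetical; `-i`, `UserKnownHostsFile`, `StrictHostKeyChecking`, and `BatchMode` are standard OpenSSH options.

```python
import shlex
import subprocess


def build_ssh_restart(user, host, key_file, known_hosts,
                      service="wyoming-satellite.service"):
    """Return an ssh argument list that restarts a systemd service remotely."""
    return [
        "ssh",
        "-i", key_file,                             # explicit private key
        "-o", f"UserKnownHostsFile={known_hosts}",  # explicit known_hosts file
        "-o", "StrictHostKeyChecking=yes",          # refuse unknown host keys
        "-o", "BatchMode=yes",                      # never prompt for a password
        f"{user}@{host}",
        f"sudo systemctl restart {shlex.quote(service)}",
    ]


def restart_remote_service(*args, timeout=30, **kwargs):
    """Run the command; the caller inspects returncode and stderr."""
    cmd = build_ssh_restart(*args, **kwargs)
    return subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=False
    )
```

For example, `build_ssh_restart("pi", "192.168.7.16", "/config/.ssh/id_rsa", "/config/.ssh/known_hosts")` yields a command that only succeeds if the satellite's host key is already recorded in that known-hosts file.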

Rafaille commented 8 months ago

Yes, sorry, I forgot to mention that I had to move the key file. Good point about StrictHostKeyChecking not being necessary; I will take it out of my command as well, just for good measure. Happy that it works for you, although a permanent fix will be welcome eventually ;)

Mincka commented 4 months ago

In my case, I tried the remote restart of the service, but sadly it does not help. I get the same error as @KeithSBB. The error is still there when I wake the satellite remotely (thanks to https://github.com/rhasspy/wyoming/pull/10 and https://github.com/rhasspy/wyoming-satellite/pull/144). However, what seems to work is a remote reboot of the device. It will do the trick while we hope for a fix.