jensrossbach / node-red-contrib-sony-audio

Node-RED nodes for accessing Sony Audio Control API
MIT License
2 stars 0 forks source link

Can't reconnect automatically after Sony device has been off #37

Closed gitothy closed 2 years ago

gitothy commented 2 years ago

Describe the bug If the Sony device [STR-DN1080] is powered off then the palette looses connection and reports so in the log - this is expected. When the device is powered on the palette doesn't seem able to initiate a reconnect without either re-deploying the flow or restarting NodeRED.

Initiating an action through a 'Controller' node doesn't seem to re-initiate the connection. I've got a Receiver node set to feed in to a controller for any changes, but making a change on the device from the remote, e.g. changing the volume, also doesn't reinitiate things.

I'm seeing the below generic disconnection message when trying to initiate an action through a Controller node after the device is repowered:

[error] [sony-audio-controller:Living Room to 0 (no sub)] FetchError: request to http://192.168.1.250:10000/sony/audio failed, reason: connect EHOSTUNREACH 192.168.1.250:10000

To reproduce Steps to reproduce the behavior:

  1. Turn off Sony device
  2. Wait until Node-RED/the palette loses connection
  3. Turn on device -> not possible to read/write any commands as status is 'disconnected'.

Context information Please complete the following information.

Additional information If a Controller node is called and there is no connection to the device it would be great if the connection could be re-attempted.

jensrossbach commented 2 years ago

Hi @gitothy ,

the controller node does not maintain an active connection to the Sony audio device. In the moment when the node receives an input, it sends an HTTP-POST request to the device (i.e., in that moment it opens the connection to the device). Only the receiver node has an active connection open to the websocket of the device all the time as it must be able to receive notifications at any time. The error message EHOSTUNREACH indicates that the device is not reachable in the moment when the request is made. Currently I have no explanation why this happens and why it recovers after redeploying the flow. I use the Node.js library node-fetch for executing the HTTP-POST requests, so I assume a problem somewhere in this library. But I will continue investigating if there could be an error in the controller node as well.

In the reproduction steps you write at item 3:

Turn on device -> not possible to read/write any commands as status is 'disconnected'.

What do you mean with "status is 'disconnect'"? The controller node does not have a status, only the receiver node has a status like 'ready', 'not connected', 'connection error' etc.

jensrossbach commented 2 years ago

One more question: How do you switch off your device completely? I don't have STR-DN1080, just a HT-XT3 and my device for instance always switches into stand-by mode when sending an "off" command (also when pressing the power button on the device itself). I know that there is also a "standby" command, but it didn't work in the past for my device, therefore I removed it from the controller node. It might be that the STR-DN1080 behaves differently than the HT-XT3 for power commands and I might have to change the node in this regard.

gitothy commented 2 years ago

Hi Jens,

Apologies, I can see I need to clarify a few points.

Problem Re-description: I have a Receiver node which automatically pulls values for volume level and mute status. I also have Controller node which can sends commands to turn the volume up, down or off. Problem Effect: Device is powered on -> Controller nodes can adjust the volume, but Receiver node values for volume level don't update.

tldr: how can I reconnect the Receiver nodes for a newly powered on device without restarting Node-RED?

jensrossbach commented 2 years ago

ok, thanks for clarification, this makes sense now. Although I've implemented a recovery mechanism in the receiver node that should re-establish the connection on failure, it is possible that this mechanism doesn't work correctly in all cases. I have to investgate this further.

gitothy commented 2 years ago

If automatic and integrated that's excellent.

Otherwise a straightforward solution I thought could be to add an extra node type (alongside Controller and Receiver) which when 'actioned' attempts to re-connect the websocket connection?

jensrossbach commented 2 years ago

Hi @gitothy,

I simulated your use case by deactivating the network interface of my computer where I have running Node-RED on. The behavior what I observed (and what I expected) is that the receiver node lost connection to the Sony device and then the recovery mechanism started. When I activated the network interface again, the node successfully reconnected to the device.

24 Sep 18:31:12 - [debug] [sony-audio-device:HT-XT3] Connected to service 'system'
24 Sep 18:31:12 - [debug] [sony-audio-device:HT-XT3] Creating method list for service 'system' with event mask b:1111
24 Sep 18:31:25 - [error] [sony-audio-device:HT-XT3] Connection error: Error: read ECONNRESET
24 Sep 18:31:25 - [debug] [sony-audio-device:HT-XT3] Connection closed: 1006 (Socket Error: read ECONNRESET)
24 Sep 18:31:30 - [debug] [sony-audio-device:HT-XT3] Trying to recover
24 Sep 18:31:30 - [error] [sony-audio-device:HT-XT3] Connection failed: Error: connect ENETUNREACH xxx.xxx.xxx.xxx:10000
24 Sep 18:31:35 - [debug] [sony-audio-device:HT-XT3] Trying to recover
24 Sep 18:31:35 - [error] [sony-audio-device:HT-XT3] Connection failed: Error: connect ENETUNREACH xxx.xxx.xxx.xxx:10000
24 Sep 18:31:40 - [debug] [sony-audio-device:HT-XT3] Trying to recover
24 Sep 18:31:40 - [debug] [sony-audio-device:HT-XT3] Connected to service 'system'

As you can see in above log, the recovery mechanism makes retries until it can connect again. Now I have to say that the number of retries is limited to 5. Between each retry is a pause of 5s, which means that the device has to be up again within 25s, otherwise there will not be an automatic recovery anymore. When I implemented this recovery mechanism, I mainly had short network outages in my mind. This is the reason for the retry limit.

To solve your use case, I see two possibilities:

  1. Remove the retry limit and perform retries infinitely.
  2. Keep the limit and explicitly trigger a retry from a controller node as soon as a request is sent to the device.

I have to think about this a bit in order to find the best solution.

jensrossbach commented 2 years ago

Of course a third solution could also be an extra node as you propose. Or simply a special command which needs to be sent to the controller node. So there are several possible solutions... need to think about it.

gitothy commented 2 years ago

Great stufff @jensrossbach , thanks for taking the time to investigate this.

jensrossbach commented 2 years ago

I also discovered a bug in the recovery mechanism now. The retry counter is not reset after successful reconnection. This means if after for example two retries the connection was successfully recovered, at the next connection loss, the recovery mechanism would only do a maximum of three more retries instead of five.

Besides fixing this, I decided now to implement the following additional improvements:

  1. There will be a configurable long time recovery with infinite retries for the websocket connection.
  2. The controller node will get a special command to explicitly trigger a reconnection.
gitothy commented 2 years ago

Thanks again. Let me know if you want any assistance testing the changes.

jensrossbach commented 2 years ago

I just released version 1.8.0 including above mentioned improvements. @gitothy please test the new node release and close this issue in case it is solved from your perspective.

jensrossbach commented 2 years ago

One additional hint: the new extended recovery mechanism is turned off by default. You have to switch it on in the settings of your configuration node. If you don't change the default parameters, it performs reconnection retries every minute for an infinite time.

gitothy commented 2 years ago

Hi, apologies for the delay coming back was away with work this week. I've now had a good look at the changes in1.8.0 but I've not had much luck triggering a reconnection:

  1. Have set up the Configuration node aiming for the flow to keep retrying every 30 secs in perpetuity. Is leaving 'Retries' blank the correct way to achieve this? Can't seem to get it to reconnect to the ws feed:

image

  1. Have also set up a controller node using the new 'Reconnect' command. For testing, I'm using a standard Inject node to trigger it. When triggering, the websocket doesn't seem to be reconnected and it doesn't even seem to fire; I'm not seeing the little green square and 'success' underneath it as with the other controller nodes. See the config and comparison to another controller node using command 'Set Speaker Settings': image

Triggered from the same inject node, one has the 'success' and has run but nothing on the new one: image

Thanks again,

jensrossbach commented 2 years ago

Hi @gitothy,

thank you for testing my changes and sorry that it doesn't work for you. On my side it is working fine.

Some words to your test report:

First of all, it is not correct to leave the retries field blank, this might be the reason why the extended recovery doesn't work for you. For infinite retries, you should pass a value of 0 (which is also the default). Anyway I'm wondering how you were able to make this field blank because the spinner input field validates the value and at least on my side a blank content is automatically changed to 0.

Furthermore, it is normal that there is no "success" status indication at the controller node when you trigger the reconnect command because there is actually nothing sent to the device, only the reconnection is triggered at the receiver nodes. What you can see in good case when triggering the controller node is that the status indicators of the receiver nodes change from either red or grey to green. I might think of adding a success indication to controller node for this command but currently there is no feedback channel from the module doing the reconnection back to the controller node existing.

Can you please retest with setting the retry field to 0 instead of blank? If you still have problems, it would be helpful if you could provide me logs on debug level. For this you have to change the log level of your Node-RED instance to "debug" and extract the traces after testing. For further instructions regarding this, you can look here: Node-RED - Logging

gitothy commented 2 years ago

Have changed the field to 0 and restarted NodeRed. Performed a few tests, still no luck so have gone with the debug:

nr-sony-no-websocket-reconnect.log

Timings of activities to match against the log (n.b. that the log time is 1hr out)

10:37 - debug enabled in NRed + restarted. sony device already powered on. tested by adjusted the volume up and down and it reads in correctly on the ws to my receiver node. 10:38 - sony device off button pushed - appears to turn off but i think there is still connectivity in the background 10:39 - hear internal relay in device 'click' -> powered down completely 10:49 - device turned on, leave everything for 10 mins. 10:59 - perform actions on device (change volume) -> not being read back in to the same node as was working earlier 11:00 - restart flows in node red 11:01 - change volume on device -> websocket feed is now picking up these changes 11:01 - logging set back to "info" and nodered container is restarted

jensrossbach commented 2 years ago

ok, strange, I don't see any reaction from the receiver nodes (or device node better said as this is the one being connected to the Sony device) when you turn power off. As I told you earlier, I simulated your use case by turning off the network adapter of my PC, maybe this is different to turning off the device itself. On the other hand, for the HT-XT3 I don't have the possibility to turn it off completely besides pulling out the power cable from the plug. But maybe I have to try this in order to reproduce your case exactly on my side. But at least I have an explaination why it doesn't work on your side. As the connection loss is not recognized, there is also no recovery mechanism triggered.

One question: did you try to reconnect using the controller node after you turned the device back on? Because I see really nothing related to the Sony nodes in the log between 10:38:11 and 11:00:53.

gitothy commented 2 years ago

Yes I did try to use the Reconnect controller node and nothing seemed to happen.

jensrossbach commented 2 years ago

As the node does not recognize the disconnection, it "thinks" that it is still connected and in this case the reconnection command just does nothing. So all the behavior is explainable, but I'm not sure why the connection loss is not noticed.

jensrossbach commented 2 years ago

ok, next try. I released version 1.9.0 which - besides two new features - further improves connection handling:

  1. Instead of ignoring a request to reconnect to the Sony audio device in case a connection is still open, the controller node now closes the connecton first and reconnection is attempted in any case.
  2. In order to better detect connection losses like in your case, while being connected to the device, a ping message is now sent to the websocket every 30s. If the ping message is not acknowledged within five seconds, the connection will be closed and the recovery mechanism starts.

Additionally the controller node now shows a green status after triggering reconnection. This is not an actual indication that the reconnection succeeded, rather it only indicates a reaction of the controller node on the trigger.

Please test this new version and let me know if it improves your use case. Thank you!

gitothy commented 2 years ago

Thanks for the fixes. Have updated to 1.9.0 and restarted Node-RED. Seems now to not be working at all? Have tried to restart again but no dice.

image

image

Am away the rest of the week but will certainly test any fixes when I return.

jensrossbach commented 2 years ago

yes, sorry... I forgot to add a dependency to the package.json file for a new package I'm depending on with the nodes. I didn't notice this problem in my test environment because there the package was already installed manually. I released the bugfix version 1.9.1 which (hopefully) corrects this.

gitothy commented 2 years ago

Hi Jens, all looking good with 1.9.1. Automatic websocket reconnection is working and nothing seems to be broken.

Closing ticket, thanks very much again.