jishi / node-sonos-http-api

An HTTP API bridge for Sonos easing automation. Hostable on any node.js capable device, like a raspberry pi or similar.
http://jishi.github.io/node-sonos-http-api/
MIT License
1.84k stars 462 forks source link

Crash when coordinator missing #740

Open rtimmons opened 4 years ago

rtimmons commented 4 years ago

Hi there. I'm using node-sonos-http-api for my smarthome project, and I notice that when the sonos system is reconfiguring itself from major topology changes, the SOAP API may return a 500 in response to some requests:

Error: Got status 500 when invoking /MediaRenderer/AVTransport/Control

(For context I have a 6-zone setup with about 15 speakers; doing lots of grouping/ungrouping with concurrent /status calls reliably reproduces this.)

I believe this happens when sonos is electing new coordinator(s) after dramatic zone/topology changes. The effect internally is that the coordinator of one or more player nodes is null or undefined. This cascades to node-sonos-http-api crashing. Note this isn't a promise rejection. This is a null-pointer error.

This doesn't look like a dupe of #706. I am on the latest http-api and discovery packages and latest sonos software. It may be related to #361 but I can't tell for sure.

I wanted to get your input on the best way to fix and/or know if you have managed to work around this in other way? Perhaps this check should also return a 500 early if player.coordinator is undefined since many of the actions assume it exists? Returning a 500 here seems best in my opinion - the sonos system isn't behaving well in this scenario.

I added a band-aid duct-tape solution that wraps all api.requestHandler calls in try/catch and http-api hasn't crashed on me since, but this is not "good" code.

I'm fine to live with this band-aid in my fork, but I thought you may want to incorporate a better fix upstream. If you're okay with the band-aid or have ideas for a better/more-robust solution please let me know and I would be happy to investigate and/or ship as a PR.

Thank you for your work on the node+sonos ecosystem. They have made my sonos smart-home integration a joy.

jishi commented 4 years ago

Is it the /status call that is crashing the API? I'm surprised that the .coordinator would be null, it is only changed when we get a topology change from the sonos players, this means that the topology state updates gives information that the API can't comprehend, thus setting it to null. Maybe it would be better to never set it to null and keep the old coordinator (and expect a later update to come and fix it).

I'll see if I can reproduce this locally.

rtimmons commented 4 years ago

Thanks for the response.

Yes it is the /status call that is causing it to crash (and/or perhaps /zones since I call both - I can check deeper later if it would help).

Maybe it would be better to never set it to null and keep the old coordinator (and expect a later update to come and fix it).

This seems logical - downstream is never expecting it to be null. However another argument would be that if we somehow set the coordinator to null that we should reject all status/zone calls as 500s until it's set to non-null (since the current state is kinda undefined). (That's the essence of checking if player.coordinator is null but there may be a better place to do this.) Either seems reasonable imho.

I'd be happy to test a patch locally if you're having trouble repro'ing locally - I suspect this may happen more frequently on very large sonos topologies.

Thanks again for your time.