openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.57k forks source link

JuPnP failing - causing devices to go offline #5892

Closed morph166955 closed 3 years ago

morph166955 commented 5 years ago

Refer to https://community.openhab.org/t/too-much-time-before-a-sonos-thing-becomes-definitively-online/62214

For no observed reason, JuPnP seems to fail randomly and cause things to go offline. Sonos speakers seem to be the biggest culprit. Once this condition is reached, all things that rely on JuPnP seem to fall offline quickly. I've seen this happen between 3 days and 2 weeks, no specific time length causes the failure.

This can be mitigated currently by having a rule monitor getThingStatusInfo(mything).getStatus().toString() and when it goes offline it executes "executeCommandLine(”/usr/bin/ssh -p 8101 -i /home/openhab/karaf_keys/openhab.id_rsa openhab@localhost bundle:restart org.jupnp", 120000)"

wborn commented 4 years ago

Do you think this issue should be solved in openhab-core or the jupnp library @lolodomo?

See also: https://github.com/eclipse/smarthome/issues/6779

lolodomo commented 4 years ago

I am not 100% sure, my analysis was done several months ago. But I guess it is more a bug in the JUPnP library. If it was a bug in the openHAB core framework, I would have probably already fixed it.

wborn commented 4 years ago

Thanks! It might help to also create an issue for this in the jupnp issue tracker.

jaywiseman1971 commented 4 years ago

https://community.openhab.org/t/oh-2-4-0-m7-sonos-online-and-immediately-offline-again/59253/11

jaywiseman1971 commented 4 years ago

Sonos & jUPNP Bounty to Resolve Communication Errors on OpenHAB

Environment: OpenHAB 2.4 Sonos Binding (tried 2.4Mx to 2.5.0.201903252254) jUPNP Binding - 2.5.2

Sonos Devices: 10 Sonos One's (4 in pairs) 4 Sonos Amp's Runining latest firmware on speakers and the latest Sonos Application version

This situation below has been happening to me and others for over a year now.

OpenHAB Forum Postings:

https://community.openhab.org/t/sonos-communication-error-after-speaker-firmware-update/84135 https://community.openhab.org/t/oh-2-4-0-m7-sonos-online-and-immediately-offline-again/59253 https://community.openhab.org/t/too-much-time-before-a-sonos-thing-becomes-definitively-online/62214 https://community.openhab.org/t/sonos-things-offline-after-some-time/88870

Scenarios:

1) Fresh start of OH everything is fine between startup till around 24 hours with starting/stopping/pausing Sonos either via the Sonos App or rules for Sonos. After X hours; units will go into Communication Error states and sometimes recovery. What I have noticed when it does recovery; its exactly 10 minutes from when it went into Communication Error state. If a unit doesn't recovery then the rest of the units will soon after go into Communication Error state.

2) Restarting the Sonos Binding does NOT recover the Sonos THINGS. They sit in a Error Handler State after a Sonos Binding restart.

3) Restarting the jUPNP Binding does recover the Sonos THINGS but then puts other THINGS in a bad state which they do NOT recovery from. There is a posting about restarting JUPNP as a solution to the Sonos Communications Error state.

https://community.openhab.org/t/too-much-time-before-a-sonos-thing-becomes-definitively-online/62214/16

4) There is a recommendation on the Sonos binding and a discussions about setting the TTL higher on the Sonos Devices which I tried and this did NOT resolve it.

https://community.openhab.org/t/oh-2-4-0-m7-sonos-online-and-immediately-offline-again/59253/9

I have turned on DEBUG logging on the Sonos binding and have NOT seen anything that shows a direct result of this Communication Error behaviour. I believe this issue to be a situation where both the 2 bindings (Sonos and jUPNP) are at fault partially.

Best, Jay

cweitkamp commented 4 years ago

Thanks for adding the big bounty @jaywiseman1971!

For the bounty see: https://www.bountysource.com/issues/80365719-jupnp-failing-causing-devices-to-go-offline

morph166955 commented 4 years ago

I decided to turn on jupnp debugging for a few minutes while my Sonos PlayBar was in a bad state. Every second I get this message set:

2020-02-02 13:40:34.657 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control 2020-02-02 13:40:34.657 [DEBUG] [org.jupnp.transport.spi.StreamClient] - Will not attempt request because it failed in the last 600 seconds: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

Then, every 600 seconds I get:

2020-02-02 13:44:11.804 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control 2020-02-02 13:44:21.804 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

2020-02-02 13:54:22.222 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control 2020-02-02 13:54:32.222 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://REDACTED:1400/DeviceProperties/Control

A CURL (with POST) to the URL gives me: s:ClientUPnPError401</s:Fault></s:Body></s:Envelope> NOTE: I get this from ALL Sonos speakers I have. I'm not sure if I'm missing something on the query to get a bad authentication.

I have verified that I have <1ms ping times to the bar (no packet loss) and they are on the same network so the Sonos TTL=1 is not an issue here.

I let this go for about an hour with the same result. Then I restarted OH2. All of my speakers immediately came online and are stable. Given this, I do NOT believe the issue to be the Sonos itself as a "random" restart of OH2 should not clear out the issue on the Sonos.

Going through the debug of the OH2 restart it looks like OH2 effectively registers as an endpoint with the Sonos speaker. My assumption (since I can't just make one fail at will) is that this registration becomes invalid at some point. Assuming that this theory is correct, at this point I would suggest that the Sonos binding be modified to detect this issue and "self heal" by restarting the registration process with that specific speaker when the HTTP request to the control URL fails to reply.

lolodomo commented 4 years ago

Are you powering off sometimes your Sonos devices?

morph166955 commented 4 years ago

They are almost never powered off. Software upgrade reboots would be the exception. I don't think my playbar has been unplugged in over a year.

morph166955 commented 4 years ago

Is there any way to adjust the 600 second wait/hold timer on JuPnP so things at least come back faster when this happens?

jaywiseman1971 commented 4 years ago

Is there anyway to exclude devices (Sonos Units) that the JuPnP binding tries to update the status on. Just another option to keep the Sonos devices online with openHAB.

Best, Jay

lolodomo commented 4 years ago

If the JUPnP library sees devices as offline then any binding using UPnP commands will not work properly for these devices. So the last suggestion makes no real sense IMHO.

lolodomo commented 4 years ago

Does this problem concern everybody or just few users ? I never noticed it on my side. At the same time, I am not checking the status of my things all the day lol If this problem concerns only few users, it might be a good idea to identify what could be the origin, Maybe something relative to an "unstable" network ?

jaywiseman1971 commented 4 years ago

good idea to identify what could be the origin

I'd be happy to provide any logs you want; please let me know what logging you want DEBUG enabled on?

My network is extremely solid with 3 Unifi AP's and and 3 Sonos Connect's all on the same subnet. I never have music drop outs on my 14 Sonos units due to connectivity/routing. As you can see; below this has been happening for quite some time now.

Everything works fine when OH is restarted from scratch then unidentified things trigger the Sonos units to go offline into communication-error status. Sometimes they will recover (usually after 10 minutes). Certain Sonos related activities trigger a domino affect like starting and stopping Sonos units across the house (this starting/stopping can either be by the App or via OH rules) seems to make it worse quicker.

https://community.openhab.org/t/sonos-communication-error-after-speaker-firmware-update/84135 https://community.openhab.org/t/oh-2-4-0-m7-sonos-online-and-immediately-offline-again/59253 https://community.openhab.org/t/too-much-time-before-a-sonos-thing-becomes-definitively-online/62214 https://community.openhab.org/t/sonos-things-offline-after-some-time/88870

Best, Jay

jaywiseman1971 commented 4 years ago

Hey @lolodomo,

Just turned on DEBUG on org.jupnp binding and here's just a small snapshot which all my Sonos units are in communication error now.

2020-02-12 12:30:41.641 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:41.642 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:41.824 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://192.168.0.115:1400/DeviceProperties/Control 2020-02-12 12:30:41.875 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:41.875 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:42.090 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:42.091 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:42.363 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:42.363 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:42.591 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:42.591 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:42.713 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://192.168.0.119:1400/DeviceProperties/Control 2020-02-12 12:30:42.715 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://192.168.0.178:1400/DeviceProperties/Control 2020-02-12 12:30:42.842 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:42.842 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:43.091 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:43.092 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:43.342 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:43.342 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:43.592 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:43.592 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:43.845 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:43.846 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:44.093 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:44.093 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:44.343 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:44.343 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY **2020-02-12 12:30:44.439 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.145:1400/DeviceProperties/Control** **2020-02-12 12:30:44.441 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.74:1400/DeviceProperties/Control** 2020-02-12 12:30:44.593 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:44.594 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:44.845 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:44.845 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:45.038 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.65:50594 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:45.039 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:45.094 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:45.094 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY **2020-02-12 12:30:45.290 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.103:1400/DeviceProperties/Control** 2020-02-12 12:30:45.352 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:45.356 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:45.595 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:45.595 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:45.845 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.130:49611 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:45.845 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:45.902 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.230:40768 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:45.903 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) M-SEARCH 2020-02-12 12:30:46.520 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://192.168.0.130:1400/DeviceProperties/Control 2020-02-12 12:30:46.902 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.230:40768 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:46.903 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) M-SEARCH **2020-02-12 12:30:47.543 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.154:1400/DeviceProperties/Control** 2020-02-12 12:30:47.823 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://192.168.0.189:1400/DeviceProperties/Control 2020-02-12 12:30:47.902 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.230:40768 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:47.903 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) M-SEARCH 2020-02-12 12:30:48.352 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (StreamRequestMessage) GET http://192.168.0.169:7676/smp_2_ 2020-02-12 12:30:48.353 [WARN ] [p.protocol.RetrieveRemoteDescriptors] - Device descriptor retrieval failed, no response: http://192.168.0.169:7676/smp_2_ 2020-02-12 12:30:48.433 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:48.434 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:48.435 [DEBUG] [p.protocol.RetrieveRemoteDescriptors] - Sending device descriptor retrieval message: (StreamRequestMessage) GET http://192.168.0.169:7676/smp_2_ 2020-02-12 12:30:48.435 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (StreamRequestMessage) GET http://192.168.0.169:7676/smp_2_ 2020-02-12 12:30:48.454 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:48.454 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:48.474 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:48.474 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:48.495 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:48.495 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:48.939 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://192.168.0.158:1400/DeviceProperties/Control 2020-02-12 12:30:48.943 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (OutgoingActionRequestMessage) POST http://192.168.0.167:1400/DeviceProperties/Control 2020-02-12 12:30:50.002 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.20:63393 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:50.003 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) M-SEARCH 2020-02-12 12:30:50.003 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.21:63394 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:50.003 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) M-SEARCH 2020-02-12 12:30:50.039 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.65:50594 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:50.039 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY **2020-02-12 12:30:50.044 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.98:1400/DeviceProperties/Control** 2020-02-12 12:30:50.119 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:50.119 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:50.120 [DEBUG] [p.protocol.RetrieveRemoteDescriptors] - Sending device descriptor retrieval message: (StreamRequestMessage) GET http://192.168.0.169:7676/smp_11_ 2020-02-12 12:30:50.121 [DEBUG] [org.jupnp.transport.Router ] - Sending via TCP unicast stream: (StreamRequestMessage) GET http://192.168.0.169:7676/smp_11_ 2020-02-12 12:30:50.139 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:50.140 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:50.160 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:50.160 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:50.180 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.169:24234 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:50.181 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY **2020-02-12 12:30:51.825 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.115:1400/DeviceProperties/Control** **2020-02-12 12:30:52.714 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.119:1400/DeviceProperties/Control** **2020-02-12 12:30:52.715 [INFO ] [org.jupnp.transport.spi.StreamClient] - Timeout of 10 seconds while waiting for HTTP request to complete, aborting: (OutgoingActionRequestMessage) POST http://192.168.0.178:1400/DeviceProperties/Control** 2020-02-12 12:30:52.918 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.103:34240 on local interface: eth0 and address: 192.168.0.230 2020-02-12 12:30:52.918 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:52.919 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.103:34240 on local interface: eth1 and address: 192.168.0.230 2020-02-12 12:30:52.919 [DEBUG] [org.jupnp.transport.Router ] - Received asynchronous message: (IncomingDatagramMessage) NOTIFY 2020-02-12 12:30:52.919 [DEBUG] [upnp.transport.spi.MulticastReceiver] - UDP datagram received from: 192.168.0.103:34240 on local interface: eth0 and address: 192.168.0.230

If I copy/paste one of the /control URL's into a browser (http://192.168.0.158:1400/DeviceProperties/Control) - here's the result which is a upnp 401 error.

`

s:Client UPnPError 401

</s:Fault> </s:Body> </s:Envelope>`

Best, Jay

morph166955 commented 4 years ago

I agree that I don't believe this to be a networking issue. Among all of the other evidence, my playbar seems to be the biggest offender to this issue. The oddity here is that the playbar is actually the device I have connected via Ethernet. With the Sonos wifi mesh, all of the other Sonos devices connect to my network through the playbar. Those devices seem to not be impacted by the playbar going offline. And before it's asked, yes I have unplugged the playbar and moved the Ethernet wire to three other speakers in the house and there is no change in the behavior.

As far as noticing when it's offline, again I notice most on my playbar. I use the Neeo remote via my OH2 server to control volume in my livingroom (and everything else for that matter). When the playbar goes offline, I can't change the volume. So 9/10 times I notice it's offline because I'm jamming on the button and nothing is happening. The playbar has actually been offline for almost 48 hours now yet my other speakers are up and chugging passing their data through the playbar.

lolodomo commented 4 years ago

What would be the best solution to check on a long period if my Sonos things are sometimes going offline ?

Edit: change of thing state should be in events.log. I am going to check that.

jaywiseman1971 commented 4 years ago

Sonos things are sometimes going offline

Please make sure your running the latest firmware for Sonos; there are some folks that have not upgraded from v8.x to 10.6.1 which this is not affecting them.

How I monitor my sonos devices is two ways - OH network connection which never goes offline and one long Thing status monitoring rule. Here's the example of mine.

` rule "Checking Physical Status of Sonos Units" when Thing "sonos:CONNECTAMP:RINCON_ABC123" changed or

 then

    var SonosThingGym = getThingStatusInfo("sonos:CONNECTAMP:RINCON_ABC123").getStatus()
        Thread::sleep(200) 

    // OFFLINE
    if (systemStarted.state != ON && gInternet.state == ON && SonosThingGym.toString() != 'ONLINE') { 

        logInfo("Sonos", "-----------------------------------------------------------------------------")
        logInfo("Sonos", 'SonosThingGym Status is ' + SonosThingGym.toString())
        logInfo("Sonos", "-----------------------------------------------------------------------------")

        if (systemStarted.state != ON && gInternet.state == ON && currHour.state != 1) { 

            val String subjectemailSonosThingGym    = "openHAB - SonosThingGym" 
            val String bodyemailSonosThingGym   = "openHAB - SonosThingGym is OFFLINE."

            sendMail(JaygMail, subjectemailSonosThingGym, bodyemailSonosThingGym)
                Thread::sleep(200)  
        }
    }   

end `

Best, Jay

lolodomo commented 4 years ago

My devices are with the firmware 10.6.1. I see no status change to OFFLINE for my 3 Sonos things in my file events.log since the beginnign of February.

lolodomo commented 4 years ago

One of my device is connected to my LAN with a RJ45 cable, the 2 others are connected to SonosNet without any physical connection (cable).

jaywiseman1971 commented 4 years ago

That's fine, I have the same setup.

What version of the jupnp and sonos binding are you using?

Best, Jay

lolodomo commented 4 years ago

I am running the official openHAB distribution 2.5.1.

lolodomo commented 4 years ago
bundle:list -s | grep onos
248 x Active x  80 x 2.5.1                   x org.openhab.binding.sonos
bundle:list -s | grep upnp
236 x Active x  80 x 2.5.2                   x org.jupnp
253 x Active x  80 x 2.5.0                   x org.openhab.core.config.discovery.upnp
260 x Active x  80 x 2.5.0                   x org.openhab.core.io.transport.upnp
jaywiseman1971 commented 4 years ago

Can you confirm what version org.jupnp is under Karaf? If its higher than 2.5.2, I will upgrade my version to your level.

jaywiseman1971 commented 4 years ago

I will try to manually upgrade mine to your same versions today.

lolodomo commented 4 years ago

At my knowledge there was no recent changes.

morph166955 commented 4 years ago

I can say this only became an issue for me when I moved from 2.4 stable up to the nightly 2.5 release when it was very early in development. my Sonos is on 10.6.1. I'm on the current OH2 2.5 release.

My belief is that the issue is with JuPnP. I can have a device offline for days and restarting JuPnP instantly fixes the issue. I would expand the question to say "how many devices in your network use JuPnP" instead of just Sonos. For example, my Samsung TVs occasionally have this issue. It's less obvious because those are offline unless the TV is turned on.

jaywiseman1971 commented 4 years ago

My belief is that the issue is with JuPnP

I have 224 Things in PaperUI; not all of them are tied to jupnp. Yes, I have the same issue with my Samsung binding also.

What is the karaf query to find out the exact number of jupnp tied things?

Best, Jay

jaywiseman1971 commented 4 years ago

248 x Active x 80 x 2.5.1 x org.openhab.binding.sonos

Can you post the URL for this version of Sonos? I can't seem to find it within https://openhab.jfrog.io/openhab/webapp/#/home OR https://ci.openhab.org/job/openHAB2.5.x-Addons/

Best, Jay

morph166955 commented 4 years ago

I'm not sure of how to see that minus turning on debugging in karaf for org.jupnp and watching for a few minutes. This does give me an idea however to test. I have no idea how much effort it would be to do this, but is it possible to add an option to the Sonos binding to NOT use JuPnP (eg specify IP address in the thing)? That may help rule out the Sonos binding as the culprit.

jaywiseman1971 commented 4 years ago

manually upgrade mine

The manual upgrade didn't work as I wanted it to BUT everything is up and running after 2 OH restarts. I'm going to leave this config setup in place to see if it happens even though the new discovery.upnp and transport.upnp aren't active to match up with @lolodomo configs. Keep in mind, I'm still on OH 2.4.

Here's the before and after of mine.

` Original: 213 │ Active │ 80 │ 2.5.2 │ org.jupnp 220 │ Active │ 80 │ 0.11.0.oh250M1 │ org.eclipse.smarthome.config.discovery.upnp 244 │ Active │ 80 │ 0.11.0.oh250M1 │ org.eclipse.smarthome.io.transport.upnp 210 │ Active │ 80 │ 2.5.0.201903252254 │ org.openhab.binding.sonos

Cleared /cache and /tmp Now: 210 │ Active │ 80 │ 2.5.0.201903252254 │ org.openhab.binding.sonos 211 │ Installed │ 80 │ 2.5.0 │ org.openhab.core.config.discovery.upnp 214 │ Active │ 80 │ 2.5.2 │ org.jupnp 219 │ Installed │ 80 │ 2.5.0 │ org.openhab.core.io.transport.upnp 251 │ Active │ 80 │ 0.10.0.oh240 │ org.eclipse.smarthome.config.discovery.upnp 252 │ Active │ 80 │ 0.10.0.oh240 │ org.eclipse.smarthome.io.transport.upnp 263 │ Active │ 80 │ 2.5.1 │ org.jupnp `

Best, Jay

lolodomo commented 4 years ago

You have not 2 instances of org.jupnp. Looks bad.

jaywiseman1971 commented 4 years ago

I have been manually uninstalling the 2.5.1 version as you can see from my original config. I'm going to give this way a try. The 2.5.1 is for OH 2.4 bindings and the 2.5.2 is for OH 2.5.x bindings. I do know if I uninstall the 2.5.1 everything still works. Just seeing what happens leaving both of them in as Active and if it helps at all with Sonos going into communication errors again.

jaywiseman1971 commented 4 years ago

leaving both of them in as Active

Still happening; attached is the screen shot of PaperUI.

Capture

Best, Jay

lolodomo commented 4 years ago

The last merge in JUPnP is dated 24 April 2019.

lolodomo commented 4 years ago

To answer to the question, I found at least this list of OH2 bindings relying on JUPnP:

morph166955 commented 4 years ago

I believe the issue was seen before that. Is there a merge somewhere just after 2.4.0 maybe in the December 2018 range?

jaywiseman1971 commented 4 years ago

OH2 bindings relying on JUPnP

My system is using these bindings. The only binding that is having issues going into communications-error state is the Sonos one out of these below.

  1. hue
  2. hueemulation
  3. lgwebos
  4. onkyo
  5. samsungtv
  6. sonos
  7. wemo - recently put a bounty up to fix this one with Hans-Jorg which its completely fixed now.

Best, Jay

lolodomo commented 4 years ago

When you open the thing in Paper UI, what is the message behind COMMUNICATION_ERROR?

lolodomo commented 4 years ago

It is probably: The UPnP device {0} is not yet registered. Is it?

jaywiseman1971 commented 4 years ago

Is it?

I restarted my OH and put it back to my original jUpNp config above; once Sonos goes offline again I will post the results.

Best, Jay

lolodomo commented 4 years ago

The binding is checking every minute (configuration setting) that the device is still registered in JUPnP. What is the general duration when the binding or more exactly JUPnP is no more detecting your device. Is it a short time, I mean few minutes?

morph166955 commented 4 years ago

Mine go on and offline at no specific interval. I'll try to dump the logs from the past few days with timestamps and post later. Also to note, mine usually only go on and off one or two at a time. It's rare to see all of them go at once. Usually when that happens JuPnP is so dead that it doesn't recover without a restart.

jaywiseman1971 commented 4 years ago

general duration

From a clean start of OH w/o using any Sonos devices - it could last up to 24 hours w/o any of them going offline.

From a clean start of OH using Sonos devices (either via the App or via OH controlling them). It seems to be around 10 min - 4 hours based on the usage. This is all dependent on using either the App or via OH controlling them. For example; here's a scenario that always takes them offline pretty quickly.

Walk down stairs and the two living room Sonos speakers start playing Spotify based on a HUE Motion trigger. Walk back up stairs to take a shower, based on a WeMo light turning in the shower - OH turns off the Sonos speakers in the living room and starts playing Spotify on an Echo in the bathroom. Turn off the WeMo light in the shower, OH stops Echo from playing Spotify and starts playing Spotify back in the living room on the two Sonos downstairs.

That Start and Stop on the Sonos in the living room (two times around 30+ minutes) starts the cause for them to go into communication-error state.

Another example (has nothing to do with OH); If I start using the app and joining/dejoin speakers that will start units going into communications errors also based on the amount of times I do it. It's NOT a guarantee that it does it the first time I do it.

Hope this helps?

Best, Jay

jaywiseman1971 commented 4 years ago

JuPnP is so dead that it doesn't recover without a restart.

Agreed; I've never gotten everything back online (WeMo, Onkyo, Samsung, etc.) after just doing a JuPnP restart. I always have to restart OH from scratch. Trying to restart JuPnP as an option - it just hangs on the restart in Karaf console.

Best, Jay

lolodomo commented 4 years ago

My question is how much time does it remain OFFLINE ?

jaywiseman1971 commented 4 years ago

much time does it remain OFFLINE

Forever when it's Communications-Error state.

If it's just "Offline" - it will come back online exactly at 10 minutes. The just "Offline" scenarios don't happen often thought; Sonos in general always goes into "Communication-Errors" state.

Best, Jay

morph166955 commented 4 years ago

Mine is the exact opposite. I can go weeks before the full crash. The offline state is multiple times a day per device.

lolodomo commented 4 years ago

This looks clearly like a problem in JUPnP and not in the Sonos binding. One of your UPnP devices (not Sonos) is probably "breaking" JUPnP.

jaywiseman1971 commented 4 years ago

@morph166955

Out of that jUpNp binding list @lolodomo provided; can you list the bindings your using? Maybe a common binding between us? SamsungTV maybe?

Best, Jay