openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0

JuPnP failing - causing devices to go offline #5892

Closed by morph166955 3 years ago

morph166955 commented 5 years ago

Refer to https://community.openhab.org/t/too-much-time-before-a-sonos-thing-becomes-definitively-online/62214

For no observed reason, JuPnP seems to fail randomly and cause things to go offline. Sonos speakers seem to be the biggest culprit. Once this condition is reached, all things that rely on JuPnP seem to fall offline quickly. I've seen this happen at intervals anywhere between 3 days and 2 weeks; there is no specific length of time that triggers the failure.

This can currently be mitigated by having a rule monitor getThingStatusInfo(mything).getStatus().toString() and, when the thing goes offline, run executeCommandLine("/usr/bin/ssh -p 8101 -i /home/openhab/karaf_keys/openhab.id_rsa openhab@localhost bundle:restart org.jupnp", 120000)
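A minimal sketch of the same watchdog idea in Python, for anyone who wants to run it outside the rule engine. The ssh invocation is the one quoted above and the 120 second timeout mirrors the 120000 ms passed to executeCommandLine. The function names and the idea of receiving the thing status as a plain string are assumptions made for illustration; none of this is openHAB API.

```python
import subprocess

# Karaf console command taken from the workaround described above.
RESTART_CMD = [
    "/usr/bin/ssh", "-p", "8101",
    "-i", "/home/openhab/karaf_keys/openhab.id_rsa",
    "openhab@localhost",
    "bundle:restart", "org.jupnp",
]


def needs_restart(thing_status: str) -> bool:
    """Return True when the reported thing status is OFFLINE (case-insensitive)."""
    return thing_status.strip().upper() == "OFFLINE"


def restart_jupnp(timeout_s: float = 120.0, runner=subprocess.run) -> int:
    """Run the bundle restart and return the ssh exit code.

    `runner` is injectable (hypothetical helper parameter) so the function
    can be exercised without a real ssh connection.
    """
    result = runner(RESTART_CMD, timeout=timeout_s)
    return result.returncode


def watchdog_step(thing_status: str, runner=subprocess.run) -> bool:
    """Restart JUPnP if the thing is offline; report whether a restart ran."""
    if needs_restart(thing_status):
        restart_jupnp(runner=runner)
        return True
    return False
```

Calling watchdog_step("OFFLINE") would issue the restart once; how the status string is obtained (REST API, log scraping, a rule) is left open here.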

morph166955 commented 4 years ago

Yeah, that's about it. Just SamsungTV and Sonos here. I had wemo but removed them about a month ago. Odd question, do you have a samsung fridge or anything else that broadcasts? I notice my fridge's IP showing up in JuPnP on occasion.

jaywiseman1971 commented 4 years ago

samsung fridge or anything else that broadcasts

No, only 8 Samsung TVs, and they use 3 different communication methods (native, socket and secure socket) in the Samsung binding.

Best, Jay

lolodomo commented 4 years ago

In this case, please power off (not standby) your Samsung TVs and see if it fixes the problem.

Note that on a Windows PC you can use the DeviceSpy app to identify all the UPnP devices on your network (weird, I see my Sonos devices are no longer detected by this app!).

jaywiseman1971 commented 4 years ago

My TVs being on or off doesn't affect it; I know this because we listen to music 90% of the time rather than watch TV. It can be days before we turn a TV on, and this also sometimes happens overnight when nothing in the house is on.

I could disable all the Samsung TV Things as an option though.

Best, Jay

lolodomo commented 4 years ago

If your TVs are in standby mode, the network activity probably continues. The idea is to power them off fully. Disabling the openHAB things will have no effect IMHO.

lolodomo commented 4 years ago

What I suspect is a problem while JUPnP is exchanging directly with one of your UPnP devices (the Samsung TVs are good candidates). Due to a bug in JUPnP, data then gets corrupted and your Sonos devices disappear from JUPnP.

morph166955 commented 4 years ago

I can eliminate the SamsungTVs as the culprit. My TVs fully shut the network port off when powered down. This was a huge issue for me because I couldn't send a signal to power on. I had to use IR blasters for power on. Same also here, I can go days with the TVs off.

lolodomo commented 4 years ago

Please check whether you have any unidentified UPnP devices (the Samsung fridge is a good example).

lolodomo commented 4 years ago

There have not been many changes in JUPnP since December 2018:

[image: JUPnP change history]

The most important one was merged on the third of December 2018.

lolodomo commented 4 years ago

My Sonos devices finally appeared in Device Spy, so let it run for a while. In my case, Device Spy identifies my Hue bridge and my 3 Sonos devices.

jaywiseman1971 commented 4 years ago

Device Spy

Can you post the URL for this software for a Windows platform?

This one? https://www.meshcommander.com/upnptools

Best, Jay

lolodomo commented 4 years ago

It has moved apparently but Google gives me this new URL: https://www.meshcommander.com/upnptools

jaywiseman1971 commented 4 years ago

I'm currently using the Sonos app on the PC; here are the only UPnP devices my workstation is seeing right now.

[image: Capture]

Best, Jay

lolodomo commented 4 years ago

You have a very big setup, including even several Sonos Boosts. Your Samsung TVs are not found.

Your Hue bridge is not found either. Do you own a Hue bridge? If not, you could try stopping the hue and hueemulation bindings to see if it helps.

We cannot exclude a bug in one of the bindings using the JUPnP library (through the openHAB core IO transport UPnP bundle).

lolodomo commented 4 years ago

In such a big Sonos setup, does each Sonos Boost define its own SonosNet network? Are your Sonos devices then spread across different SonosNet networks?

jaywiseman1971 commented 4 years ago

I have 2 Hue Bridges and 8 Samsung TVs, and Device Spy hasn't seen their traffic. I have a flat network with no VLANs. Each SonosNet is on the same network. I have a Boost on each floor. My setup is a combo of hardwired and SonosNet. I also have more than 20 WeMo devices that it's not seeing either.

My guess as to why it's not seeing the Hue or Samsung stuff is that my entire network is spread over 6 switches and the PC doesn't need to see or access those devices?

Yes, I have 214 Things in PaperUI; it's a very large home automation setup.

My next test is to disable the Samsung TV Things after a clean startup of OH to see if that has any effect.

I agree about not excluding openHAB core IO transport UPnP bundle

Best, Jay

lolodomo commented 4 years ago

You clearly have a very big and very uncommon setup. I imagine you are running into a problem caused by this number of UPnP devices.

If DeviceSpy doesn't see some of your UPnP devices, I can imagine JUPnP is probably not seeing them either.

lolodomo commented 4 years ago

Running DeviceSpy on your openHAB server would be interesting.

jaywiseman1971 commented 4 years ago

JUPnP is probably not seeing them

Everything is working with OH for the last 2 years starting with OH 2.3 when I set it up. Sonos is the only thing NOT working and it just started not working a little over a year ago. There are some folks that think it's the combo of Sonos firmware changes and jUpNp causing this.

DeviceSpy on your openHAB server

I would love to but I'm running OH on a Synology NAS in HA mode and I'm not really sure how to take the Linux package of DeviceSpy and get it to work on a Synology. I am NOT running Docker; just the basic Synology setup.

You would think that if OH sees the devices on a clean boot-up and they stay online until actions occur, it would have something to do with the Sonos and jUpNp bindings (each partly at fault).

Best, Jay

lolodomo commented 4 years ago

Running OH on a Synology NAS is another not very common thing lol

Checking the code very quickly, I see that a few other bindings, like Sonos, check whether the device is registered in JUPnP. That is the case for Wemo and SamsungTV. A difference is probably that in those bindings the thing is not set to OFFLINE when the device is no longer registered, while in the Sonos binding the thing status is set to OFFLINE in this case. I could avoid setting this status, but IMHO that is not a good solution, because when the device is no longer registered in JUPnP the binding will no longer work properly (all commands will fail and no data will be received by the binding). The real question is why JUPnP unregisters the UPnP devices. I see two possible reasons: either a bug in the library, or because your NAS is really no more able to exchange with these devices.

lolodomo commented 4 years ago

This will require someone who knows the JUPnP library better than I do and who could add a few logs in the right places in the library to debug in your special environment.

morph166955 commented 4 years ago

Is there a Linux version of DeviceSpy you are aware of? My OH2 instance is running on Ubuntu.

Stepping back, I think there are 3 failure states here that we need to understand:

1) Single device failure, restored after 10 minute window
2) Single device failure, never comes back
3) All device failure, JuPnP hard failure.
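To make the three states easier to tell apart when reading logs, here is a rough classification sketch in Python. The input shape (a mapping from device name to outage length in seconds, with None meaning "still offline") and the state labels are assumptions made for illustration; only the 10 minute window comes from the list above.

```python
from typing import Dict, Optional

RECOVERY_WINDOW_S = 10 * 60  # the 10 minute window from state 1


def classify(outages: Dict[str, Optional[float]], total_devices: int) -> str:
    """Map observed outages to one of the failure states listed above.

    `outages` holds only the devices that went offline; the value is the
    outage length in seconds, or None if the device is still offline.
    """
    if not outages:
        return "healthy"
    still_down = [name for name, secs in outages.items() if secs is None]
    if total_devices > 1 and len(still_down) == total_devices:
        return "state 3: all devices down"
    if still_down:
        return "state 2: device never came back"
    if all(secs <= RECOVERY_WINDOW_S for secs in outages.values()):
        return "state 1: restored within window"
    return "restored, but outside the window"
```

For example, classify({"playbar": None}, total_devices=7) falls into state 2, while every device reporting None falls into state 3.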

Failure seems proportionate to the number of devices. This makes me believe it is a memory issue somewhere.

As far as my home network, I see the following:

6x Roku devices (not in OH2)
6x Sonos devices (all in OH2. To note, I have 7 in total configured in OH2, one is physically powered off right now)
1x Denon receiver (thing configured in OH2, but not using uPnP)
1x Synology disk station (Interesting that we both have a synology)
3x HD Homerun TV tuners (not in OH2)

Also to note, I turned some of the Samsung TVs on and off and I see them only when they are powered up.

When going through the debugs it looks like the query has just failed. Is there a way to force JuPnP to retry the URL a few times before marking it offline? Specifically the http://[sonos-ip]:1400/DeviceProperties/Control URL.
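The retry behaviour being asked for can be sketched generically in Python as below. This is not a JUPnP setting; the thread does not say whether the library offers one. The function name, the attempt count and the delay are all hypothetical, and `query` stands in for whatever performs the request against the Control URL.

```python
import time
from typing import Callable


def query_with_retries(query: Callable[[], bool], attempts: int = 3,
                       delay_s: float = 1.0, sleep=time.sleep) -> bool:
    """Return True as soon as one attempt succeeds, False if all fail.

    Only after every attempt has failed would the caller mark the
    device offline.
    """
    for attempt in range(1, attempts + 1):
        try:
            if query():
                return True
        except OSError:
            pass  # treat network errors like a failed attempt
        if attempt < attempts:
            sleep(delay_s)  # wait before the next try, not after the last
    return False
```

The design choice is simply that a single failed query is treated as inconclusive, which would smooth over the short flapping described in this thread without hiding a device that is really gone.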

jaywiseman1971 commented 4 years ago

NAS is really no more able to exchange with these devices

Did you scrape my jupnp log in this thread above and look at the entries? It's showing the binding is talking to the Sonos devices but the devices are returning a jUpNp 401 error when they go into Communications-Error state. XML is coming back from them to jUpNp binding. You'll see in the log it waits 10 ms for a response that is valid.

<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <s:Body>
    <s:Fault>
      <faultcode>s:Client</faultcode>
      <faultstring>UPnPError</faultstring>
      <detail>
        <UPnPError xmlns="urn:schemas-upnp-org:control-1-0">
          <errorCode>401</errorCode>
        </UPnPError>
      </detail>
    </s:Fault>
  </s:Body>
</s:Envelope>

Best, Jay
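For reading such responses, a small Python sketch that pulls the errorCode out of a SOAP fault like the one quoted above. Note that this is a UPnP control error code carried in the SOAP body, not an HTTP status; in the UPnP Device Architecture, 401 is the general "Invalid Action" code. The function name and the small lookup table are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Optional

UPNP_CONTROL_NS = "urn:schemas-upnp-org:control-1-0"

# A few general error codes from the UPnP Device Architecture.
UPNP_ERRORS = {401: "Invalid Action", 402: "Invalid Args", 501: "Action Failed"}

FAULT = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
 s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><s:Body><s:Fault>
<faultcode>s:Client</faultcode><faultstring>UPnPError</faultstring><detail>
<UPnPError xmlns="urn:schemas-upnp-org:control-1-0"><errorCode>401</errorCode>
</UPnPError></detail></s:Fault></s:Body></s:Envelope>"""


def upnp_error_code(soap_xml: str) -> Optional[int]:
    """Return the UPnP errorCode from a SOAP fault, or None if absent."""
    root = ET.fromstring(soap_xml)
    # errorCode inherits the default namespace declared on UPnPError.
    node = root.find(f".//{{{UPNP_CONTROL_NS}}}errorCode")
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


code = upnp_error_code(FAULT)
print(code, UPNP_ERRORS.get(code, "unknown"))
```

A device that answers with a fault like this is reachable at the HTTP level; it is rejecting the action it was asked to perform.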

morph166955 commented 4 years ago

Interesting thing to note after looking at DeviceSpy for a bit now. My playbar has been offline for DAYS (I've left it this way for troubleshooting). DeviceSpy sees my playbar and seems to be getting info from it.

jaywiseman1971 commented 4 years ago

memory issue

I upped the memory (2 GB to 8 GB) on my Synology units over a year ago and tweaked the OH memory settings for it. It seems pretty solid now. My Synology runs around 35% RAM and 40% CPU most of the time.

Here are the lines for tweaking the memory settings in OH, which I got off the forums from another foundation member.

runtime.cfg

org.eclipse.smarthome.threadpool:thingHandler=50
org.eclipse.smarthome.threadpool:discovery=20
org.eclipse.smarthome.threadpool:safeCall=50
org.eclipse.smarthome.threadpool:ruleEngine=50

org.eclipse.smarthome.webclient:minThreadsShared=10
org.eclipse.smarthome.webclient:maxThreadsShared=60
org.eclipse.smarthome.webclient:minThreadsCustom=10
org.eclipse.smarthome.webclient:maxThreadsCustom=30

Best, Jay

morph166955 commented 4 years ago

I have similar configs on my side. Those won't impact the memory though. Those configs deal with the multithreading capabilities of the CPU. My OH2 instance is a VM with 8GB RAM allocated to it.

As a test, I'm firing up a new OH2 instance and ONLY putting the Sonos binding in it to see if it has this problem.

jaywiseman1971 commented 4 years ago

on my side

Are you running Unifi Controller & AP's also?

Best, Jay

morph166955 commented 4 years ago

I was referring to the runtime.cfg configurations you posted. My home network is all Cisco.

lolodomo commented 4 years ago

Did you scrape my jupnp log in this thread above and look at the entries? It's showing the binding is talking to the Sonos devices but the devices are returning a jUpNp 401 error when they go into Communications-Error state. XML is coming back from them to jUpNp binding. You'll see in the log it waits 10 ms for a response that is valid.

My understanding is that your OH server, through the JUPnP library, is sending a command to your Sonos device through an HTTP POST request. This request is timing out after 10 seconds. That means this Sonos could be just unreachable on your network from the OH server.
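One way to separate "unreachable" from "reachable but not answering the action" is a plain TCP connect from the OH server itself. A minimal Python sketch follows; port 1400 is the one from the Control URL mentioned earlier in the thread, while the function name and the 5 second timeout are assumptions.

```python
import socket


def tcp_reachable(host: str, port: int = 1400, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable hosts.
        return False
```

If this returns True for a speaker that the binding reports as offline, the network path is fine and the problem sits higher up, in the request handling on one side or the other. The check has to run on the same machine as OH to say anything about that machine's connectivity.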

jaywiseman1971 commented 4 years ago

could be just unreachable on your network from the OH server

If it were unreachable, wouldn't it fail to respond rather than return a 401 error? I'm not sure which side of the communication is returning the XML piece.

Keep in mind, all my Sonos devices are also being monitored by the network binding, and they are never unreachable.

I first tried port 1400 to see if that helped with them going into communications-error, then I changed it to 1443, which didn't change anything. Both ports are used by Sonos . . .

Thing network:servicedevice:sonosbasementleft "Basement Left Speaker" [ hostname="192.168.0.189", port=1443, retry=3, timeout=5000, refreshInterval=60000 ]
Thing network:servicedevice:sonosbasementright "Basement Right Speaker" [ hostname="192.168.0.154", port=1443, retry=3, timeout=5000, refreshInterval=60005 ]
Thing network:servicedevice:sonosbedroomleft "Bedroom Left Speaker" [ hostname="192.168.0.74", port=1443, retry=3, timeout=5000, refreshInterval=60010 ]
Thing network:servicedevice:sonosbedroomright "Bedroom Right Speaker" [ hostname="192.168.0.98", port=1443, retry=3, timeout=5000, refreshInterval=60015 ]
Thing network:servicedevice:sonosgym "Gym AMP Speakers" [ hostname="192.168.0.158", port=1443, retry=3, timeout=5000, refreshInterval=60020 ]
Thing network:servicedevice:sonosinground "In Ground AMP Speakers" [ hostname="192.168.0.163", port=1443, retry=3, timeout=5000, refreshInterval=60025 ]
Thing network:servicedevice:sonosjays "Jays AMP Speakers" [ hostname="192.168.0.178", port=1443, retry=3, timeout=5000, refreshInterval=60030 ]
Thing network:servicedevice:sonoskitchen "Kitchen Speaker" [ hostname="192.168.0.83", port=1443, retry=3, timeout=5000, refreshInterval=60035 ]
Thing network:servicedevice:sonoslivingroomleft "Living Room Left Speaker" [ hostname="192.168.0.103", port=1443, retry=3, timeout=5000, refreshInterval=60040 ]
Thing network:servicedevice:sonoslivingroomright "Living Room Right Speaker" [ hostname="192.168.0.130", port=1443, retry=3, timeout=5000, refreshInterval=60045 ]
Thing network:servicedevice:sonosloftleft "Loft Left Speaker" [ hostname="192.168.0.145", port=1443, retry=3, timeout=5000, refreshInterval=60050 ]
Thing network:servicedevice:sonosloftright "Loft Right Speaker" [ hostname="192.168.0.115", port=1443, retry=3, timeout=5000, refreshInterval=60055 ]
Thing network:servicedevice:sonosonwall "On Wall AMP Speakers" [ hostname="192.168.0.190", port=1443, retry=3, timeout=5000, refreshInterval=60060 ]
Thing network:servicedevice:sonosryan "Ryan Speaker" [ hostname="192.168.0.167", port=1443, retry=3, timeout=5000, refreshInterval=60065 ]
Thing network:servicedevice:sonostricia "Tricia Speaker" [ hostname="192.168.0.119", port=1443, retry=3, timeout=5000, refreshInterval=60070 ]
Thing network:servicedevice:sonosboostbasement "Boost Basement" [ hostname="192.168.0.61", port=1443, retry=3, timeout=5000, refreshInterval=60075 ]
Thing network:servicedevice:sonosboostledge "Boost Loft Ledge" [ hostname="192.168.0.126", port=1443, retry=3, timeout=5000, refreshInterval=60080 ]
Thing network:servicedevice:sonosboostoffice "Boost Office" [ hostname="192.168.0.62", port=1443, retry=3, timeout=5000, refreshInterval=60085 ]

Best, Jay

lolodomo commented 4 years ago

Did you try not monitoring these devices with the network binding?

jaywiseman1971 commented 4 years ago

Did you try not monitoring these devices with the network binding?

Yes, for the first 9 months of 2019 there was no monitoring done. I woke up one morning thinking that keeping a monitor going would help; it didn't.

Best, Jay

lolodomo commented 4 years ago

Your manual HTTP request was not run from the same server (NAS) ?

jaywiseman1971 commented 4 years ago

Your manual HTTP request

You're correct; it was run from my PC, so that error wasn't valid for the NAS connection. Good catch!

Best, Jay

morph166955 commented 4 years ago

I have also had my sonos devices monitored through the network binding. They are solid. It's rare for me to even miss one ping.

morph166955 commented 4 years ago

I just restarted my primary OH2 instance. The speakers started flapping within 4 minutes. They have been flapping randomly now for about 15 minutes. Interestingly enough, the second OH2 instance has been 100% stable for the last hour. I would expect a device issue to impact both OH2 instances. As a note, both OH2 instances are running on the exact same hypervisor.

morph166955 commented 4 years ago

@jaywiseman1971 Please run a test for me. Uninstall the wemo binding completely and see if it stabilizes everything. I went back through my configs and noticed I didn't actually remove the binding from my system when I removed the things/items. My things have been completely stable since removing the wemo binding.

lolodomo commented 4 years ago

Without any things, the binding will just do nothing. But why not test it.

morph166955 commented 4 years ago

Won't it be still searching for items to add to the inbox?

lolodomo commented 4 years ago

Yes you're right.

morph166955 commented 4 years ago

Nevermind, they just started failing

jaywiseman1971 commented 4 years ago

Nevermind, they just started failing

The test I'm running is I have all my Samsung Things disabled after a clean startup. I have defined things and jupnp discovered things for Samsung.

I'll let you know what happens.

lolodomo commented 4 years ago

@morph166955: so your only UPnP devices are a Samsung TV and a few Sonos? Your only running bindings using JUPnP are sonos and samsungtv?

And you run OH in a VM.

If this is the case, could you uninstall the samsungtv binding to see if it changes something?

morph166955 commented 4 years ago

Correct. I'm going to see if I can make this start failing on the second OH2 VM I rolled so that I can break it without breaking my primary instance.

jaywiseman1971 commented 4 years ago

Here's an example of the 10-minute recovery situation:

-----Original Message-----
From: openHAB openHAB@abc.com
Sent: Sunday, February 16, 2020 3:29 PM
To: Jay
Subject: openHAB - SonosThingJay

openHAB - SonosThingJay is OFFLINE.

Email showing everything is back online below:

-----Original Message-----
From: openHAB openHAB@abc.com
Sent: Sunday, February 16, 2020 3:38 PM
To: Jay
Subject: openHAB - Sonos Report

Sonos Units OFFLINE Now


jaywiseman1971 commented 4 years ago

my Samsung Things disabled after a clean startup

So far so good since I've done this. My normal morning OH routine that flips the playing of the Sonos from living room to the bathroom back to the living room worked perfectly today. I'll keep you posted . . .

Best, Jay

lolodomo commented 4 years ago

Looking at the code, I just discovered that a few bindings directly access the JUPnP bundle without going through the OH core framework: wemo, hueemulation, lgwebos and samsungtv. Some do that only to trigger a new UPnP search: wemo and lgwebos. But 2 bindings are directly using the service registry from the library: hueemulation and samsungtv. This is very probably not an expected usage. Normally the JUPnP library should be used only by the OH core framework (IO transport UPnP bundle). And to be honest, after a quick look, I don't understand what the samsungtv binding is doing...

morph166955 commented 4 years ago

Perhaps someone should tag the SamsungTV binding owner to get their attention (I would but I'm not sure who that is).

jaywiseman1971 commented 4 years ago

SamsungTV binding owner

I'm going to wait another 24 hours and see if everything continues to work w/Sonos. If it does, I'm pretty sure I know who the person to loop in is, and I'll tag them on this issue.

If the Samsung binding is the culprit, I plan to roll back to either the 2.4m1 or the 2.3 Samsung binding as the next step on my system.

Best, Jay

jaywiseman1971 commented 4 years ago

SamsungTV Code Owner = https://github.com/Cossey

From: Stewart Cossey notifications@github.com
Sent: Thursday, December 19, 2019 4:24 AM
To: openhab/openhab2-addons openhab2-addons@noreply.github.com
Cc: Subscribed subscribed@noreply.github.com
Subject: Re: [openhab/openhab2-addons] [samsungtv] Binding does not work properly with UE55H6290 (#1216)

Whilst the updates now make the binding more useful for 2017/2018/2019 models, there are still some issues with getting Power On/Off working reliably, which I hope to improve in the next 3 weeks. Overall, Samsung seems to have dropped the ball on the IP control side of things with the later model TVs 👎 The Remote Control protocol had to be reverse engineered by a group of people working on the samsungctl app, and features are reduced: no way to send Notifications to the TV, no way to list the TV channels. Mind you, this may still actually be possible, but it's all undocumented, so it'll only be by sheer luck if someone discovers this functionality.