openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.58k forks source link

[tr064] Regression: SOAP requests failing after #14468 #14575

Closed t2000 closed 9 months ago

t2000 commented 1 year ago

I Just updated my openHA snapshot from a version in january to today (11.3.2023).

The only relevant changed I found are the ones in: #14468

I get "random" failures like:

2023-03-11 15:17:31.488 [ERROR] [com.sun.xml.messaging.saaj.soap     ] - SAAJ0511: Unable to create envelope from given source
2023-03-11 15:17:31.489 [WARN ] [ng.tr064.internal.soap.SOAPConnector] - Failed to get Tr064ChannelConfig{channelType=wifi24GHzEnable, getAction=GetInfo, dataType='boolean, parameter='null'}: com.sun.xml.messaging.saaj.SOAPExceptionImpl: Unable to create envelope from given source:

By "random" I mean that they almost consistently occur when they occur once. But if i restart openHAB and they do not appear, they also won't show up later...

It also cannot read out the phonebook:

2023-03-11 15:05:21.899 [ERROR] [com.sun.xml.messaging.saaj.soap     ] - SAAJ0511: Unable to create envelope from given source
2023-03-11 15:05:21.900 [WARN ] [ding.tr064.internal.Tr064RootHandler] - Could not get phonebooks for thing tr064:fritzbox:fritzboxTR064

It looks like it is very flaky, because sometimes it works, sometimes it doesn't. I even tried to play with the new timeout parameter and increased i from the default 5 to 10 seconds, but this doesn't help unfortunately.

Here are some failing items:

Switch Network_WifiRasi24Ghz "Rasi 2.4GHz WLAN" (gNetwork) { channel="tr064:subdeviceLan:fritzboxGarageTR064:wlanDevice:wifi24GHzEnable" }

Switch Network_InetAccessTV_WZ "Inet WZ TV blockiert" (gNetwork) { channel="tr064:fritzbox:fritzboxTR064:wanBlockByIP_192_2E168_2E42_2E57" }

and their corresponding things:

Bridge tr064:fritzbox:fritzboxTR064 "Fritzbox TR064" [ host="192.168.42.1", timeout=10, refresh=60, phonebookInterval=600, user="openhab", password="XXX", missedCallDays="10", tamIndices="0", outboundCallDays="10", rejectedCallDays="10", callListDays="10", inboundCallDays="10", callDeflectionIndices="0", wanBlockIPs="192.168.42.57" ] {
    subdevice lanDevice "Fritzbox TR064 - LAN Device" [ uuid="uuid:ID", refresh=60 ]
    subdevice wanDevice "Fritzbox TR064 - WAN Device" [ uuid="uuid:ID", refresh=60 ]
    subdevice wanConDevice "Fritzbox TR064 - WAN Connection Device" [ uuid="uuid:ID", refresh=60 ]
}

//garage
Bridge tr064:fritzbox:fritzboxGarageTR064 "Fritzbox GarageTR064" [ host="10.0.6.42", refresh=60, timeout=10, phonebookInterval=0, user="openhab3", password="XXX" ] {
    subdeviceLan wlanDevice "Fritzbox Garage TR064 - WLAN Device" [ uuid="uuid:ID", refresh=60 ]
}

Edit: After stopping openHAB and then waiting 5mins before starting it and having deleted the wifi24GHzEnable item, the phone books seems to work. I assuem that because the refresh is 10mins and there was no error like above after 12mins.

But if i try to create the wifi24GHzEnable item and link it:

2023-03-11 15:37:46.586 [ERROR] [com.sun.xml.messaging.saaj.soap     ] - SAAJ0511: Unable to create envelope from given source
2023-03-11 15:37:46.587 [WARN ] [ng.tr064.internal.soap.SOAPConnector] - Failed to get Tr064ChannelConfig{channelType=wifi24GHzEnable, getAction=GetInfo, dataType='boolean, parameter='null'}: com.sun.xml.messaging.saaj.SOAPExceptionImpl: Unable to create envelope from given source: 
J-N-K commented 1 year ago

I can't reproduce that. Can you try if it also happens if you configure via UI?

J-N-K commented 1 year ago

Which FritzBox OS are you using? There is a report with the same error messages occurring since some days for a user running 3.4.2, which does not contain the linked changes.

t2000 commented 1 year ago

I am using a Fritzbox 7530 with FRITZ!OS: 7.50 which is thing tr064:fritzbox:fritzboxTR064

And I have another fritzbox in my garage which is a Fritzbox 7390 with FRITZ!OS: 06.86. That is thing tr064:fritzbox:fritzboxGarageTR064

Can you try if it also happens if you configure via UI?

Why should this make a difference?

J-N-K commented 1 year ago

Both exhibit the same behavior? I have a 6591 with 7.29 and as I said: I can't reproduce that.

UI: I just want to rule out that there is an issue with thing or channel naming. I have seen an issue in another binding where everything worked fine with the discovered thing UIDs but failed with arbitrary UIDs.

t2000 commented 1 year ago

Both exhibit the same behavior? I have a 6591 with 7.29 and as I said: I can't reproduce that.

I got

[ERROR] [com.sun.xml.messaging.saaj.soap     ] - SAAJ0511: Unable to create envelope from given source

with both. But I only have a phonebook on the 7530 with OS 7.50.

But as I mentioned before. This doesn't happen always... Seems to be timing related somehow.

UI: I just want to rule out that there is an issue with thing or channel naming. I have seen an issue in another binding where everything worked fine with the discovered thing UIDs but failed with arbitrary UIDs.

I can give that a try, or at least compare the created UIDs with the ones I specified in my config files. Will do so in the next days. But I have to add here that I had changed NOTHING in these files and the only difference from "working all the time" to "works sometimes" was the openHAB snapshot update from january towards the one from 11.3.2023. I did not change anything regarding the configs and purged tmp like I do on every update.

J-N-K commented 1 year ago

I'll check what changed recently, but it may be hard to find if it is nor clearly reproducible. You could also try to set TRACE logging to see what exactly is send and received.

There are also reports on the forum (for openHAB 3.2) with the same SOAP error. They were resolved by using the hostname instead of the IP address. I must admit that I have no idea why that happens and if it indeed is solving the issue it could also be an issue in a dependency.

t2000 commented 1 year ago

Sorry I cannot debug this. Trying to run it in my IDE but the default app.bndrun starts the runtime just fine. However when I add the tr064 binding, CPU goes to 100% on all cores and nothing happens for minutes.

@@ -62,11 +62,12 @@ feature.openhab-model-runtime-all: \
    bnd.identity;id='org.openhab.core.io.rest.ui',\
    bnd.identity;id='org.openhab.ui',\
    bnd.identity;id='org.openhab.ui.basic',\
    bnd.identity;id='org.openhab.ui.iconset.classic',\
    bnd.identity;id='org.apache.aries.jax.rs.whiteboard',\
-   bnd.identity;id='org.ops4j.pax.web.pax-web-extender-whiteboard'
+   bnd.identity;id='org.ops4j.pax.web.pax-web-extender-whiteboard',\
+   bnd.identity;id='org.openhab.binding.tr064'

 -runfw: org.eclipse.osgi

 -runee: JavaSE-17

@@ -280,6 +281,9 @@ feature.openhab-model-runtime-all: \
    org.openhab.core.ui;version='[4.0.0,4.0.1)',\
    org.openhab.core.ui.icon;version='[4.0.0,4.0.1)',\
    org.openhab.core.voice;version='[4.0.0,4.0.1)',\
    org.openhab.ui;version='[4.0.0,4.0.1)',\
    org.openhab.ui.basic;version='[4.0.0,4.0.1)',\
-   org.openhab.ui.iconset.classic;version='[4.0.0,4.0.1)'
+   org.openhab.ui.iconset.classic;version='[4.0.0,4.0.1)',\
+   org.glassfish.jaxb.runtime;version='[2.3.4,2.3.5)',\
+   org.openhab.binding.tr064;version='[4.0.0,4.0.1)',\
+   org.osgi.service.cm;version='[1.6.0,1.6.1)'

These are the changes to the app.bndrun after I press the "resolve" button.

I remember that I had reported such a bug where it takes ages for the startup of the runtime in eclipse some time ago already. I cannot find the bug ticket, but back then it was about changing the order of some bundles somewhere, but I don't know.

Edit: After ~10mins of running I get Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap space All default values, nothing changed to the JVM setup that came with OOmph

openhab-bot commented 1 year ago

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/starting-demo-app-takes-ages-10-12min-on-every-run-osgi-startup/122083/20

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.