Closed t2000 closed 10 months ago
Can you provide some more details:
From memory the reply from any http device that is a 401 with the WWW-Authenticate missing, means that the login details is WRONG. Because the camera did not reply with a fresh WWW-Authenticate header, it is because the binding gave the login details and the device is saying they are wrong. If the digest NONCE is old and no longer valid then a header should be stale: true and this does not appear to be occurring, so I tend to think this is not caused by a race condition. Possible the cpu load of the camera is hitting 100% from AI or other features having been turned on if this did not occur in the past and has just begun to occur.
I will reply more in the PR itself as I have not looked at your suggested change yet to determine if there is a negative to putting in a work around as we may possibly lose the ability to know when the user/pass is wrong and help a user fix a wrong password quickly by not knowing.
Thanks for your long reply. Let me answer your questions:
- What else is talking to the camera? Other software or perhaps a rule that talks directly to the camera?
Nothing else is talking to it, also not a rule.
- Do you have the latest firmware installed?
Probably not, as I haven't touched the cameras since ages. They always worked as they should work. Also with these chinese devices I am not really keen on installing just any update since you never know what features they turn off or change just because they feel like. Never touch a running system, especially since the cameras are not connected to the internet.
- What rights does the user/pass that your using have? is it a full admin account or one with lower rights? My best guess is that you are not using admin account and the binding is accessing a resource of the camera that requires it.
It is a dedicated user with "operator" rights, see screenshot. Unless the binding has been recently changed to query more resources in the cameras, I do not think this is an issue, as it was working fine (i.e. for months/years) until this started to occur around the end of july/beginning of august or so. (I am using the latest OH snapshot every once in a while, so roughly every 4-6 weeks or so).
- Does the camera get asked any other requests around that time?
It also sometimes happens in the middle of the night where I am not even requesting the stream of it and see 1.) I am not using any other program to talk to the camera automatically.
- What is the CPU load of the camera, is it hitting 100% at all? You can see this using the onvif device manager program.
Unfortunately I cannot check this, as the "onvif device manager" seems to be a windows program and I do not have any windows system that I could use. I am running Linux only... :(
From memory the reply from any http device that is a 401 with the WWW-Authenticate missing, means that the login details is WRONG
Well, in my case, the details are 100% correct. The username and password are stored in the *.things
file, so they are also not changed at runtime by any configuration update to the thing or so. Also, if I deactivate and activate the thing, i.e. restart the ThingHandler
the binding can connect to the camera with its credentials just fine.
Possible the cpu load of the camera is hitting 100% from AI or other features having been turned on if this did not occur in the past and has just begun to occur.
Since it "just started" to occur around end of july, we COULD blame the cpu usage. So are you saying that if the camera rans out of cpu resources, it just responds with 401 to say "leave me alone"? If so, why does it accept requests immediately afterwards, i.e. when I restart the `Thinghandler? just at that time when it happens?
However, I have NOT changed any AI features. Also my cameras are the "older" ones (DS-2CD2685FWD-IZS) which only support the basics, such as "motion detection", "some object in some zone detection" and "line cross detection", which I use. My config is as follows:
Thing ipcamera:hikvision:camFrontYard "Vorgartenkamera" [ ipAddress="IPv4", port=80, onvifPort=80, serverPort=-1, ipWhitelist="DISABLE", updateImageWhen="4", pollTime=3000, password="XXX", username="openhab", ptzContinuous=true]
Since I have updateImageWhen="4"
, i.e. "During Motion Alarm the Image channel will update every poll until Alarm stops.", the binding requests the picture every 3 seconds in case one of the camera features triggers. Do you think that is too much already? However, I have not touched these settings since ages either.
I have not looked into how netty hooks into the http requests, so far I only see that the hook where I did my change is called when a reply from the camera comes in and the url the binding tried to access at that time is the picture:
2023-08-10 16:15:24.047 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: do NOT processAuth() with '' on url='/ISAPI/Streaming/channels/101/picture' via method='GET'
determine if there is a negative to putting in a work around as we may possibly lose the ability to know when the user/pass is wrong and help a user fix a wrong password quickly by not knowing.
You are right, I also thought about this potential issue with my change, I even thought of putting a log entry on WARN
level when it occurs, before doing the restart as I do it now. If a user sees one warning after the other in their logs, they would know its a username/password issue and if it occurs only everyone once in a while (for me, every couple of days), they could ignore the warning as it is "only a warning" and no error, i.e. it heals by itself.
Let me know if such a warning log would help in these cases or you have a better "workaround".
Some things to note:
You can't seem to pin point it to any PR as nothing changed around that time. I doubt it Is a binding change, to confirm you could put an old Jar into the addons folder and see if it still occurs.
You said the firmware has not changed so I suspect it is due to changing a setting in the camera that has changed its cpu load. The cpu load is reported as an onvif event and it's possible the logs in trace will show the values. I did not want to add a channel for this event but other automation platforms do and report that cameras play up when they are loaded up. Sadly they make the firmware capable of doing things that the cpu is not capable of.
The preferred way to log, is to not log if the online/thing status changes as this change makes it into the event. Log. I'll find some time soon to review the PR and best to put the comments on that there.
My cameras are not doing this and I am not aware of anyone's else's doing this. I think it is worth looking at this some more to find the root cause which is why I am posting here with my thoughts it may be cpu related on the camera. MY two hik cameras are running fine for over 5 weeks at a time before I do a milestone update and reboot the system.
It would be interesting to add a warn log when the onvif events state the cpu load is 99% or higher. Run that for a while on your system as it may be a feature worth adding to the binding.
I doubt this is netty, you can change the netty version with dropping jar files into the addons folder or using commands in karaf console. I posted on forum the commands to do it once.
If the root cause is not found then it's possible this will occur right when you need a snapshot or something from the camera when an event is occurring, so fixing the root cause would be preferred to just reconnecting.
Thanks to the rain outside I just found some time to enable some logging. I activated TRACE
logs for the OnvifConnection
class.
Just after I changed it in the log4j.xml file, I got a 401 with not WWW-Authenticate again. However, there is unfortunately no call logged just before it happend. I only see the onvif calls that happen during the reconnect which I trigger in my version of the binding.
In these reconnect logs I do not see any CPU usage unfortunately :( I have attached the logs if you would like to have a look.
You are getting a WARN on this URL /ISAPI/System/IO/inputs/1/status
See this thread here: https://community.openhab.org/t/ipcamera-binding-doing-erroneous-isapi-request-to-hikvision-camera/147345/3
This thread is about wrong I/Os reported by the camera.
So what would be the correct solution to handle such a faulty report of capabilities?
Should we introduce an advanced thing configuration that can be manually set, indicating how many I/Os the camera has? So one can decide to override whatever is reported by the camera with the configuration. Would that make sense and be the "correct" solution, i.e. instead of restarting all the time?
@Skinah Do you have any opinion on this?
Since you can compile the binding with changes in it, try stopping the binding from sending that API call /ISAPI/System/IO/inputs/1/status
If your 401 errors go away and the problem is solved then we should leave the code alone and fix the issue of the binding asking for IO status when the camera has no IO. That thread has the API call to ask the camera if it has IO and examples of what a few cameras will give in response.
This is causing issues for other people, so I suspect this may be the root cause and explains why I do not see the issue with my cameras that do have IO.
Thanks for your feedback, I just did:
Since you can compile the binding with changes in it, try stopping the binding from sending that API call /ISAPI/System/IO/inputs/1/status
And now I will watch my logs and report if the problem goes away. If it does, I can also provide a draft for an PR with the described solution you mentioned.
@Skinah After commenting out the request, the camera did not go offline with the 401 again.
I have adjusted https://github.com/openhab/openhab-addons/pull/15613
in a way that it now contains an advanced flag for the hikvision thing named hasExternalAlarmInputs
which defaults to false
since I think most people won't use them I guess. And if they do, they can simply enable them with that setting. Please have a look as it solves the problem also for others.
back to the beginning...
My camera supports I/O alarms and the binding can also request the status:
16:24:00.899 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:00.922 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:08.901 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:08.919 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:16.901 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:16.916 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:24.903 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:24.920 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:32.905 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:32.919 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:40.903 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:40.914 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
The first 401 is handled by your processAuth
here: https://github.com/openhab/openhab-addons/blob/main/bundles/org.openhab.binding.ipcamera/src/main/java/org/openhab/binding/ipcamera/internal/MyNettyAuthHandler.java#L161
However, sometimes the camera does not send the WWW-Authenticate
as I mentioned when I opened the ticket.
So requesting whether or not the camera supports I/Os doesn't help with the issue.
IMHO the best solution is to change the configerror
to cameraCommunicationError
here https://github.com/openhab/openhab-addons/blob/main/bundles/org.openhab.binding.ipcamera/src/main/java/org/openhab/binding/ipcamera/internal/MyNettyAuthHandler.java#L163
because its a bug in the firmware of these cameras and a restart always helps in such cases. Kai also suggested such a workaround as thats the only thing we can do with flaky hardware unfortunately: https://github.com/openhab/openhab-addons/pull/15613#pullrequestreview-1661192800
I have reverted my commit to the retry. Feel free to take the solution or close my MR. I will live with that solution for now.
Thank you for helping me understand. I would support either the auto retry as your suggesting, or a config added for only hikvision thing types as you did before, except making it on as default and you can disable it for just your camera. This way the default does not cause a breaking change and anyone with issues can disable it. Once again sorry if this is painful, but this is why we encourage people to discuss first. I am very busy so sorry if I did not re read all info and I will try to make the time ASAP to try and get this merged in a timely way. Thank you for your work doing this.
Thanks for your reply. No worries, I understand that we are all doing this in our spare times.
I cannot go back to the "hikvision flag" as this will effectively turn off the handling of external I/O events for cameras with that flag not set. As I mentioned, my camera DOES support external I/O alarms and if I am going to decide to use it at some point in the future, I would like to see that work.
The only "problem" I see with my change is that if you put wrong credentials into your thing, but even then you will see in the event logs that your thing goes ONLINE/OFFLINE/ONLINE etc. in a loop and you still get the message that your credentials might be wrong...
Long story short: Every few days/weeks the camera decides to respond ONCE with a 401, like a rare hicup and afterwards all is good again. Hence the retry for such cases for all cameras.
Thanks for the PR, have approved it to be merged. I do see a minor issue and that is that hikvision will lock the camera when the wrong password is tried multiple times, this is called the 'illegal login lockout'. So people that do use the wrong password will need to turn the camera off and back on again to reset this feature. From memory this may even get tripped by the binding before this change was made as the binding may be making multiple requests in parallel without waiting for the first reply to be made. Just making this note in case someone searches the issues as they are getting the wrong password error and find this info. I will be looking at making the change to remove checking the IO if the camera does not have the feature and have started working on this already whilst it is fresh in my head. Thanks once again for your work on this.
Somehow, after a while (sometimes around 24h or more) my ipcamera things go offline with:
I have NOT changed anything on my cameras (i.e. no firmware update or so). The issue started to appear with an openHAB snapshot from around may 2023. I tried to pinpoint it to a commit, but there are no real changes in the binding around that time.
It could be due to an update of the netty library and a different handling of requests in it.
I added some more logs where the things goes offline with that error and this is what response the binding got when it ran into the error:
It happens here in the code: https://github.com/openhab/openhab-addons/blob/main/bundles/org.openhab.binding.ipcamera/src/main/java/org/openhab/binding/ipcamera/internal/MyNettyAuthHandler.java#L163
So
authenticate
is empty and as i am printing out all headers, we can see that there is no header containingWWW-Authenticate
.The
processAuth
looks like it could handle an emptyauthenticate
parameter, but as the logic explicitly deals with that case and runs into the error instead, I wonder why that is ;)If I deactivate and reactivate my thing, it comes back online and works for hours/days, before it runs into the issue again. So I am wondering whether a retry should be implemented if a camera (falsely?) responds with 401 out of a sudden to some request...