openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.88k stars 3.59k forks source link

[ipcamera] CONFIGURATION_ERROR Camera gave no WWW-Authenticate: Your login details must be wrong. #15413

Closed t2000 closed 10 months ago

t2000 commented 1 year ago

Somehow, after a while (sometimes around 24h or more) my ipcamera things go offline with:

CONFIGURATION_ERROR
Camera gave no WWW-Authenticate: Your login details must be wrong.

I have NOT changed anything on my cameras (i.e. no firmware update or so). The issue started to appear with an openHAB snapshot from around may 2023. I tried to pinpoint it to a commit, but there are no real changes in the binding around that time.

It could be due to an update of the netty library and a different handling of requests in it.

I added some more logs where the things goes offline with that error and this is what response the binding got when it ran into the error:

2023-08-10 16:15:24.047 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Response from camera: DefaultHttpResponse(decodeResult: failure(java.lang.IllegalArgumentException: Validation failed for header 'WWW-Authenticate'), version: HTTP/1.1)
HTTP/1.1 401 Unauthorized
Date: Thu, 10 Aug 2023 16:15:23 GMT
Server: webserver
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Content-Length: 178
Content-Type: text/html
Connection: close
2023-08-10 16:15:24.047 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: do NOT processAuth() with '' on url='/ISAPI/Streaming/channels/101/picture' via method='GET'
2023-08-10 16:15:24.047 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'Date' with value 'Thu, 10 Aug 2023 16:15:23 GMT'
2023-08-10 16:15:24.047 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'Server' with value 'webserver'
2023-08-10 16:15:24.048 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'X-Content-Type-Options' with value 'nosniff'
2023-08-10 16:15:24.048 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'X-Frame-Options' with value 'SAMEORIGIN'
2023-08-10 16:15:24.048 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'X-XSS-Protection' with value '1; mode=block'
2023-08-10 16:15:24.048 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'Content-Length' with value '178'
2023-08-10 16:15:24.048 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'Content-Type' with value 'text/html'
2023-08-10 16:15:24.048 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: Got header: 'Connection' with value 'close'

It happens here in the code: https://github.com/openhab/openhab-addons/blob/main/bundles/org.openhab.binding.ipcamera/src/main/java/org/openhab/binding/ipcamera/internal/MyNettyAuthHandler.java#L163

So authenticate is empty and as i am printing out all headers, we can see that there is no header containing WWW-Authenticate.

The processAuth looks like it could handle an empty authenticate parameter, but as the logic explicitly deals with that case and runs into the error instead, I wonder why that is ;)

If I deactivate and reactivate my thing, it comes back online and works for hours/days, before it runs into the issue again. So I am wondering whether a retry should be implemented if a camera (falsely?) responds with 401 out of a sudden to some request...

Skinah commented 1 year ago

Can you provide some more details:

  1. What else is talking to the camera? Other software or perhaps a rule that talks directly to the camera?
  2. Do you have the latest firmware installed? This post has the download link for all firmware if you know the series that your camera comes from. https://community.openhab.org/t/hikvision-cameras-need-urgent-firmware-update/126771
  3. What rights does the user/pass that your using have? is it a full admin account or one with lower rights? My best guess is that you are not using admin account and the binding is accessing a resource of the camera that requires it.
  4. Does the camera get asked any other requests around that time?
  5. What is the CPU load of the camera, is it hitting 100% at all? You can see this using the onvif device manager program.

From memory the reply from any http device that is a 401 with the WWW-Authenticate missing, means that the login details is WRONG. Because the camera did not reply with a fresh WWW-Authenticate header, it is because the binding gave the login details and the device is saying they are wrong. If the digest NONCE is old and no longer valid then a header should be stale: true and this does not appear to be occurring, so I tend to think this is not caused by a race condition. Possible the cpu load of the camera is hitting 100% from AI or other features having been turned on if this did not occur in the past and has just begun to occur.

I will reply more in the PR itself as I have not looked at your suggested change yet to determine if there is a negative to putting in a work around as we may possibly lose the ability to know when the user/pass is wrong and help a user fix a wrong password quickly by not knowing.

t2000 commented 1 year ago

Thanks for your long reply. Let me answer your questions:

  1. What else is talking to the camera? Other software or perhaps a rule that talks directly to the camera?

Nothing else is talking to it, also not a rule.

  1. Do you have the latest firmware installed?

Probably not, as I haven't touched the cameras since ages. They always worked as they should work. Also with these chinese devices I am not really keen on installing just any update since you never know what features they turn off or change just because they feel like. Never touch a running system, especially since the cameras are not connected to the internet.

  1. What rights does the user/pass that your using have? is it a full admin account or one with lower rights? My best guess is that you are not using admin account and the binding is accessing a resource of the camera that requires it.

It is a dedicated user with "operator" rights, see screenshot. Unless the binding has been recently changed to query more resources in the cameras, I do not think this is an issue, as it was working fine (i.e. for months/years) until this started to occur around the end of july/beginning of august or so. (I am using the latest OH snapshot every once in a while, so roughly every 4-6 weeks or so).

  1. Does the camera get asked any other requests around that time?

It also sometimes happens in the middle of the night where I am not even requesting the stream of it and see 1.) I am not using any other program to talk to the camera automatically.

  1. What is the CPU load of the camera, is it hitting 100% at all? You can see this using the onvif device manager program.

Unfortunately I cannot check this, as the "onvif device manager" seems to be a windows program and I do not have any windows system that I could use. I am running Linux only... :(

From memory the reply from any http device that is a 401 with the WWW-Authenticate missing, means that the login details is WRONG

Well, in my case, the details are 100% correct. The username and password are stored in the *.things file, so they are also not changed at runtime by any configuration update to the thing or so. Also, if I deactivate and activate the thing, i.e. restart the ThingHandler the binding can connect to the camera with its credentials just fine.

Possible the cpu load of the camera is hitting 100% from AI or other features having been turned on if this did not occur in the past and has just begun to occur.

Since it "just started" to occur around end of july, we COULD blame the cpu usage. So are you saying that if the camera rans out of cpu resources, it just responds with 401 to say "leave me alone"? If so, why does it accept requests immediately afterwards, i.e. when I restart the `Thinghandler? just at that time when it happens?

However, I have NOT changed any AI features. Also my cameras are the "older" ones (DS-2CD2685FWD-IZS) which only support the basics, such as "motion detection", "some object in some zone detection" and "line cross detection", which I use. My config is as follows:

Thing ipcamera:hikvision:camFrontYard "Vorgartenkamera" [ ipAddress="IPv4", port=80, onvifPort=80, serverPort=-1, ipWhitelist="DISABLE", updateImageWhen="4", pollTime=3000, password="XXX", username="openhab", ptzContinuous=true]

Since I have updateImageWhen="4", i.e. "During Motion Alarm the Image channel will update every poll until Alarm stops.", the binding requests the picture every 3 seconds in case one of the camera features triggers. Do you think that is too much already? However, I have not touched these settings since ages either.

I have not looked into how netty hooks into the http requests, so far I only see that the hook where I did my change is called when a reply from the camera comes in and the url the binding tried to access at that time is the picture:

2023-08-10 16:15:24.047 [WARN ] [ipcamera.internal.MyNettyAuthHandler] - Stefan: do NOT processAuth() with '' on url='/ISAPI/Streaming/channels/101/picture' via method='GET'

determine if there is a negative to putting in a work around as we may possibly lose the ability to know when the user/pass is wrong and help a user fix a wrong password quickly by not knowing.

You are right, I also thought about this potential issue with my change, I even thought of putting a log entry on WARN level when it occurs, before doing the restart as I do it now. If a user sees one warning after the other in their logs, they would know its a username/password issue and if it occurs only everyone once in a while (for me, every couple of days), they could ignore the warning as it is "only a warning" and no error, i.e. it heals by itself.

Let me know if such a warning log would help in these cases or you have a better "workaround".

user_rights.zip

Skinah commented 1 year ago

Some things to note:

You can't seem to pin point it to any PR as nothing changed around that time. I doubt it Is a binding change, to confirm you could put an old Jar into the addons folder and see if it still occurs.

You said the firmware has not changed so I suspect it is due to changing a setting in the camera that has changed its cpu load. The cpu load is reported as an onvif event and it's possible the logs in trace will show the values. I did not want to add a channel for this event but other automation platforms do and report that cameras play up when they are loaded up. Sadly they make the firmware capable of doing things that the cpu is not capable of.

The preferred way to log, is to not log if the online/thing status changes as this change makes it into the event. Log. I'll find some time soon to review the PR and best to put the comments on that there.

My cameras are not doing this and I am not aware of anyone's else's doing this. I think it is worth looking at this some more to find the root cause which is why I am posting here with my thoughts it may be cpu related on the camera. MY two hik cameras are running fine for over 5 weeks at a time before I do a milestone update and reboot the system.

It would be interesting to add a warn log when the onvif events state the cpu load is 99% or higher. Run that for a while on your system as it may be a feature worth adding to the binding.

I doubt this is netty, you can change the netty version with dropping jar files into the addons folder or using commands in karaf console. I posted on forum the commands to do it once.

If the root cause is not found then it's possible this will occur right when you need a snapshot or something from the camera when an event is occurring, so fixing the root cause would be preferred to just reconnecting.

t2000 commented 1 year ago

Thanks to the rain outside I just found some time to enable some logging. I activated TRACE logs for the OnvifConnection class.

Just after I changed it in the log4j.xml file, I got a 401 with not WWW-Authenticate again. However, there is unfortunately no call logged just before it happend. I only see the onvif calls that happen during the reconnect which I trigger in my version of the binding.

In these reconnect logs I do not see any CPU usage unfortunately :( I have attached the logs if you would like to have a look.

openhab-camera-trace.log

Skinah commented 1 year ago

You are getting a WARN on this URL /ISAPI/System/IO/inputs/1/status

See this thread here: https://community.openhab.org/t/ipcamera-binding-doing-erroneous-isapi-request-to-hikvision-camera/147345/3

t2000 commented 1 year ago

This thread is about wrong I/Os reported by the camera.

So what would be the correct solution to handle such a faulty report of capabilities?

Should we introduce an advanced thing configuration that can be manually set, indicating how many I/Os the camera has? So one can decide to override whatever is reported by the camera with the configuration. Would that make sense and be the "correct" solution, i.e. instead of restarting all the time?

t2000 commented 11 months ago

@Skinah Do you have any opinion on this?

Skinah commented 11 months ago

Since you can compile the binding with changes in it, try stopping the binding from sending that API call /ISAPI/System/IO/inputs/1/status

If your 401 errors go away and the problem is solved then we should leave the code alone and fix the issue of the binding asking for IO status when the camera has no IO. That thread has the API call to ask the camera if it has IO and examples of what a few cameras will give in response.

This is causing issues for other people, so I suspect this may be the root cause and explains why I do not see the issue with my cameras that do have IO.

t2000 commented 11 months ago

Thanks for your feedback, I just did:

Since you can compile the binding with changes in it, try stopping the binding from sending that API call /ISAPI/System/IO/inputs/1/status

And now I will watch my logs and report if the problem goes away. If it does, I can also provide a draft for an PR with the described solution you mentioned.

t2000 commented 11 months ago

@Skinah After commenting out the request, the camera did not go offline with the 401 again.

I have adjusted https://github.com/openhab/openhab-addons/pull/15613

in a way that it now contains an advanced flag for the hikvision thing named hasExternalAlarmInputs which defaults to false since I think most people won't use them I guess. And if they do, they can simply enable them with that setting. Please have a look as it solves the problem also for others.

t2000 commented 10 months ago

back to the beginning...

My camera supports I/O alarms and the binding can also request the status:

16:24:00.899 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:00.922 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:08.901 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:08.919 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:16.901 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:16.916 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:24.903 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:24.920 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:32.905 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:32.919 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200
16:24:40.903 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=401
16:24:40.914 [INFO ] [ipcamera.internal.MyNettyAuthHandler] - Response to url=/ISAPI/System/IO/inputs/1/status with type=GET is=200

The first 401 is handled by your processAuth here: https://github.com/openhab/openhab-addons/blob/main/bundles/org.openhab.binding.ipcamera/src/main/java/org/openhab/binding/ipcamera/internal/MyNettyAuthHandler.java#L161

However, sometimes the camera does not send the WWW-Authenticate as I mentioned when I opened the ticket.

So requesting whether or not the camera supports I/Os doesn't help with the issue.

IMHO the best solution is to change the configerror to cameraCommunicationError here https://github.com/openhab/openhab-addons/blob/main/bundles/org.openhab.binding.ipcamera/src/main/java/org/openhab/binding/ipcamera/internal/MyNettyAuthHandler.java#L163

because its a bug in the firmware of these cameras and a restart always helps in such cases. Kai also suggested such a workaround as thats the only thing we can do with flaky hardware unfortunately: https://github.com/openhab/openhab-addons/pull/15613#pullrequestreview-1661192800

I have reverted my commit to the retry. Feel free to take the solution or close my MR. I will live with that solution for now.

Skinah commented 10 months ago

Thank you for helping me understand. I would support either the auto retry as your suggesting, or a config added for only hikvision thing types as you did before, except making it on as default and you can disable it for just your camera. This way the default does not cause a breaking change and anyone with issues can disable it. Once again sorry if this is painful, but this is why we encourage people to discuss first. I am very busy so sorry if I did not re read all info and I will try to make the time ASAP to try and get this merged in a timely way. Thank you for your work doing this.

t2000 commented 10 months ago

Thanks for your reply. No worries, I understand that we are all doing this in our spare times.

I cannot go back to the "hikvision flag" as this will effectively turn off the handling of external I/O events for cameras with that flag not set. As I mentioned, my camera DOES support external I/O alarms and if I am going to decide to use it at some point in the future, I would like to see that work.

The only "problem" I see with my change is that if you put wrong credentials into your thing, but even then you will see in the event logs that your thing goes ONLINE/OFFLINE/ONLINE etc. in a loop and you still get the message that your credentials might be wrong...

Long story short: Every few days/weeks the camera decides to respond ONCE with a 401, like a rare hicup and afterwards all is good again. Hence the retry for such cases for all cameras.

Skinah commented 10 months ago

Thanks for the PR, have approved it to be merged. I do see a minor issue and that is that hikvision will lock the camera when the wrong password is tried multiple times, this is called the 'illegal login lockout'. So people that do use the wrong password will need to turn the camera off and back on again to reset this feature. From memory this may even get tripped by the binding before this change was made as the binding may be making multiple requests in parallel without waiting for the first reply to be made. Just making this note in case someone searches the issues as they are getting the wrong password error and find this info. I will be looking at making the change to remove checking the IO if the camera does not have the feature and have started working on this already whilst it is fresh in my head. Thanks once again for your work on this.