sr99622 / libonvif

Onvif library with GUI implementation and built in YOLOX
GNU Lesser General Public License v2.1

Discovery not always working #66

Open leechpool-github opened 1 year ago

leechpool-github commented 1 year ago

Hi, I am using this app on Ubuntu 22.04. Since the Wayland issue was fixed, I've been using this app daily to view / PTZ my cameras. I've found that discovery mostly just works and finds all my cameras, but occasionally it will find none, or only a subset of the nine I currently have. Even cameras on my wired network don't get discovered 100% of the time. I've always immediately checked that I can ping them and access them via the web interface GUI, so I know the cameras are there and reachable (I use fixed IPs for all cameras). I've noticed that when a number of cameras are missing, pressing "discover" again will often find them, but often enough to be an issue, repeated attempts to "discover" fail. I've tried leaving the app open and just waiting... then after a random period of time the cameras get found again.

Any ideas? Is this a characteristic of the ONVIF discovery process, or perhaps something wrong with my network?

I've seen requests from others to be able to manually enter camera details, and responses indicating that this is not the intended use case, etc. Accepting this, I wondered if the option to save and import the discovered cameras might be considered as a way around the discovery issues I am getting. I don't necessarily expect the app to be modified to help with my particular situation and would be prepared to attempt to modify the code of my install to "help myself", but I am new to Python, so any pointers would be appreciated.

There don't seem to be many Linux apps capable of viewing and moving onvif cameras, which a relatively unskilled Linux enthusiast like me can just pick up and use. I really appreciate the effort that went into creating and sharing this app!

:)

sr99622 commented 1 year ago

Hi,

Thank you for your message, the feedback is greatly appreciated.

There are a number of factors that go into discovery that can affect the reliability of the process. In general terms, discovery works by the host computer transmitting a UDP packet onto the subnet, addressed to a reserved multicast address and port (239.255.255.250:3702 under the WS-Discovery standard) along with a response destination. All devices on the subnet are able to receive the packet, and those that are so programmed may respond back to the host. UDP packets are not guaranteed to be delivered over the network, in contrast to TCP packets, which do have guaranteed delivery.
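
For reference, the exchange described above can be reproduced with a few lines of Python. This is a minimal sketch of a standard WS-Discovery probe, not libonvif's actual code, shown only to illustrate the multicast mechanism and why dropped UDP replies mean missed cameras:

import socket
import uuid

# Standard WS-Discovery probe body asking for ONVIF NetworkVideoTransmitters
PROBE = f"""<?xml version="1.0" encoding="UTF-8"?>
<e:Envelope xmlns:e="http://www.w3.org/2003/05/soap-envelope"
            xmlns:w="http://schemas.xmlsoap.org/ws/2004/08/addressing"
            xmlns:d="http://schemas.xmlsoap.org/ws/2005/04/discovery"
            xmlns:dn="http://www.onvif.org/ver10/network/wsdl">
  <e:Header>
    <w:MessageID>uuid:{uuid.uuid4()}</w:MessageID>
    <w:To e:mustUnderstand="true">urn:schemas-xmlsoap-org:ws:2005:04:discovery</w:To>
    <w:Action e:mustUnderstand="true">http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</w:Action>
  </e:Header>
  <e:Body><d:Probe><d:Types>dn:NetworkVideoTransmitter</d:Types></d:Probe></e:Body>
</e:Envelope>"""

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(3)
# 239.255.255.250:3702 is the reserved WS-Discovery multicast address/port
sock.sendto(PROBE.encode(), ("239.255.255.250", 3702))
while True:
    try:
        data, addr = sock.recvfrom(65535)  # each camera replies by unicast UDP
        print(addr[0])                     # a dropped reply means a missed camera
    except socket.timeout:
        break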

A network that is very busy will lose a larger percentage of UDP packets than a network with little traffic. Additionally, individual devices may not be well equipped to handle UDP packets, as is often the case with IP cameras. Many cameras do not implement the ONVIF standard properly, lack a solid network implementation, or are underpowered in terms of processing ability. Many times, the web interface GUI uses a proprietary communication protocol from the camera manufacturer that performs better than the camera's ONVIF implementation.

Some ONVIF applications attempt to overcome this issue by continuously transmitting broadcast packets during the entire time the application is open. My own opinion is that this is not a good strategy, mostly because it sends spurious broadcasts that might further pollute an already saturated network, but I'm sure you could find counter arguments to that position.

Another strategy would be to isolate the cameras on a separate subnet. This would require the host to be equipped with multiple ethernet interfaces. If the host has a wired as well as a wireless interface, the cameras could be attached to a router connected to the host computer's wired interface, and the host could communicate with the main router using the wireless interface. The onvif-gui program is able to accommodate this configuration.

It may also be possible to cache the camera configuration values so that the discovery process is not needed to find them. There are some drawbacks to this approach, the biggest being that cameras configured with DHCP will change addresses and not be found. Some cameras depend on clock synchronization for authentication, and these will eventually drop out from clock drift or daylight savings time changes. One hack could be to just leave the program open when not using it; the settings are kept for as long as the program stays open, and the camera streaming can be turned off in the meantime.

Caching the camera configurations would require quite a bit of work, as the data would have to be serialized to disk and then resurrected, with some sort of option to compare it against data that might be found on discovery. Additionally, the GUI would have to be synchronized between the possibly outdated data saved on disk and the data found at discovery, so there would be a significant amount of development and testing to properly implement such a feature. If there had been a significant amount of interest in the program, I might consider doing something like that, but the interest is just not there.
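
The serialization side of such a cache could be quite small; the reconciliation with live discovery is the hard part. A rough sketch under that assumption (the file path and field names are hypothetical, not onvif-gui's actual data model):

import json
import os

CACHE = os.path.expanduser("~/.onvif-gui-cameras.json")

def save_cameras(cameras):
    # cameras: a list of dicts such as {"name": ..., "ip": ..., "onvif_port": ...}
    with open(CACHE, "w") as f:
        json.dump(cameras, f, indent=2)

def load_cameras():
    # Cached entries may be stale (DHCP renumbering, clock drift), so a caller
    # would still need to reconcile them against a fresh discovery pass.
    if not os.path.exists(CACHE):
        return []
    with open(CACHE) as f:
        return json.load(f)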

You could try an open source program, https://www.ispyconnect.com/; maybe you'll have better luck with that.

Best Regards,

Stephen

leechpool-github commented 1 year ago

Hi Stephen, thanks for taking the time to reply and explain the issues. I had been looking at the code for the GUI and library, trying to figure out from my little Python knowledge how it worked and what might be involved in attempting some of the things I describe above. I'd come to the conclusion that it was beyond me, hence my opening this question, and I think I suspected the answer would be that it would be such a change in the program's intended function that it was unlikely to happen.

Thanks for suggesting ispyconnect, or AgentDVR as they seem to have renamed it. I do use it and it works well, but by default it decodes all cameras, because I guess its intended use is recording / movement monitoring of all cameras. Owing to this, it is quite resource hungry, and switching camera decoding on and off as you want to look at them is just clunky; it's not how the program was designed to be used. My main PC can certainly handle it, but I really just wanted something to quickly view a camera and move/zoom it easily from time to time. Your program is perfect for this. It runs very well on my wife's low powered laptop etc., and the presentation is just clean and simple.

My wife and I are using your program and will continue to do so. I just thought I'd ask about the discovery thing in case there was an easy way to avoid the infrequent times when cameras are missed. Thanks again for the brilliant app. Regards, Roger

StuartIanNaylor commented 11 months ago

I am the same, but also an ONVIF noob :) I have an el cheapo Ranger2 IMOU cam just for testing, and the UDP browse doesn't seem to work. I am wondering if a secondary scan for open ports on 80/554 could indicate a possible camera? I.e., just steal some port scanning code for that. I am guessing some of the NVRs do it that way, as they always discover?
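
The fallback Stuart suggests could look roughly like this; a minimal sketch of a TCP connect probe on the HTTP and RTSP ports, purely a heuristic and not part of libonvif:

import socket

def probe(ip, ports=(80, 554), timeout=0.5):
    # Report which of the given ports accept a TCP connection; an open 80/554
    # pair is a reasonable "possible camera" hint, not proof of ONVIF support.
    hits = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((ip, port)) == 0:  # 0 means the connect succeeded
                hits.append(port)
    return hits

# e.g. sweep a /24 subnet:
# candidates = [ip for ip in (f"192.168.1.{n}" for n in range(1, 255)) if probe(ip)]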

sr99622 commented 11 months ago

Hi Stuart,

Thank you for reaching out. @leechpool-github has a good point that the program should be able to find the cameras without the full discovery process. I have recently become aware that this is indeed the case, thanks to help from this issue thread.

I will be adding some code to find the cameras based on previous connections if they don't pop up every time. There will also be an input for a user supplied IP address and ONVIF port. There is a new release coming soon; these will be included along with several other new features, including multi-stream.

StuartIanNaylor commented 11 months ago

I have been using SmartPSS as I noob my way through ONVIF, which is really good but closed source. The idea is to use el cheapo China cams and bypass the inbuilt detection with more sophisticated models, so when I saw libonvif it looked perfect. I am still getting to grips with ONVIF and what is available, but is there any way to stream audio only, or do you just grab the full stream and demux the audio if that is all you are interested in? Also, I haven't figured out the speaker many cams have that you can send audio to. And I'm wondering if you might enable discussions for noobs like me to ask stupid questions :)

When the new release is out, please post here so I will get a notification. Many thanks.

sr99622 commented 11 months ago

Thank you for the inquiry. The video and audio streams are already demuxed for you and can be accessed through Python modules. There is a file, sample.py, that shows how to access the audio data for processing. Each sample of the audio data is passed into the Python module as a numpy array. The sample program is very simple; it just gives the option to mute either channel of a two channel stream. I haven't done much with the audio. I'm thinking just now that most cameras are single channel, so you would have to factor that in.

Looking at the code, F is the frame and is equivalent to an FFmpeg AVFrame. You slice the numpy array to get the channels:

import numpy as np

# F is the frame object handed to the callback; it wraps an FFmpeg AVFrame.
if F.isValid():
    # Zero-copy view of the interleaved audio samples
    sample = np.array(F, copy=False)

    # Other frame accessors available for inspection:
    # F.m_rts, F.nb_samples(), F.sample_rate(), F.channels()

    if F.channels() == 2:
        self.mw.audioConfigure.lblStatus.setText("Processing 2 channel audio")

        # Interleaved stereo: even indices are the left channel
        left = sample[::2]
        if self.mw.audioConfigure.chkMuteLeft.isChecked():
            left *= 0  # zeroing the view mutes the underlying frame data

        # Odd indices are the right channel
        right = sample[1::2]
        if self.mw.audioConfigure.chkMuteRight.isChecked():
            right *= 0

    else:
        self.mw.audioConfigure.lblStatus.setText("Error: only 2 channel streams are supported")

If you are looking for a good cheap camera, I recommend the Amcrest brand. I have also had good results with TRENDnet. Most Dahua cameras are pretty good, as are Hikvision. Axis can be a little quirky and expensive, but the quality is very good.

Please let me know if you are able to get the YOLO detection working on the video; I have had zero feedback on that feature.

StuartIanNaylor commented 11 months ago

I will have a look at YOLO in the next couple of days and report back.

leechpool-github commented 11 months ago

I will be adding some code to find the cameras based on previous connections if they don't pop up every time. There will also be an input for a user supplied IP address and ONVIF port.

That's fantastic news, :)

StuartIanNaylor commented 11 months ago

@sr99622 Hi Stephen, I haven't looked at your version of YOLOv8, but I have been playing with https://github.com/ultralytics/ultralytics on a $66 Opi5 (https://www.aliexpress.com/item/1005004941808071.html). If I export to TFLite int8, I can get approx 33 FPS per large core at imgsz=320, which, using all 4 large cores, should be capable of 133 FPS (it is, as I have tested). I have to say, IMO embedding models is a bad idea, as optimised and custom models are already available on other platforms, and embedding creates a very narrow usage.
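
For reference, the export described above could look something like this with the ultralytics package (a sketch; the exact flags Stuart used weren't stated):

from ultralytics import YOLO

# Export YOLOv8n to an int8-quantized TFLite model with a 320x320 input
model = YOLO("yolov8n.pt")
model.export(format="tflite", int8=True, imgsz=320)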

Also, as Frigate does, running YOLO or any detection model on each frame likely isn't needed when low load motion detection can trigger single frame object detection. Really, in any continuous object detection, the initial bounding box clip area and the max bounding box size clip are likely the only frames of interest.

I have been looking around at various NVRs, and they always seem to have a viewer, as if someone will actually sit 24/7 watching AI detect and tell them what they are viewing. A low computational motion detector also allows regions of interest to be forwarded to the object detector, rather than just resizing a whole image down to the object detector's input size, increasing accuracy (see the sketch below). Usually the cam's second 'detect' stream is used, and if the historical bounding box region of interest is larger than the input, the model's input size needs to be increased; similarly, as a matter of tuning, if it is often smaller, the model input size can likely be reduced. When you have tools like https://github.com/ultralytics/ultralytics, you can automate this and tune to the stream you are receiving.
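
A minimal sketch of that motion gating idea using OpenCV frame differencing; the capture URL and the detect callback are placeholders, and the thresholds would need tuning per stream:

import cv2

def run(capture_url, detect, min_area=500):
    # Run the (expensive) detector only when cheap frame differencing
    # on the substream reports a large enough moving region.
    cap = cv2.VideoCapture(capture_url)
    prev = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)
        if prev is None:
            prev = gray
            continue
        # Threshold the per-pixel difference and look for large moving regions
        diff = cv2.threshold(cv2.absdiff(prev, gray), 25, 255, cv2.THRESH_BINARY)[1]
        contours, _ = cv2.findContours(diff, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if any(cv2.contourArea(c) > min_area for c in contours):
            detect(frame)  # only now pay for the object detector
        prev = gray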

This is one thing that has left me scratching my head, as the target stream and objects have massive effects on what model should be used on given hardware. If you embed a model, it is likely quite optimal for the video stream and hardware in use by the developer, but for users with different hardware and streams it is very likely far from optimised, and for many, extremely poor.

I haven't found a single NVR that tunes its AI to the received streams, which really puzzles me, as anyone with even basic experience of running these models surely knows how important model selection is to a stream's parameters if you want accuracy?!

The model and the input params can have a massive effect on accuracy, and classify, detect, pose, and track have very distinct FPS requirements, so embedding a single model is extremely restrictive to functional use.

Yolov8n, 320 image, on a rk3588(s) @ 133 FPS, 37.3 mAP (at 640, not 320)
Yolov8s, 320 image, on a rk3588(s) @ 53.33 FPS, 44.9 mAP (at 640, not 320)
Yolov8m, 320 image, on a rk3588(s) @ 22.34 FPS, 50.2 mAP (at 640, not 320)

Yolov8n, 640 image, on a rk3588(s) @ 33.61 FPS, 37.3 mAP (at 640)

(mAP = mean Average Precision)

sr99622 commented 11 months ago

Hi Stuart, thank you for the feedback, I had not heard of Frigate before so that was good information. I like what they are doing with the lower powered devices. I'm thinking I should add access to the camera substreams like they are doing for detection. I have been mostly focused on higher powered applications with GPU so it is interesting to see things from a different perspective.

I agree with you about tuning detections and model sizes. That's why onvif-gui has a selection of models that allow the user to adjust their input sizes. The yolov8 model in onvif-gui is the ultralytics version.

StuartIanNaylor commented 11 months ago

Yeah, I guess I am confused by the repo name libonvif, as much seems outside that scope. I never looked at the model section, as I am still focusing on finding an ONVIF lib that seems to work with all the variation that 'onvif' seems to produce.

The YOLO COCO metrics don't give an awful lot of info, apart from that a large object is >96² pixels, so an object detector running on the substream at the common 640 dimension, if pixel based, can provide info up to 96x bigger in the main stream? Not much point trying to detect objects smaller than 96², as the mAP trails off pretty soon.

I am not that keen on Frigate, as it sent me on a quest: dependent on subject and stream parameters, a pluggable network is needed rather than a single application. I think the motion detector in Frigate is relatively poor, and an object tracker is likely a better option.

If you look at https://docs.ultralytics.com/models/yolov8/#performance-metrics, the mAP likely means you're going to run with YOLOv8s to YOLOv8l, and with the thought path of not needing realtime frame based detect and classification, I don't really see the need to run on big and expensive GPUs. I only need detection once, or a mean of object probability over x frames. Also, unless you're going to train at a certain input size, exporting the model to alternative sizes has a quantisation effect in terms of accuracy:

           mAP50   mAP50-95
imgsz=640  0.505   0.357
imgsz=320  0.405   0.277

That is YOLOv8s at 640 vs 320, and the effect on mAP is quite severe; I presume upscaling the model has an effect also.

I am thinking Centroid or MOSSE type trackers, due to their speed on the substream; they not only detect movement but keep track of individual objects and also provide a historical track from which future position can be estimated (a sketch follows below). It's likely the tracker would decode both streams, detect on the substream, then grab frames from the main stream and forward them to an object detection model. For people, you could likely use the trace and current size to capture frames, spaced on speed of movement, covering just the object's region of interest with the rest masked, and send those to YOLO to build your object detect dataset. For modern person detection, the tracking might even be used in conjunction with a Blazeface style model to try to capture the best facial aspect and then forward it to ID detection. None of this actually needs a GUI; otherwise, why are you using AI for detection, when surely the rationale is that we use it so we don't need to be watching. Likely an SRT event timeline needs to be stored with the video, where the AI is there to provide a review.
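
A hedged sketch of the MOSSE tracker mentioned above, via opencv-contrib-python; the RTSP URL and the initial bounding box are placeholders, not values from this thread:

import cv2  # requires opencv-contrib-python for cv2.legacy

cap = cv2.VideoCapture("rtsp://camera/substream")
ok, frame = cap.read()

tracker = cv2.legacy.TrackerMOSSE_create()
tracker.init(frame, (100, 100, 80, 80))  # (x, y, w, h) seeded by a first detection

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)  # very cheap per-frame update
    if found:
        x, y, w, h = (int(v) for v in box)
        roi = frame[y:y + h, x:x + w]  # region that could be cropped from the
                                       # main stream and sent to the detector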

I fired from the hip without even looking at the models, but I still think YOLOv8 needs the bigger models if used, though it might only have a few frames to deal with as an object is tracked across the screen. You don't need a high end graphics card for that, as a Pi5/RK3588(s) might only be able to process YOLOv8m @ 4 FPS, which could be more than enough to classify an object from predicted 'best takes'; an object might be in the region for several seconds, but once classified is classified. The infrastructure to me seems a very Kubernetes or Docker swarm style app, as the RK3588, in GFLOPS/watt, is currently outdoing Apple silicon, which is the current industry best. It could be a Pi5, but that is nearly 2x as energy greedy and doesn't have the little architecture, so it is slower and can freeze at 100% big core load. I don't think the perspective is all that different, apart from scalability and diversification of use through multiple streams. I still think that if you embed a model, or any model type like that, it acts as a technology demonstrator, but I still question if it has any use.

That is not me disagreeing, but purely brainstorming about what I have been head scratching to achieve, which is very different in each application of incoming stream and objects of interest. Also, sharing something better than Frigate (I snatched the info from Frigate): https://github.com/AlexxIT/go2rtc, which acts as a proxy/loadbalancer to any source device and can stream to multiple clients. I don't think it decodes but merely passes on the stream, but I have to look further into that.

I wish I could program in C/C++. Occasionally I have hacked a few libs and examples together, and Arm v8.2+ has mat/mul vector instructions, so if you know how to optimise vectors, or maybe just use the OpenCV libs, it's likely the trackers could be extremely fast, maybe even run on the A55 RK3566, though I have not got that far; they certainly could run on a RK3588, or at least I think they can, given some of the example FPS figures :) Also, the Mali G610 GPU is extremely efficient and could likely add a few more frames. The NPU is something I am not sure about; it's 3 core, but I have a feeling it's less than its published ratings.

I don't even know if it will be YOLOv8, but it's a great introduction, and with such fast advances I am wondering what models will turn up with BNNs and frameworks such as https://github.com/larq/compute-engine (check the benches in https://docs.larq.dev/zoo/), which Nvidia also supports, for I guess blistering speed: https://arxiv.org/pdf/2011.09398.pdf

sr99622 commented 11 months ago

More good links. I like what AlexxIT is doing; that is something that could be useful down the road to get camera output to external viewers. I was not aware of larq; it looks like something new. It's hard to keep up with developments as they come so quickly. I tend to stick with established technology, as integrating these things can be a challenge and each of them has its own quirks and gotchas.

YoloV8 has some nice properties and is well supported and reasonably easy to implement. This makes it a good candidate for integration.

Regarding trackers, I spent a lot of time working with different ones, and I'm coming to the opinion that tracking as a whole is not ready for prime time. The best tracker IMO is ByteTrack and that really only works well with YOLOX. Even ByteTrack has some big issues and is not well supported. Most of the others don't work well at all, including DeepSORT.

I still like a good GUI, it helps to visualize what is going on, and can help simplify the setup even if the ultimate goal is complete automation. When working with a large number of parameters, the GUI can help visualize the effects of changes to the parameters on the quality of the detection. I find that the mAP quotes don't really reflect what happens in the real world, and I have to run the models on different scenarios to get a feel for how well they operate.

ahmogit commented 8 months ago

Recently acquired a Vikylin VK383-LIUF (see attached spec sheet) which is evidently a re-branded Hikvision device that Vikylin OEMs. I expected to be able to inspect/configure it via onvif-util, based on the following statement from onvif-util.1 (V1.4.6):

"Cameras made by Hikvision will have the greatest level of compatibility with onvif-util"

but alas, the camera is not even discovered via the -a option. (Various other ONVIF-supporting cameras on the network are discovered via -a without issue.)

The discovery failure is not due to network traffic load, nor is it a basic IP connectivity issue: the camera's HTTP server (i.e. configuration web page) is accessible, and I've probed the camera using the -a option on a completely quiescent network on which that camera was the only active device. onvif-util -a just doesn't seem to be able to discover it at all.

Any ideas on why this is and/or how it might be addressed? Happy to provide whatever add'l info might be of interest.

Also attached is the spec sheet for the device. On p. 3 ("API") it lists three ONVIF profiles (G, S, T) to which it is presumably conformant. However, it's worth noting that Vikylin is not an ONVIF member, and hence technically not 'permitted' to claim ONVIF conformance per se. (I verified the above by contacting onvif.org directly and querying them as to whether re-branders of conforming OEM products must themselves be ONVIF members in order to claim conformance for the re-branded device. The answer was an unambiguous "yes".) Not that this has anything to do with the technical issue at hand; just mentioning it for completeness.

LATER NOTE: Problem solved, my error: Multicast discovery was disabled. Sorry for the noise.

vk383-LIUF_spec.pdf

sr99622 commented 8 months ago

Hello,

Thank you so much for reaching out, I can appreciate the confusion around ONVIF compliance. OEMs that rebrand Hikvision or Dahua cameras most often install their own software on the device. They will usually pretty much follow the general concepts of the manufacturer's software with some tweaks, which may or may not improve the device performance. Discovery used to be a big sticking point with a lot of cameras using non-standard discovery packets, although most nowadays stick to the standard. The problem with that was that there was a gray area where discovery packets would work on some cameras and not others, which was very confusing.

At one time, I made an effort to collect different types of discovery messages, and if you go to line 3018 in onvif.c, you can see where there is an option to select an alternate discovery message, either 1 or 2. The code was fixed on 1 since most cameras started complying with the standard, but you could try re-compiling with option 2 to see if that works any better. The downside is that a lot of cameras will only respond to message type 1.

It looks like from your LATER NOTE that things have started working, I hope this is the case.

Best Regards,

Stephen