mario-tux opened this issue 4 years ago (Open)
First of all, don't apologize! It's people like you showing interest in Viseron that keep me motivated :)
- is object detection applied to a frame only if motion has been detected? Is it applied to the whole frame or just to the area where motion was found?
The trigger config option for motion detection controls this. If set to true, object detection will only run while there is motion in the picture. If set to false, object detection will run at the interval (in seconds) specified in the object detection config. I can try to clarify this in the README, maybe even rename trigger to something like trigger_object_detection. Object detection is applied to the whole frame.
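As a rough illustration of the two modes, a config sketch using the option names as discussed at this point in the thread (the exact schema and option placement may differ between Viseron versions):

```yaml
motion_detection:
  trigger: true   # true: object detection runs only while there is motion
object_detection:
  interval: 1     # with trigger: false, object detection scans once per second
```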
- looking at the interval parameter of the motion detection, I understand that it is not applied to all the frames coming from the camera; some of them are skipped, right?
interval is how many seconds pass between each scan. You can set this to 0.5 so it runs twice per second, 0.25 for four times per second, etc.
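The relationship between interval and scan rate is just a reciprocal; a trivial sketch:

```python
# interval is the number of seconds between scans, so the scan rate
# is simply its reciprocal.
def scans_per_second(interval):
    return 1.0 / interval

print(scans_per_second(0.5))   # 2.0 scans per second
print(scans_per_second(0.25))  # 4.0 scans per second
```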
- if the first guess is right (point 1), why is there an interval parameter in object detection? Is it applied periodically, or to every frame where motion is detected?
Both intervals work the same way as described above.
- still related to point 1: if motion is detected, is the whole frame resized to model_width x model_height and passed, for example, to the EdgeTPU? The Coral, I remember, is limited to 300x300. Does that mean my 2592x1944 frame is resized down to a handful of pixels in the hope of detecting something? I hope not... :-)
That is exactly how it works, I'm afraid... But as you describe, the models are trained on smaller images (300x300), so as far as I understand the accuracy would not be affected. If, for example, I were to take your frame and split it into 4 pieces, I believe the model would find it difficult to detect properly because objects could be cut off. I do think it could be a good idea to do what you suggested and only scan the area where there is motion; I have seen some methods that can be used to find potential areas of interest that can then be fed to an object detector.
You can also use model_width and model_height to control this yourself, but the larger the image, the more processing power it will take.
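To make the size trade-off concrete, a back-of-the-envelope sketch (the pixel figures are illustrative, not measurements from Viseron):

```python
# How tall an object ends up after naively resizing the full frame
# down to the model's input height.
def scaled_height(object_px, frame_height, model_height):
    return object_px * model_height / frame_height

# A roughly 260 px tall person in a 2592x1944 frame shrinks to about
# 40 px when the whole frame is squeezed into a 300x300 model input.
print(round(scaled_height(260, 1944, 300), 1))  # 40.1
```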
- I'm not sure I understand when the recording stops when timeout is false; how is it related to timeout in the recording section?
timeout under motion_detection means that recordings won't stop until no motion is detected. timeout under recorder means how many seconds the recorder will continue to record after no object is detected and no motion is detected (if timeout under motion_detection is set to true).
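A sketch of how the two options might sit in a config, using the names as they exist in this thread (which, as noted, may be renamed later):

```yaml
motion_detection:
  timeout: true   # keep the recording going until motion stops
recorder:
  timeout: 10     # then keep recording this many seconds after the
                  # last detected object/motion
```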
Now that I try to explain this, it doesn't sound particularly logical. I think better naming of these config options would make it clearer.
I hope I could clear some things up!
First of all, don't apologize! It's people like you showing interest in Viseron that keep me motivated :)
Kind words.
still related to point 1: if motion is detected, is the whole frame resized to model_width x model_height and passed, for example, to the EdgeTPU? The Coral, I remember, is limited to 300x300. Does that mean my 2592x1944 frame is resized down to a handful of pixels in the hope of detecting something? I hope not... :-)
That is exactly how it works, I'm afraid... But as you describe, the models are trained on smaller images (300x300), so as far as I understand the accuracy would not be affected.
Oh... I didn't think it worked this way. I'm not sure about the accuracy. If I have a 2592x1944 frame with a small person in the background, after resizing the whole frame that person could become just 30-40 pixels high. I think the Coral model is trained on 300x300 images with a full-size subject. I suspect the model applied to the whole resized frame could miss many small subjects. Did you test it? What are the dimensions of the model used by Darknet? Maybe this is not a problem for Darknet or Deepstack, but it could be a limit for the Coral EdgeTPU: just a guess...
My understanding is that Frigate uses motion detection to find subareas with something moving in each new frame; such areas are cut out (maybe with some padding) and the resulting mini-images are resized to 300x300 and submitted to the EdgeTPU. Indeed, Frigate can query the Coral EdgeTPU several times on the same frame. Frigate was designed to be used mainly with the Coral stick, but this technique should save computational power (read: speed things up) on any other detection system.
interesting; does this mean my

motion_detection:
  width: 640
  height: 360

(resizing my 1920x1080 stream to one third) is possibly causing the object detection to become less accurate, as it is trained on a 300x300 data set?
also something I just discovered: interval under motion_detection wants an integer, but the README.md says it's a float. I tried to specify 0.2, thinking my motion detection had been running once a second rather than once every frame, but when I set it to 0.2 I couldn't start Viseron:
$ docker run --rm -v /home/user/visdata/recordings:/recordings -v /home/user/visdata/config:/config -v /etc/localtime:/etc/localtime:ro --name viseron --device /dev/dri roflcoopter/viseron-vaapi:latest
Traceback (most recent call last):
  File "viseron.py", line 7, in <module>
    from lib.config import ViseronConfig
  File "/src/viseron/lib/config/__init__.py", line 118, in <module>
    VALIDATED_CONFIG = VISERON_CONFIG_SCHEMA(raw_config)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 594, in validate_dict
    return base_validate(path, iteritems(data), out)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 432, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: expected int for dictionary value @ data['motion_detection']['interval']
First of all, don't apologize! It's people like you showing interest in Viseron that keep me motivated :)
Kind words.
still related to point 1: if motion is detected, is the whole frame resized to model_width x model_height and passed, for example, to the EdgeTPU? The Coral, I remember, is limited to 300x300. Does that mean my 2592x1944 frame is resized down to a handful of pixels in the hope of detecting something? I hope not... :-)
That is exactly how it works, I'm afraid... But as you describe, the models are trained on smaller images (300x300), so as far as I understand the accuracy would not be affected.
Oh... I didn't think it worked this way. I'm not sure about the accuracy. If I have a 2592x1944 frame with a small person in the background, after resizing the whole frame that person could become just 30-40 pixels high. I think the Coral model is trained on 300x300 images with a full-size subject. I suspect the model applied to the whole resized frame could miss many small subjects. Did you test it? What are the dimensions of the model used by Darknet? Maybe this is not a problem for Darknet or Deepstack, but it could be a limit for the Coral EdgeTPU: just a guess...
My understanding is that Frigate uses motion detection to find subareas with something moving in each new frame; such areas are cut out (maybe with some padding) and the resulting mini-images are resized to 300x300 and submitted to the EdgeTPU. Indeed, Frigate can query the Coral EdgeTPU several times on the same frame. Frigate was designed to be used mainly with the Coral stick, but this technique should save computational power (read: speed things up) on any other detection system.
Yeah, I feel you. I have been running it this way for ages without any problems with accuracy, but my cameras don't cover a large area, so people in them are generally quite large, even when resized. I feel like doing it the way you described and only scanning areas where there is motion would be a great improvement.
interesting; does this mean my

motion_detection:
  width: 640
  height: 360

(resizing my 1920x1080 stream to one third) is possibly causing the object detection to become less accurate, as it is trained on a 300x300 data set?
Potentially yes. Also, I just realized it's probably smart to keep the aspect ratio and "letterbox" the images when resizing.
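For reference, the geometry of letterboxing in the sense meant here could be sketched like this (an illustration only, not Viseron's actual code):

```python
def letterbox_geometry(frame_w, frame_h, model_w, model_h):
    """Compute the scaled size and padding needed to fit a frame into the
    model input while preserving aspect ratio (the 'letterbox' bars)."""
    scale = min(model_w / frame_w, model_h / frame_h)
    new_w, new_h = round(frame_w * scale), round(frame_h * scale)
    pad_x = model_w - new_w   # total horizontal padding
    pad_y = model_h - new_h   # total vertical padding
    return new_w, new_h, pad_x, pad_y

# A 1920x1080 frame fitted into a 300x300 model input keeps its aspect
# ratio: it becomes 300x169 with 131 px of vertical padding.
print(letterbox_geometry(1920, 1080, 300, 300))  # (300, 169, 0, 131)
```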
also something I just discovered:

interval under motion_detection wants an integer, but the README.md says it's a float. I tried to specify "0.2" thinking my motion detection was running once a second, not once every frame, but when I tried to set it to 0.2 I couldn't start viseron:

$ docker run --rm -v /home/user/visdata/recordings:/recordings -v /home/user/visdata/config:/config -v /etc/localtime:/etc/localtime:ro --name viseron --device /dev/dri roflcoopter/viseron-vaapi:latest
Traceback (most recent call last):
  File "viseron.py", line 7, in <module>
    from lib.config import ViseronConfig
  File "/src/viseron/lib/config/__init__.py", line 118, in <module>
    VALIDATED_CONFIG = VISERON_CONFIG_SCHEMA(raw_config)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 594, in validate_dict
    return base_validate(path, iteritems(data), out)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 432, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: expected int for dictionary value @ data['motion_detection']['interval']
Yes, this is a bug which is fixed in the current beta, under the dev tag on Docker Hub.
I just realized it's probably smart to keep the aspect ratio and "letterbox" the images when resizing
Yes, I think that changing the aspect ratio of the submitted image is not a good thing for the detection.
I feel like doing it the way you described and only scanning areas where there is motion would be a great improvement
Can we expect a revision of how Viseron works?
Absolutely! I think this article from pyimagesearch could also improve the detection process.
Absolutely! I think this article from pyimagesearch could also improve the detection process.
I hope it comes soon. It would be worthwhile and fair to look, in detail, at how Frigate works.
Yeah, I feel you. I have been running it this way for ages without any problems with accuracy, but my cameras don't cover a large area, so people in them are generally quite large, even when resized. I feel like doing it the way you described and only scanning areas where there is motion would be a great improvement.
I actually run into this problem a fair bit. I have a car parked in view of one camera, and some shrubs off to the side, also in view of the camera. Both Viseron and my current solution (SecuritySpy) trigger endlessly because the motion detection sees the shrubs move, then looks at the entire scene and says "ZOMG A CAR! ALERT! ALERT!" :-)
I don't want to mask out the shrubs because the angle of the camera would show a person (high view looking down at their head/shoulders) in that area if they walked by.
Please forgive me if I've asked this already - I'm not seeing it.
Does object detection work on the (possibly scaled) frames from the motion detection engine, or does it work on the full-size frames coming from the camera? E.g. if my cameras are outputting 1920x1080 and my motion_detection section has width: 960 and height: 720, does the object detection see a 1920x1080 image or a 960x720 image?
I just tried :dev, as I'm really hoping to get motion detection happening faster than once a second, but after pulling it and trying to run it, it seems that interval is still looking for an integer:
motion_detection:
  interval: 0.2
The image fails to run with voluptuous.error.MultipleInvalid: expected int for dictionary value @ data['motion_detection']['interval']
:dev is actually quite outdated and 1.5.0 is the most recent release.
I thought I had fixed this, but it's only truly fixed for the motion_detection and object_detection blocks that live under each camera. Sorry about that.
Until I get a fix out, you can specify something like this in your config:
cameras:
  - name: camera one
    ...
    motion_detection:
      interval: 0.2  # this should work

motion_detection:
  interval: 0.2  # this does not right now
Yeah, I feel you. I have been running it this way for ages without any problems with accuracy, but my cameras don't cover a large area, so people in them are generally quite large, even when resized. I feel like doing it the way you described and only scanning areas where there is motion would be a great improvement.
I actually run into this problem a fair bit. I have a car parked in view of one camera, and some shrubs off to the side, also in view of the camera. Both Viseron and my current solution (SecuritySpy) trigger endlessly because the motion detection sees the shrubs move, then looks at the entire scene and says "ZOMG A CAR! ALERT! ALERT!" :-)
I don't want to mask out the shrubs because the angle of the camera would show a person (high view looking down at their head/shoulders) in that area if they walked by.
Masking will only affect the motion detector; the resulting recorded video will still include the masked area.
Does object detection work on the (possibly scaled) frames from the motion detection engine, or does it work on the full-size frames coming from the camera? e.g. if my cameras are outputting 1920x1080 and my motion_detection section has width: 960 and height: 720, does the object detection see a 1920x1080 image or a 960x720 image?
width and height under motion_detection only affect the motion detector. There is no real benefit in making these values larger, afaik.
If you want the detector to look at a larger image which might improve accuracy on small objects, but will definitely reduce performance, you can set these values:
object_detection:
  model_width: 1920
  model_height: 1080
This will run the detector on an image with 1920x1080 resolution.
Keep in mind that the default model_width and model_height are set automatically from the model used. For instance, the YOLOv4 Darknet model was trained on 608x608 images, so that is the default for that model.
Actually I was using it the other way: making the images the motion detector worked on smaller. I was doing a /2, but am now going to try a /4 (480x270 for a 1920x1080 camera) since I now understand that the images used for motion detection are not also used for object detection.
If I understand correctly, the camera images that object detection uses are resized to whatever size the object detector's models were trained on? I.e. in your example above, whatever size image the camera provides is rescaled to 608x608 (for YOLOv4 Darknet)? If so, what happens when the source image from the camera has a different aspect ratio from the object model's? You'd mentioned that you might think about letterboxing the camera image to maintain aspect ratio, but what is done today in 1.5.0?
One final question (which is still relevant to this issue :-)): when motion_detection is triggering object_detection, does object_detection's interval have any relevance? If I have the motion detector set to 3 frames and it sees motion in 3 frames, does the object detection core get the (resized) camera images of those three frames to do object detection, or does it get only one frame? Would it make sense to give the object detector some configurable number of pre-roll frames (perhaps ones which won't be blurry with motion) to try to improve detection?
Sorry for all the questions, this is clearly a very tricky part of the system and understanding it well is crucial in configuring things in order to get it to work well.
No, sadly aspect ratio is not maintained, as the performance impact is larger than the gain in confidence in my testing. However, I've started looking at having ffmpeg do the letterboxing, which might be more efficient.
The motion detector and object detector are totally separated, so when motion triggers, it starts the object detector at the interval configured for the object detector. It only analyzes new frames; the old ones used in the motion detector are not analyzed.
It does make sense that the object detector could get a few frames from before the motion detection event happened, to catch something moving by fast. This would not be trivial to implement though, so it's a long-term goal.
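The "totally separated" timing can be pictured with a small sketch (illustrative only, not Viseron's code): each detector simply follows its own schedule of scan times.

```python
def scan_times(interval, duration):
    """Timestamps (seconds) at which a detector with the given interval scans."""
    t, times = 0.0, []
    while t < duration:
        times.append(round(t, 2))
        t += interval
    return times

# Motion detector at interval 0.2 and object detector at interval 1.0 run
# on independent schedules; no frames are batched between them.
print(scan_times(0.2, 1.0))  # [0.0, 0.2, 0.4, 0.6, 0.8]
print(scan_times(1.0, 3.0))  # [0.0, 1.0, 2.0]
```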
The motion detector and object detector are totally separated, so when motion triggers, it starts the object detector at the interval configured for the object detector.
So when the interval for the motion and object detectors are different, does that mean that the object detector gets a "batch" of images to compare?
e.g.
motion_detection:
  interval: 0.2
  trigger_detector: true

object_detection:
  interval: 1
Since the object detector's interval is 5x the motion detector's interval, does that mean that whenever motion detection triggers, it will "save up" up to 5 original-resolution camera frames and run the object detector every second, passing those frames over?
What happens if the object detector's interval is faster than the motion detector's interval? (I think this is a nonsensical configuration, but I'm trying to understand how these interact, because having the object detector's interval configurable also seems nonsensical when the object detector only runs when motion is detected, i.e. when trigger_detector: true.)
No they have no correlation at all.
I am not really sure how to explain this, so I thought maybe an image would be clearer. I cobbled together this flowchart, which I hope clears things up. No frames are ever bunched together and then sent to either detector.
aha, that is MUCH clearer!
trigger_detector: true means that "scan for objects" is gated by motion detection (no motion = no checking for objects). When trigger_detector: false, object detection runs irrespective of motion detection, which also means the motion detector is effectively useless, since recording depends solely on objects being detected.
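That gating logic boils down to something like the following sketch of the semantics as understood from this thread (not actual Viseron code):

```python
def should_scan_for_objects(trigger_detector, motion_detected):
    """trigger_detector True gates object detection behind motion;
    False means object detection always runs at its own interval."""
    return motion_detected if trigger_detector else True

print(should_scan_for_objects(True, False))   # False: no motion, no scan
print(should_scan_for_objects(True, True))    # True: motion gates the scan
print(should_scan_for_objects(False, False))  # True: runs regardless of motion
```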
That makes #46 even more confusing; no objects showing up in the cam03 log means that the recorder countdown shouldn't have reset.
Thank you for such a clear explanation!
Phew, glad it helped!
Well, "useless" is not really true: if you have timeout: true and trigger_detector: true under motion_detection, the motion detector will start when the recorder starts.
I would suggest improving the documentation, or at least giving some hints here, on how motion and object detection work inside Viseron. Some open questions from trying to understand the configuration parameters:
- is object detection applied to a frame only if motion has been detected? Is it applied to the whole frame or just to the area where motion was found?
- looking at the interval parameter of the motion detection, I understand that it is not applied to all the frames coming from the camera; some of them are skipped, right?
- if the first guess is right (point 1), why is there an interval parameter in object detection? Is it applied periodically, or to every frame where motion is detected?
- still related to point 1: if motion is detected, is the whole frame resized to model_width x model_height and passed, for example, to the EdgeTPU? The Coral, I remember, is limited to 300x300. Does that mean my 2592x1944 frame is resized down to a handful of pixels in the hope of detecting something? I hope not... :-)
- I'm not sure I understand when the recording stops when timeout is false; how is it related to timeout in the recorder section?

Sorry for the multiple questions, but I would like to understand better how Viseron works in order to contribute to its development.