roflcoopter / viseron

Self-hosted, local only NVR and AI Computer Vision software. With features such as object detection, motion detection, face recognition and more, it gives you the power to keep an eye on your home, office or any other place you want to monitor.
MIT License

[DOC] Better explanation of motion/object detection pipeline #22

Open mario-tux opened 4 years ago

mario-tux commented 4 years ago

I would suggest improving the documentation, or at least giving some hints here, on how motion and object detection work inside Viseron. Some open questions from trying to understand the configuration parameters:

  1. Is object detection applied to a frame only if motion has been detected? Is it applied to the whole frame or just to the area where motion was found?
  2. Looking at the interval parameter of the motion detection, I understand that it is not applied to all the frames coming from the camera; some of them are skipped, right?
  3. If the first guess is right (point 1), why is there an interval parameter in object detection? Is it applied periodically or to all the frames where motion is detected?
  4. Still related to point 1: if motion is detected, is the whole frame resized to model_width x model_height and passed, for example, to the EdgeTPU? The Coral, I remember, is limited to 300x300. Does it mean that my 2592x1944 frame is resized to a bunch of pixels, hoping to detect something? I hope not... :-)
  5. I'm not sure I understand when the recording stops when timeout is false; how is it related to timeout in the recording section?

Sorry for the multiple questions, but I would like to understand better how Viseron works in order to contribute to its development.

roflcoopter commented 4 years ago

First of all, don't apologize! It's people like you showing interest in Viseron that keep me motivated :)

  1. Is object detection applied to a frame only if motion has been detected? Is it applied to the whole frame or just to the area where motion was found?

The trigger config option for motion detection controls this. If set to true, object detection will only run while there is motion in the picture. If set to false, object detection will run at the interval (in seconds) specified in the object detection config. I can try to clarify this in the README, maybe even rename trigger to something like trigger_object_detection. Object detection is applied to the whole frame.
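A sketch of the two modes in config form (using the option names mentioned above; the exact schema may differ between versions):

```yaml
motion_detection:
  trigger: true   # object detection runs only while motion is detected
object_detection:
  interval: 1     # with trigger: false, objects would instead be scanned
                  # once per second regardless of motion
```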

  2. Looking at the interval parameter of the motion detection, I understand that it is not applied to all the frames coming from the camera; some of them are skipped, right?

interval is the number of seconds between each scan. You can set it to 0.5 so it runs twice per second, 0.25 for four times per second, and so on.

  3. If the first guess is right (point 1), why is there an interval parameter in object detection? Is it applied periodically or to all the frames where motion is detected?

Both intervals work the same way as described above.

  4. Still related to point 1: if motion is detected, is the whole frame resized to model_width x model_height and passed, for example, to the EdgeTPU? The Coral, I remember, is limited to 300x300. Does it mean that my 2592x1944 frame is resized to a bunch of pixels, hoping to detect something? I hope not... :-)

That is exactly how it works, I'm afraid... But as you describe, the models are trained on smaller images (300x300), so as far as I understand the accuracy would not be affected. If, for example, I were to take your frame and split it into 4 pieces, I believe the model would find it difficult to detect properly because objects could be cut off. I do think it could be a good idea to do what you suggested and only scan the area where there is motion for objects; I have seen some methods that can be used to find potential areas of interest that can then be fed to an object detector. You can also use model_width and model_height to control this yourself, but the larger the image, the more processing power it will take.
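To illustrate the concern with a bit of arithmetic (the numbers here are just an example, not measurements from Viseron): an object's pixel height shrinks in proportion to the frame when everything is squeezed down to the model's input size.

```python
def scaled_height(obj_px: int, frame_h: int, model_h: int) -> float:
    """Height in pixels of an object after the whole frame is resized
    to the model's input height, ignoring aspect ratio."""
    return obj_px * model_h / frame_h

# A 150 px tall person in a 2592x1944 frame, fed to a 300x300 model:
print(round(scaled_height(150, 1944, 300)))  # -> 23
```

So a person who was already small in the full-resolution frame is left with only ~23 pixels of height for the detector to work with.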

  5. I'm not sure I understand when the recording stops when timeout is false; how is it related to timeout in the recording section?

timeout under motion_detection means that recordings won't stop until no motion is detected. timeout under recorder means how many seconds the recorder will continue to record after no object is detected and no motion is detected (if timeout under motion_detection is set to true). Now that I try to explain this, it doesn't sound particularly logical. I think better naming of these config options would make it clearer.
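In config terms, the two timeouts described above might look like this (option names as used in the thread; the values are hypothetical):

```yaml
motion_detection:
  timeout: true   # keep the recording going until motion stops
recorder:
  timeout: 10     # then record 10 more seconds after the last
                  # detected object (and motion, per the option above)
```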

I hope I could clear some things up!

mario-tux commented 4 years ago

First of all, don't apologize! It's people like you showing interest in Viseron that keep me motivated :)

Kind words.

still related to point 1: if motion is detected, is the whole frame resized to model_width x model_height and passed, for example, to the EdgeTPU? The Coral, I remember, is limited to 300x300. Does it mean that my 2592x1944 frame is resized to a bunch of pixels, hoping to detect something? I hope not... :-)

That is exactly how it works, I'm afraid... But as you describe, the models are trained on smaller images (300x300), so as far as I understand the accuracy would not be affected.

Oh... I didn't think it worked this way. I'm not sure about the accuracy. If I have a 2592x1944 frame with a small person in the background, after resizing the whole frame the person could be just 30-40 pixels high. I think the Coral model is trained using 300x300 images with a full-size subject. I suspect the model applied to the whole resized frame could miss many small subjects. Did you test it? What are the dimensions of the model used by Darknet? Maybe this is not a problem with Darknet or Deepstack, but it could be a limit for the Coral EdgeTPU: just a guess...

My understanding is that Frigate uses motion detection to find subareas with something moving in each new frame; those areas are cut out (maybe with some padding) and the resulting mini-images are resized to 300x300 and submitted to the EdgeTPU. Indeed, Frigate can query the Coral EdgeTPU several times on the same frame. Frigate was designed to be used mainly with the Coral stick, but this technique should save computational power (read: speed things up) on any other detection system.
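The approach described here boils down to a bit of coordinate arithmetic: take each motion bounding box, pad it, clamp it to the frame, and resize that crop (rather than the whole frame) to the detector's input size. A hypothetical illustration of the region math (none of these names come from Viseron or Frigate):

```python
def motion_roi(box, frame_w, frame_h, pad=20):
    """Expand a motion bounding box (x, y, w, h) by `pad` pixels on
    each side and clamp it to the frame, yielding the region that
    would be cropped and resized to the detector's input size."""
    x, y, w, h = box
    x1 = max(0, x - pad)
    y1 = max(0, y - pad)
    x2 = min(frame_w, x + w + pad)
    y2 = min(frame_h, y + h + pad)
    return x1, y1, x2 - x1, y2 - y1

# Motion in a small corner of a 2592x1944 frame:
print(motion_roi((2500, 10, 80, 60), 2592, 1944))  # -> (2480, 0, 112, 90)
```

The detector then only sees the small crop, so a 300x300 model input loses far less detail than when the full frame is squeezed down.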

akohlsmith commented 4 years ago

Interesting; does this mean my

motion_detection:
    width: 640
    height: 360

(resizing my 1920x1080 stream to one third) is possibly causing the object detection to become less accurate, as it is trained on a 300x300 data set?

akohlsmith commented 4 years ago

Also, something I just discovered: interval under motion_detection wants an integer, but the README.md says it's a float. Thinking my motion detection was running once every frame rather than once a second, I tried to set it to 0.2, but then I couldn't start Viseron:

$ docker run --rm -v /home/user/visdata/recordings:/recordings -v /home/user/visdata/config:/config -v /etc/localtime:/etc/localtime:ro --name viseron --device /dev/dri roflcoopter/viseron-vaapi:latest
Traceback (most recent call last):
  File "viseron.py", line 7, in <module>
    from lib.config import ViseronConfig
  File "/src/viseron/lib/config/__init__.py", line 118, in <module>
    VALIDATED_CONFIG = VISERON_CONFIG_SCHEMA(raw_config)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 272, in __call__
    return self._compiled([], data)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 594, in validate_dict
    return base_validate(path, iteritems(data), out)
  File "/usr/local/lib/python3.6/dist-packages/voluptuous/schema_builder.py", line 432, in validate_mapping
    raise er.MultipleInvalid(errors)
voluptuous.error.MultipleInvalid: expected int for dictionary value @ data['motion_detection']['interval']
roflcoopter commented 4 years ago


Oh... I didn't think it worked this way. I'm not sure about the accuracy. If I have a 2592x1944 frame with a small person in the background, after resizing the whole frame the person could be just 30-40 pixels high. I think the Coral model is trained using 300x300 images with a full-size subject. I suspect the model applied to the whole resized frame could miss many small subjects. Did you test it? What are the dimensions of the model used by Darknet? Maybe this is not a problem with Darknet or Deepstack, but it could be a limit for the Coral EdgeTPU: just a guess...

My understanding is that Frigate uses motion detection to find subareas with something moving in each new frame; those areas are cut out (maybe with some padding) and the resulting mini-images are resized to 300x300 and submitted to the EdgeTPU. Indeed, Frigate can query the Coral EdgeTPU several times on the same frame. Frigate was designed to be used mainly with the Coral stick, but this technique should save computational power (read: speed things up) on any other detection system.

Yeah, I feel you. I have been running it this way for ages without any problems with accuracy, but my cameras do not cover a large area, so people in them are generally quite large, even when resized. I feel like doing it the way you described and only scanning areas where there is motion would be a great improvement.

roflcoopter commented 4 years ago

Interesting; does this mean my

motion_detection:
    width: 640
    height: 360

(resizing my 1920x1080 stream to one third) is possibly causing the object detection to become less accurate, as it is trained on a 300x300 data set?

Potentially, yes. Also, I just realized it's probably smart to keep the aspect ratio and "letterbox" the images when they are resized.
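Letterboxing, as mentioned, would scale the frame to fit the model input while preserving the aspect ratio, then pad the leftover space. A sketch of the dimension math (hypothetical helper, not Viseron code):

```python
def letterbox_dims(src_w, src_h, dst_w, dst_h):
    """Return (scaled_w, scaled_h, pad_x, pad_y) for fitting a
    src_w x src_h frame into a dst_w x dst_h model input without
    distorting the aspect ratio; padding fills the remainder."""
    scale = min(dst_w / src_w, dst_h / src_h)
    scaled_w, scaled_h = int(src_w * scale), int(src_h * scale)
    return scaled_w, scaled_h, (dst_w - scaled_w) // 2, (dst_h - scaled_h) // 2

# A 1920x1080 frame into a 300x300 model input:
print(letterbox_dims(1920, 1080, 300, 300))  # -> (300, 168, 0, 66)
```

So instead of stretching the 16:9 frame into a square, the image keeps its shape and the top/bottom 66 rows are filled with padding.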

roflcoopter commented 4 years ago

Also, something I just discovered: interval under motion_detection wants an integer, but the README.md says it's a float. Thinking my motion detection was running once every frame rather than once a second, I tried to set it to 0.2, but then I couldn't start Viseron:

voluptuous.error.MultipleInvalid: expected int for dictionary value @ data['motion_detection']['interval']

Yes, this is a bug that is fixed in the current beta, which is available under the dev tag on Docker Hub.

mario-tux commented 4 years ago

I just realized it's probably smart to keep the aspect ratio and "letterbox" the images when resized

Yes, I think changing the aspect ratio of the submitted image is not a good thing for detection.

I feel like doing it the way you described and only scanning areas where there is motion would be a great improvement

Can we expect a revision of how Viseron works?

roflcoopter commented 4 years ago

Absolutely! I think this article from PyImageSearch could also improve the detection process.

mario-tux commented 4 years ago

Absolutely! I think this article from PyImageSearch could also improve the detection process.

I hope it happens soon. It would be worthwhile to look in detail at how Frigate works.

akohlsmith commented 3 years ago

Yeah, I feel you. I have been running it this way for ages without any problems with accuracy, but my cameras do not cover a large area, so people in them are generally quite large, even when resized. I feel like doing it the way you described and only scanning areas where there is motion would be a great improvement.

I actually run into this problem a fair bit. I have a car parked in view of one camera, and some shrubs off to the side, also in view of the camera. Both Viseron and my current solution (SecuritySpy) trigger endlessly because the motion detection sees the shrubs move, and then looks at the entire scene and says "ZOMG A CAR! ALERT! ALERT!" :-)

I don't want to mask out the shrubs because the angle of the camera would show a person (high view looking down at their head/shoulders) in that area if they walked by.

akohlsmith commented 3 years ago

Please forgive me if I've asked this already - I'm not seeing it.

Does object detection work on the (possibly scaled) frames from the motion detection engine, or does it work on the full-size frames coming from the camera? e.g. if my cameras are outputting 1920x1080 and my motion_detection section has width: 960 and height: 720, does the object detection see a 1920x1080 image or a 960x720 image?

akohlsmith commented 3 years ago

I just tried :dev, as I'm really hoping to get motion detection happening faster than once a second, but after pulling and running it, it seems that interval still expects an integer:

motion_detection:
  interval: 0.2

the image fails to run with voluptuous.error.MultipleInvalid: expected int for dictionary value @ data['motion_detection']['interval']

roflcoopter commented 3 years ago

:dev is actually quite outdated and 1.5.0 is the most recent release.

I thought I had fixed this, but it's only truly fixed for the motion_detection and object_detection sections that lie under each camera. Sorry about that.

Until I get a fix out, you can specify something like this in your config:


cameras:
  - name: camera one
    ...
    motion_detection:
      interval: 0.2 # this should work

motion_detection:
  interval: 0.2 # this does not work right now
roflcoopter commented 3 years ago

I actually run into this problem a fair bit. I have a car parked in view of one camera, and some shrubs off to the side, also in view of the camera. Both Viseron and my current solution (SecuritySpy) trigger endlessly because the motion detection sees the shrubs move, and then looks at the entire scene and says "ZOMG A CAR! ALERT! ALERT!" :-)

I don't want to mask out the shrubs because the angle of the camera would show a person (high view looking down at their head/shoulders) in that area if they walked by.

Masking only affects the motion detector; the resulting recorded video will still include the masked area.

Does object detection work on the (possibly scaled) frames from the motion detection engine, or does it work on the full-size frames coming from the camera? e.g. if my cameras are outputting 1920x1080 and my motion_detection section has width: 960 and height: 720, does the object detection see a 1920x1080 image or a 960x720 image?

width and height under motion_detection only affect the motion detector. There is no real benefit in making these values larger, as far as I know.

If you want the detector to look at a larger image, which might improve accuracy on small objects but will definitely reduce performance, you can set these values:

object_detection:
  model_width: 1920
  model_height: 1080

This will run the detector on an image with 1920x1080 resolution. Keep in mind that the default model_width and model_height are automatically set from the model in use. For instance, the YOLOv4 Darknet model was trained on 608x608 images, so that is the default value for that model.

akohlsmith commented 3 years ago

Actually, I was using it the other way: making the images the motion detector works on smaller. I was doing a /2, but am now going to try a /4 (480x270 for a 1920x1080 camera), since I now understand that the images used for motion detection are not also used for object detection.

If I understand correctly, the camera images that object detection uses are resized to whatever size the object detector's models were trained on? I.e., in your example above, whatever size image the camera provides is rescaled to 608x608 (for YOLOv4 Darknet)? If so, what happens when the source image from the camera has a different aspect ratio from the object model's? You'd mentioned that you might think about letterboxing the camera image to maintain aspect ratio, but what is done today in 1.5.0?

One final question (which is still relevant to this issue :-)): when motion_detection is triggering object_detection, does object_detection's interval have any relevance? If I have the motion detector set to 3 frames and it sees motion in 3 frames, does the object detection core get the (resized) camera images of those three frames to do object detection, or does it get only one frame? Would it make sense to give the object detector some configurable number of pre-roll frames (perhaps ones that won't be blurred by motion) to try to improve detection?

Sorry for all the questions; this is clearly a very tricky part of the system, and understanding it well is crucial to configuring things so they work well.

roflcoopter commented 3 years ago

No, sadly, aspect ratio is not maintained, as in my testing the performance impact is larger than the gain in confidence. However, I have started looking at having ffmpeg do the letterboxing, which might be more efficient.

The motion detector and object detector are totally separate, so when motion triggers, it starts the object detector at the interval configured for the object detector. It only analyzes new frames; the old ones used by the motion detector are not analyzed.

It does make sense that the object detector could get a few frames from before the motion detection event happened, to catch something moving by fast. This would not be trivial to implement though, so it is a long-term goal.
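The two independent clocks described above can be illustrated with a toy simulation (entirely hypothetical code, just to show the timing; it assumes object detection is gated by motion):

```python
def scan_times(duration, motion_interval, object_interval, motion_from):
    """Simulate which timestamps each detector scans, assuming motion
    is present from `motion_from` seconds onwards and the object
    detector runs on its own clock only while motion is present."""
    motion_scans = [round(t * motion_interval, 2)
                    for t in range(int(duration / motion_interval) + 1)]
    object_scans = [round(t * object_interval, 2)
                    for t in range(int(duration / object_interval) + 1)
                    if t * object_interval >= motion_from]
    return motion_scans, object_scans

m, o = scan_times(2, 0.5, 1, motion_from=1)
print(m)  # -> [0.0, 0.5, 1.0, 1.5, 2.0]  (motion checked twice per second)
print(o)  # -> [1, 2]  (objects checked once per second, once motion is present)
```

Each detector simply picks up whatever the newest frame is when its own interval elapses; frames are never batched.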

akohlsmith commented 3 years ago

The motion detector and object detector are totally separate, so when motion triggers, it starts the object detector at the interval configured for the object detector.

So when the intervals for the motion and object detectors are different, does that mean that the object detector gets a "batch" of images to compare?

e.g.

motion_detection:
  interval: 0.2
  trigger_detector: true

object_detection:
  interval: 1

Since the object detector's interval is 5x the motion detector's, does that mean that whenever motion detection triggers, it will "save up" up to 5 original-resolution camera frames and run the object detector every second, passing those frames over?

What happens if the object detector's interval is faster than the motion detector's? I think this is a nonsensical configuration, but I'm trying to understand how these interact, because having the object detector's interval configurable also seems nonsensical when the object detector only runs while motion is detected (when trigger_detector: true).

roflcoopter commented 3 years ago

No, they have no correlation at all.

I am not really sure how to explain this, so I thought maybe an image would be clearer. I cobbled together this flowchart, which I hope clears things up. No frames are ever bunched together and then sent to either detector.

[flowchart image]

akohlsmith commented 3 years ago

Aha, that is MUCH clearer!

trigger_motion: true means that "scan for objects" is gated by motion detection (no motion = no checking for objects). When trigger_motion: false, object detection runs irrespective of motion detection, which also means the motion detector is effectively useless, since recording depends solely on objects being detected.

That makes #46 even more confusing; no objects showing up in the cam03 log means that the recorder countdown shouldn't have reset.

Thank you for such a clear explanation!

roflcoopter commented 3 years ago

Phew, glad it helped!

Well, "useless" is not really true: if you have timeout: true and trigger_detector: true under motion_detection, the motion detector will start when the recorder starts.
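As a config sketch of that combination (option names as used in this thread; the exact schema may differ):

```yaml
motion_detection:
  trigger_detector: true  # the recorder's start also starts the motion detector
  timeout: true           # recording then continues until motion stops
```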