combined targets - Githubissues

The CLI argument --targetLabelId lets you define, which kind of objects you'd like to track (current default is a person). However, if the camera should track an equestrian with it's horse, two objects with different label IDs (person and horse) may be detected.

It might be better to track both objects as a combined object. Thereby, it might be easier to distinguish the target from other objects (other hoses or pedestrians), which is required to (re-) identify the target from frame to frame.

Sometimes (in some frames or depending on the view angle), only one of the objects (equestrian/person and horse) may be detected. A single-object target would be lost, but a combined target could be handled forgivingly, e.g. if only the horse is detected and can be re-identified (similar position and size in previous frame), it might be assumed that the bounding box (BB) of the equestiran of the previous frame has moved with the BB of the horse in the new frame. This estimation can be used to keep track of a combined target, even if only a single object of the target has been detected. If no object of the combined target is detected, the target would still be lost. Nevertheless, this approach might be more reliable in some cases.

Another use case could be the combination of different object detection techniques, e.g. combine regular object detection (at the moment SSD Mobilenet V2) with color tracking (see #3), which should solve #24. In a similar way, multi color tracking could be implemented (red shirt and blue trousers). This might even help to automatically identify/select the correct object as target (e.g. ignore people with white shirts). Identification might be handled unforgivingly (e.g. red shirt and blue trousers have to be detected in one frame), but re-identifcation might still be handled forgivingly (e.g. red shirt is detected, but blue trousers are not, since they are e.g. occluded by something), as long as one object of the combined targets is found (target is not lost), since it would allow the re-identification as desribed in the example before (assumption that BB of equestrian moved with BB of horse).

maiermic / robot-cameraman

combined targets #39