scanner-research / rekall

Compositional Video Event Specification
http://www.danfu.org/projects/rekall-tech-report/
Apache License 2.0

One-to-one mapping operator? #23

Open abhaygargab opened 4 years ago

abhaygargab commented 4 years ago

Hello,

I want to track pedestrians using Rekall. I take the per-frame detections and compute the IoU of each detection against the detections in the previous frame. I use join to merge and update the intervals whose IoU > iou_thresh, but join gives a one-to-many mapping. Is there a way to make one-to-one assignments in Rekall?
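To make the one-to-many problem concrete, here is a small pure-Python sketch. It is not Rekall code: the `(x1, y1, x2, y2)` box tuples and the `iou` / `join_by_iou` names are hypothetical. A join-style match keeps every pair above the threshold, so nothing prevents one pedestrian from the previous frame from pairing with several detections in the current frame.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def join_by_iou(prev: List[Box], curr: List[Box], thresh: float) -> List[Tuple[int, int]]:
    """Join-style matching: keep every (prev, curr) index pair with IoU > thresh.

    A previous box can appear in several pairs, which is the one-to-many behavior.
    """
    return [
        (i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
        if iou(p, c) > thresh
    ]


# One pedestrian in the previous frame, two overlapping detections in the current one.
prev_frame = [(0.0, 0.0, 2.0, 2.0)]
curr_frame = [(0.0, 0.0, 2.0, 2.0), (0.2, 0.0, 2.2, 2.0)]
print(join_by_iou(prev_frame, curr_frame, thresh=0.5))  # [(0, 0), (0, 1)]
```

Both current detections clear the threshold (IoU 1.0 and about 0.82), so the single previous pedestrian is matched twice.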

Thanks

DanFu09 commented 4 years ago

Based on your description, you're probably looking for the coalesce operator. Coalesce recursively merges intervals over time within a single video. You can set it to only merge intervals that have some spatial overlap property.

Check out step three of the parking space detection tutorial. It takes parking space bounding boxes from individual frames, and merges them over time based on IOU overlap.

abhaygargab commented 4 years ago

Could you elaborate on how coalesce can be used for tracking? I want to assign different IDs to different pedestrians and track them. Right now I am looping over all the frames and comparing the detections in each frame with the detections in the previous frame. But as far as I understand, coalesce merges and combines intervals, so I would lose the bounds information of each individual pedestrian. I was hoping for a one-to-one mapping operator, maybe something like Linear_Assignment, that would let me propagate pedestrian IDs (by updating only the payload) while leaving the bounds untouched. It would be great if coalesce can solve this.
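For comparison, the frame-to-frame one-to-one matching described above can be sketched in plain Python, outside of Rekall. This swaps in a greedy rule: candidate pairs are accepted in order of decreasing IoU, and each box is used at most once. That is simpler than, and not equivalent to, optimal linear assignment (the Hungarian algorithm, e.g. SciPy's `scipy.optimize.linear_sum_assignment`). The box tuples and the names `greedy_match` and `propagate_ids` are hypothetical.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_match(prev: List[Box], curr: List[Box], thresh: float) -> Dict[int, int]:
    """One-to-one matching, returned as {curr_index: prev_index}.

    Greedy stand-in for linear assignment: highest-IoU pairs are taken first,
    and a box that is already matched is never reused.
    """
    candidates = [
        (iou(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    ]
    candidates = [cand for cand in candidates if cand[0] >= thresh]
    candidates.sort(reverse=True)  # highest IoU first
    used_prev = set()
    matches: Dict[int, int] = {}
    for _, i, j in candidates:
        if i not in used_prev and j not in matches:
            used_prev.add(i)
            matches[j] = i
    return matches


def propagate_ids(frames: List[List[Box]], thresh: float) -> List[List[int]]:
    """Give each detection a track ID, reusing the matched previous ID when there is one."""
    new_id = itertools.count()
    all_ids: List[List[int]] = []
    prev_boxes: List[Box] = []
    prev_ids: List[int] = []
    for boxes in frames:
        matches = greedy_match(prev_boxes, boxes, thresh)
        ids = [prev_ids[matches[j]] if j in matches else next(new_id) for j in range(len(boxes))]
        all_ids.append(ids)
        prev_boxes, prev_ids = boxes, ids  # only the bounds are read; they are never modified
    return all_ids


frames = [
    [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 7.0, 7.0)],
    [(5.1, 5.0, 7.1, 7.0), (0.1, 0.0, 2.1, 2.0)],
    [(0.2, 0.0, 2.2, 2.0)],
]
print(propagate_ids(frames, thresh=0.5))  # [[0, 1], [1, 0], [0]]
```

The two pedestrians swap positions in the detection list between frames 0 and 1, but each keeps its ID. Unmatched detections start new IDs.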

DanFu09 commented 4 years ago

Coalesce merges intervals by doing a linear sweep through the time dimension and merging in new intervals that satisfy some predicate (by default, the predicate is just time overlap). You can specify both the merge condition and what happens on a merge, so you can do something like this:

from rekall import Interval, Bounds3D
from rekall.predicates import *

# load IntervalSetMapping of pedestrians, one Interval per pedestrian per frame
pedestrians = ...

# put the interval into the payload
pedestrians_nested = pedestrians.map(
    lambda interval: Interval(
        interval['bounds'],
        payload = [interval]                # whatever else you need in the payload
    )
)

# use coalesce to construct a track
pedestrians_tracked = pedestrians_nested.coalesce(
    ('t1', 't2'),
    bounds_merge_op = lambda b1, b2: Bounds3D(
        t1 = b1['t1'], t2 = b2['t2'],
        x1 = b2['x1'], x2 = b2['x2'],
        y1 = b2['y1'], y2 = b2['y2'],
    ), # take the span in time, but take the later Interval's spatial extent
    predicate = and_pred(                   # and_pred already returns a predicate; no extra lambda needed
        after(max_dist = 1),                # only merge pedestrians from the next frame (assumes t1/t2 are frame numbers)
        iou_at_least(0.8)                   # only include in the track if the IOU >= 0.8
    ),
    payload_merge_op =
        lambda p1, p2: p1 + p2              # merges the lists of pedestrians
)

Each Interval in pedestrians_tracked represents one pedestrian track under your tracking conditions (in this case, the pedestrian must be detected in contiguous frames, with a bounding-box IOU >= 0.8 between consecutive frames). The payload of each Interval in pedestrians_tracked holds all the original Intervals in that track, so the per-frame bounds of every detection are still available.
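To illustrate the linear sweep without needing a Rekall install, here is a simplified pure-Python model of the behavior described above. It is a sketch, not Rekall's implementation: intervals are hypothetical dicts, each new interval is merged into the first existing track that satisfies the predicate, and `sweep_coalesce`, `next_frame_with_iou`, `span_time_keep_later_box`, and `det` are illustrative names.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a rekall Interval: a dict with keys
# 't1', 't2' (frame numbers), 'x1', 'x2', 'y1', 'y2', and 'payload'.
IntervalDict = Dict[str, Any]


def iou(a: IntervalDict, b: IntervalDict) -> float:
    """Spatial intersection-over-union of two interval dicts."""
    inter_w = max(0.0, min(a['x2'], b['x2']) - max(a['x1'], b['x1']))
    inter_h = max(0.0, min(a['y2'], b['y2']) - max(a['y1'], b['y1']))
    inter = inter_w * inter_h
    area_a = (a['x2'] - a['x1']) * (a['y2'] - a['y1'])
    area_b = (b['x2'] - b['x1']) * (b['y2'] - b['y1'])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sweep_coalesce(
    intervals: List[IntervalDict],
    predicate: Callable[[IntervalDict, IntervalDict], bool],
    bounds_merge_op: Callable[[IntervalDict, IntervalDict], IntervalDict],
    payload_merge_op: Callable[[Any, Any], Any],
) -> List[IntervalDict]:
    """Visit intervals by start time; merge each into the first matching track, else start a new one."""
    tracks: List[IntervalDict] = []
    for new in sorted(intervals, key=lambda i: i['t1']):
        for k, track in enumerate(tracks):
            if predicate(track, new):
                merged = bounds_merge_op(track, new)
                merged['payload'] = payload_merge_op(track['payload'], new['payload'])
                tracks[k] = merged
                break
        else:
            tracks.append(dict(new))
    return tracks


def next_frame_with_iou(track: IntervalDict, new: IntervalDict, thresh: float = 0.8) -> bool:
    # Exactly one frame after the track's last frame, and enough overlap with its latest box.
    return new['t1'] - track['t2'] == 1 and iou(track, new) >= thresh


def span_time_keep_later_box(b1: IntervalDict, b2: IntervalDict) -> IntervalDict:
    # Take the span in time, but the later interval's spatial extent.
    return {'t1': b1['t1'], 't2': b2['t2'],
            'x1': b2['x1'], 'x2': b2['x2'], 'y1': b2['y1'], 'y2': b2['y2']}


def det(t: int, x1: float, x2: float, name: str) -> IntervalDict:
    # Hypothetical helper: one detection in frame t, with its name as the payload list.
    return {'t1': t, 't2': t, 'x1': x1, 'x2': x2, 'y1': 0.0, 'y2': 0.4, 'payload': [name]}


detections = [det(0, 0.0, 0.2, 'a0'), det(0, 0.6, 0.8, 'b0'),
              det(1, 0.01, 0.21, 'a1'), det(1, 0.61, 0.81, 'b1')]
tracks = sweep_coalesce(detections, next_frame_with_iou, span_time_keep_later_box,
                        lambda p1, p2: p1 + p2)
print([t['payload'] for t in tracks])  # [['a0', 'a1'], ['b0', 'b1']]
```

Because this sketch's predicate requires a gap of exactly one frame, a track that has just absorbed a frame-t detection cannot absorb a second one from frame t, and each detection joins at most one track. Matches are first-come rather than best-IoU, so crowded scenes may still need a smarter assignment rule.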

You could then use a map function to assign an ID to each track, and the split function to get the per-frame Intervals back out into a top-level object.
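The map-then-split idea can be sketched with plain dicts. This is not Rekall's `map`/`split` API; the track layout (a `'payload'` list of per-frame detection dicts, mirroring `payload = [interval]` above) and the name `assign_ids_and_split` are hypothetical. Each track gets one new ID, and every detection in its payload is copied out with that ID, leaving the original bounds untouched.

```python
import itertools
from typing import Any, Dict, List

# Hypothetical stand-in: each track is a dict whose 'payload' is the list of
# per-frame detection dicts that were merged into it.
Track = Dict[str, Any]


def assign_ids_and_split(tracks: List[Track]) -> List[Dict[str, Any]]:
    """Tag every detection with its track's ID, then flatten to one record per detection."""
    ids = itertools.count()
    flat = []
    for track in tracks:
        track_id = next(ids)  # "map" step: one new ID per track
        for detection in track['payload']:  # "split" step: unpack the payload
            flat.append({**detection, 'track_id': track_id})  # copy; the input is not mutated
    # Sort by frame, then ID, so the output reads frame by frame again.
    return sorted(flat, key=lambda d: (d['t1'], d['track_id']))


tracks = [
    {'t1': 0, 't2': 1, 'payload': [{'t1': 0, 'name': 'a0'}, {'t1': 1, 'name': 'a1'}]},
    {'t1': 0, 't2': 0, 'payload': [{'t1': 0, 'name': 'b0'}]},
]
for record in assign_ids_and_split(tracks):
    print(record)
# {'t1': 0, 'name': 'a0', 'track_id': 0}
# {'t1': 0, 'name': 'b0', 'track_id': 1}
# {'t1': 1, 'name': 'a1', 'track_id': 0}
```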

Hopefully this was helpful!

DanFu09 commented 4 years ago

One more note: if you're using vgrid to visualize things, you can use map to turn the list of Intervals in each payload into an IntervalSet, i.e., pedestrians_tracked.map(lambda i: Interval(i['bounds'], IntervalSet(i['payload']))), and then visualize each track in vgrid with NestedFormat.