Yolo3 conditioning on prior video frame

pjreddie / darknet

Convolutional Neural Networks

Other


Has any work been done to condition the current frame on the predictions of prior frames? My understanding is that, as of right now, each frame is its own inference with no prior (essentially, each frame is treated as i.i.d.).

It seems like greater minds than mine might be able to create a network that conditions on the prior frame's predictions to help stabilize the size and 'flicker' of the bounding boxes. While objects in natural images generally scale and rotate, they don't usually appear and disappear from frame to frame. I see that some work has been done with Spatiotemporal Sampling Networks and Flow-Guided Feature Aggregation to address these issues with regard to video tracking. I'm wondering whether Yolo (or a derivative) has been extended in such a way.
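Short of a learned temporal model, a common stopgap for the flicker described above is to post-process Yolo's per-frame boxes using the previous frame's boxes as a prior. The sketch below is a minimal, hypothetical example of that idea; it is not part of darknet and is not how Spatiotemporal Sampling Networks or Flow-Guided Feature Aggregation work (those aggregate features inside the network). It greedily matches boxes across frames by IoU, smooths matched boxes with an exponential moving average, and keeps an unmatched box alive for a few frames. The class `BoxSmoother` and its parameters `alpha`, `iou_thresh`, and `max_misses` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    box: Box
    misses: int = 0  # consecutive frames without a matching detection


class BoxSmoother:
    """Hypothetical post-processing step (not a learned model): uses the
    previous frame's boxes as a prior for the current frame's detections."""

    def __init__(self, alpha: float = 0.5, iou_thresh: float = 0.3, max_misses: int = 2):
        self.alpha = alpha            # weight on the new detection (1.0 = no smoothing)
        self.iou_thresh = iou_thresh  # minimum IoU to treat two boxes as the same object
        self.max_misses = max_misses  # frames a track survives without a detection
        self.tracks: List[Track] = []

    def update(self, detections: List[Box]) -> List[Box]:
        unmatched = list(detections)
        for track in self.tracks:
            # Greedily pick the remaining detection that best overlaps this track.
            best = max(unmatched, key=lambda d: iou(track.box, d), default=None)
            if best is not None and iou(track.box, best) >= self.iou_thresh:
                unmatched.remove(best)
                # Exponential moving average damps size/position jitter.
                track.box = tuple(
                    self.alpha * new + (1 - self.alpha) * old
                    for new, old in zip(best, track.box)
                )
                track.misses = 0
            else:
                track.misses += 1
        # Drop tracks missing for too long; short gaps are bridged to suppress flicker.
        self.tracks = [t for t in self.tracks if t.misses <= self.max_misses]
        # Detections that matched no existing track start new tracks.
        self.tracks.extend(Track(box=d) for d in unmatched)
        return [t.box for t in self.tracks]


if __name__ == "__main__":
    smoother = BoxSmoother(alpha=0.5)
    print(smoother.update([(0, 0, 10, 10)]))  # -> [(0, 0, 10, 10)]
    print(smoother.update([(2, 0, 12, 10)]))  # -> [(1.0, 0.0, 11.0, 10.0)]
    print(smoother.update([]))                # missed frame, box held: [(1.0, 0.0, 11.0, 10.0)]
```

The trade-off is lag: a lower `alpha` gives steadier boxes but makes them trail fast-moving objects, and a larger `max_misses` hides dropouts at the cost of keeping stale boxes on screen after an object really leaves.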


Yolo3 conditioning on prior video frame #749