Real-Time Inference for DiffusionDet model

sdroh1027 / DiffusionVID

Official Repository of the paper "DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection"

Apache License 2.0

As far as I know, real-time inference means running inference under the constraint that the model can refer only to the current and past frames, never to future ones.

There are several ways to do this.

The first is to update a global coreset of object features periodically. Rebuilding the coreset incurs some overhead, so it may be better to rebuild it only once every few frames (e.g., every 5 frames) rather than on every frame. I think choosing the update period is an accuracy-speed trade-off: a shorter period keeps the coreset fresher but costs more time.
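
To make the periodic update concrete, here is a minimal sketch in plain Python. It is not code from this repository: the class `PeriodicCoreset`, its parameters, and the uniform random subsampling in `build_coreset` are all assumptions for illustration, and the real coreset selection would replace that method.

```python
import random


class PeriodicCoreset:
    """Hypothetical causal feature pool; the coreset is rebuilt every `period` frames."""

    def __init__(self, period=5, coreset_size=4, seed=0):
        self.period = period
        self.coreset_size = coreset_size
        self.pool = []      # object features from the current and past frames only
        self.coreset = []   # subset handed to the detector as conditioning
        self.frame_idx = 0
        self.rng = random.Random(seed)

    def build_coreset(self):
        # Placeholder selection (assumption): uniform random subsample of the pool.
        k = min(self.coreset_size, len(self.pool))
        return self.rng.sample(self.pool, k)

    def step(self, frame_features):
        """Add this frame's features; rebuild the coreset only on every `period`-th frame."""
        self.pool.extend(frame_features)
        if self.frame_idx % self.period == 0:
            self.coreset = self.build_coreset()
        self.frame_idx += 1
        return self.coreset
```

Calling `step` once per incoming frame returns the coreset to condition on. Between rebuilds the same list is reused, which is where the speed saving comes from.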

Second, as in MEGA, you can use a 'memory' (instead of a coreset) that holds the object features of the most recent n frames.
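
A rough sketch of such a recent-n-frames memory, assuming a simple sliding window. The class name `RecentFrameMemory` and the parameter `n_frames` are hypothetical, and this only illustrates the idea described above, not MEGA's actual implementation.

```python
from collections import deque


class RecentFrameMemory:
    """Hypothetical sliding-window memory over the most recent `n_frames` frames."""

    def __init__(self, n_frames=3):
        # A deque with maxlen drops the oldest frame automatically when full.
        self.frames = deque(maxlen=n_frames)

    def step(self, frame_features):
        """Store this frame's features and return all features currently in the window."""
        self.frames.append(list(frame_features))
        return [feat for frame in self.frames for feat in frame]
```

With `n_frames=2`, for example, the features of the first frame are dropped as soon as the third frame arrives, so the memory cost stays bounded.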

Finally, as in MAMBA, you can add the object features at time t to the memory while randomly evicting previously stored object features.
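
The random-eviction variant could look roughly like this. Again this is only a sketch: `RandomEvictionMemory`, `capacity`, and `seed` are hypothetical names, and the fixed-capacity policy is an assumption rather than MAMBA's actual code.

```python
import random


class RandomEvictionMemory:
    """Hypothetical fixed-capacity memory that evicts random old features to fit new ones."""

    def __init__(self, capacity=4, seed=0):
        self.capacity = capacity
        self.features = []
        self.rng = random.Random(seed)

    def step(self, frame_features):
        """Insert the features at time t, evicting random stored features if over capacity."""
        new = list(frame_features)[-self.capacity:]  # never insert more than fits
        overflow = len(self.features) + len(new) - self.capacity
        for _ in range(max(0, overflow)):
            # Evict a uniformly random previously stored feature.
            self.features.pop(self.rng.randrange(len(self.features)))
        self.features.extend(new)
        return list(self.features)
```

Unlike the sliding window, old features are not dropped strictly by age, so some features from much earlier frames can survive in the memory.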

Real-Time Inference for DiffusionDet model #8