sdroh1027 / DiffusionVID

Official Repository of the paper "DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection"
Apache License 2.0

Real-Time Inference for DiffusionDet model #8

Biberomeister opened this issue 1 month ago (status: Open)

Biberomeister commented 1 month ago

Hello,

Thank you for your excellent work on this project. I am interested in implementing real-time inference using your model, but I couldn't find the source code for it in the repository. Could you please provide the real-time inference source code or guide me on how to achieve this?

Thank you for your assistance!

sdroh1027 commented 1 month ago

As far as I know, real-time inference means running inference under the condition that the model can only refer to the current and past frames, never future ones.

There are several ways to do this.

The first is to maintain a global coreset of object features built from the frames seen so far. Updating the coreset incurs some overhead, so it may be better to rebuild it only periodically (e.g., every 5 frames) rather than at every frame. I think choosing the update period is a performance-speed trade-off.
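A minimal sketch of the periodic coreset idea, using plain Python tuples as stand-ins for object features. It assumes greedy k-center (farthest-point) selection as the coreset method; that is one common choice, and the repository's own coreset construction may differ. `greedy_k_center` and `PeriodicCoreset` are hypothetical names, not part of the DiffusionVID code.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[float, ...]  # stand-in for an object feature vector


def greedy_k_center(features: Sequence[Feature], k: int) -> List[Feature]:
    """Select up to k features by greedy farthest-point (k-center) selection."""
    if not features or k <= 0:
        return []
    selected = [features[0]]  # deterministic seed: the first feature in the pool
    # Distance from every feature to its nearest already-selected feature.
    min_dist = [math.dist(f, features[0]) for f in features]
    while len(selected) < k:
        idx = max(range(len(features)), key=lambda i: min_dist[i])
        if min_dist[idx] == 0.0:
            break  # every remaining feature duplicates one already selected
        selected.append(features[idx])
        for i, f in enumerate(features):
            min_dist[i] = min(min_dist[i], math.dist(f, features[idx]))
    return selected


class PeriodicCoreset:
    """Hypothetical helper: a global coreset over past frames, rebuilt every `period` frames."""

    def __init__(self, size: int, period: int = 5):
        self.size = size
        self.period = period
        self.pool: List[Feature] = []     # object features from all frames seen so far
        self.coreset: List[Feature] = []  # what the next frame's detection is conditioned on
        self.frames_seen = 0

    def update(self, frame_features: Sequence[Feature]) -> List[Feature]:
        """Add frame t's features; rebuild only at frames 1, 1 + period, 1 + 2*period, ..."""
        self.pool.extend(frame_features)
        self.frames_seen += 1
        if (self.frames_seen - 1) % self.period == 0:
            self.coreset = greedy_k_center(self.pool, self.size)
        return list(self.coreset)
```

Here `update()` would be called after frame t is detected, and the returned coreset conditions frame t+1, so only past frames are ever used. Between rebuilds the coreset is simply reused, which is where the speed gain comes from; note that `pool` grows without bound in this sketch, so a real implementation would likely cap or subsample it.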

Second, as in MEGA, instead of a coreset you can use a 'memory' that holds the object features of the most recent n frames.
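A minimal sketch of a sliding-window memory over the last n frames. It covers only the "recent n frames" idea; MEGA itself has more machinery than this, and `RecentFrameMemory` is a hypothetical name, not an API from MEGA or DiffusionVID.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

Feature = Tuple[float, ...]  # stand-in for an object feature vector


class RecentFrameMemory:
    """Hypothetical sliding-window memory of object features from the last n frames."""

    def __init__(self, n: int):
        if n < 1:
            raise ValueError("n must be at least 1")
        # deque(maxlen=n) drops the oldest frame automatically when a new one is appended.
        self.frames: Deque[List[Feature]] = deque(maxlen=n)

    def read(self) -> List[Feature]:
        """Flatten the stored frames (oldest first) into one list of features."""
        return [f for frame in self.frames for f in frame]

    def write(self, frame_features: Sequence[Feature]) -> None:
        """Store frame t's features after frame t has been processed."""
        self.frames.append(list(frame_features))
```

Per frame, the loop would be: `read()` the memory to condition detection on frame t, then `write()` frame t's object features. Writes are O(1) and there is no periodic rebuild, at the cost of only seeing the last n frames.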

Finally, as in MAMBA, you can add the object features of frame t to the memory while randomly evicting previously stored object features.
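A minimal sketch of a fixed-capacity memory with random eviction, assuming that only previously stored features are eviction candidates and that a single frame larger than the capacity is randomly subsampled. Both are illustrative assumptions; MAMBA's actual memory bank has additional components, and `RandomEvictionMemory` is a hypothetical name.

```python
import random
from typing import Any, List, Optional, Sequence


class RandomEvictionMemory:
    """Hypothetical fixed-capacity memory that randomly evicts older object features."""

    def __init__(self, capacity: int, seed: Optional[int] = None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items: List[Any] = []
        self.rng = random.Random(seed)  # seeded RNG for reproducible eviction

    def add(self, new_features: Sequence[Any]) -> None:
        """Insert frame t's features, randomly evicting previous entries to make room."""
        new = list(new_features)
        if len(new) > self.capacity:
            # Assumption: if one frame alone exceeds capacity, keep a random subset of it.
            new = self.rng.sample(new, self.capacity)
        overflow = len(self.items) + len(new) - self.capacity
        if overflow > 0:
            # Only previously stored features are eviction candidates; new ones are always kept.
            self.items = self.rng.sample(self.items, len(self.items) - overflow)
        self.items.extend(new)
```

Compared with the sliding window above, random eviction lets some older features survive for many frames, so at the same capacity the memory tends to span a longer stretch of the video, while each update stays cheap.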