Customized dataset feature extraction in Charades-STA style

wjun0830 / CGDETR

Official PyTorch repository for CG-DETR: "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

https://arxiv.org/abs/2311.08835

Closed xijun-cs closed 4 months ago

xijun-cs commented 6 months ago

Hi,

Thanks for your awesome work!

I want to try your method on a customized dataset that is similar to Charades-STA.

How can I extract the SlowFast + CLIP video features and the text features on my own?

Best

wjun0830 commented 6 months ago

Hello.

Thank you for your interest in our work. For feature extraction, we suggest taking a look at the official GitHub repos for CLIP and SlowFast!

Thanks

xijun-cs commented 6 months ago

Thanks for your reply.

Just to double-check: is the CLIP model ViT-B/32 and the SlowFast model SLOWFAST_8x8_R50?

Thanks

ArlixLin commented 4 months ago

I want to know it too~

wjun0830 commented 4 months ago

And for the CLIP model, yes, we used the CLIP ViT-B/32 model.

ArlixLin commented 4 months ago

Thanks, I found my way (ORZ) by following the chain of repos: your work -> UMT -> Moment-DETR -> HERO_Video_Feature_Extractor

tiesanguaixia commented 3 months ago

Hi, the visual and text encoders are frozen and training just uses the offline features, right? Thank you!

wjun0830 commented 3 months ago

Yes, you are right.
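
Since the thread confirms that training uses offline features from frozen encoders, here is a minimal sketch of how two precomputed per-clip feature streams might be merged into one video feature array. Everything in it is an assumption for illustration, not the repository's confirmed pipeline: the function names are hypothetical, the random arrays stand in for real extractor outputs, the dimensions (2304 for SlowFast R50, 512 for CLIP ViT-B/32) are the sizes those models typically produce, and the L2 normalization and length truncation follow the general style of Moment-DETR-like data loaders.

```python
import numpy as np


def l2_normalize(feats, eps=1e-5):
    """Scale each row (one clip) to approximately unit L2 norm."""
    norms = np.linalg.norm(feats, axis=-1, keepdims=True)
    return feats / (norms + eps)


def combine_clip_features(slowfast, clip_vis, normalize=True):
    """Concatenate per-clip SlowFast and CLIP features along the channel axis.

    Both inputs are (num_clips, dim) arrays computed offline. This is a
    hypothetical helper, not a function from the CG-DETR repository.
    """
    feats = [
        np.asarray(slowfast, dtype=np.float32),
        np.asarray(clip_vis, dtype=np.float32),
    ]
    if normalize:
        # Assumption: each stream is normalized separately before merging.
        feats = [l2_normalize(f) for f in feats]
    # The two extractors can disagree by a clip or so; keep the shared prefix.
    num_clips = min(len(f) for f in feats)
    return np.concatenate([f[:num_clips] for f in feats], axis=1)


# Random stand-ins for features that would normally be loaded from disk.
rng = np.random.default_rng(0)
slowfast_feats = rng.standard_normal((30, 2304))  # assumed SlowFast R50 dim
clip_feats = rng.standard_normal((31, 512))       # assumed CLIP ViT-B/32 dim

video_feats = combine_clip_features(slowfast_feats, clip_feats)
print(video_feats.shape)  # (30, 2816)
```

Because the encoders stay frozen, this merge only needs to run once per video. Before extracting features for a custom dataset, check the repository's own data loader for the file layout and normalization it actually expects.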