happyharrycn / actionformer_release

Code release for ActionFormer (ECCV 2022)
MIT License

About extracting feature. #70

Closed EdenGabriel closed 1 year ago

EdenGabriel commented 1 year ago

Hi, thank you for your excellent work. I have a question about feature extraction for EPIC-Kitchens 100. In the features you provided via the link, each 2304-dim vector corresponds to about 0.533 s of video. I would like to know more about how these features were extracted. For example, is 2304 = 16x12x12, meaning T=16 and H=W=12 per frame? (Aha, this is just my personal guess.) Also, if I want to get ROIPooling features per frame, could you give me some advice? Looking forward to your reply. Thank you.

tzzcl commented 1 year ago

To answer your questions: we use a standard SlowFast network on EPIC-Kitchens. The input clip is 32x224x224; after the SlowFast network (with 3D global average pooling), it becomes a 2304-d feature, and this is the feature we used for EPIC-Kitchens. We therefore do not apply any ROIPooling. For more details, please refer to the SlowFast paper and the SlowFast code.
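For readers wondering where 2304 comes from, here is a minimal sketch of the pooling step described above. It is not the authors' extraction code. It assumes the final feature-map shapes of a typical SlowFast R50 (8x8) configuration, where the slow pathway ends with 2048 channels and the fast pathway with 256 channels; the constants and the helper names `make_feature_map`, `global_avg_pool_3d`, and `clip_feature` are hypothetical, and dummy values stand in for real activations. Under those assumptions the 2304 dimensions are channels (2048 + 256) rather than a T x H x W grid, so 16x12x12 = 2304 would be a numerical coincidence.

```python
from statistics import fmean

# ASSUMPTION: final feature-map shapes (C, T, H, W) of a typical SlowFast R50
# (8x8) for a 32x224x224 clip. These are illustrative, not taken from the thread.
SLOW_SHAPE = (2048, 8, 7, 7)   # slow pathway: few frames, many channels
FAST_SHAPE = (256, 32, 7, 7)   # fast pathway: all 32 frames, few channels


def make_feature_map(shape):
    """Build a dummy C x T x H x W map where every value in channel c equals c."""
    c, t, h, w = shape
    return [[[[float(ch)] * w for _ in range(h)] for _ in range(t)]
            for ch in range(c)]


def global_avg_pool_3d(feature_map):
    """Average each channel over time, height, and width: one float per channel."""
    return [fmean(v for frame in channel for row in frame for v in row)
            for channel in feature_map]


def clip_feature(slow_map, fast_map):
    """Pool both pathways and concatenate them into a single clip-level vector."""
    return global_avg_pool_3d(slow_map) + global_avg_pool_3d(fast_map)


feature = clip_feature(make_feature_map(SLOW_SHAPE), make_feature_map(FAST_SHAPE))
print(len(feature))  # 2304 = 2048 slow channels + 256 fast channels
```

Because the pooling averages over time, height, and width, the resulting vector carries no spatial layout. Per-frame ROI features would therefore have to be taken from the feature maps before this pooling step, not recovered from the 2304-d vector.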

happyharrycn commented 1 year ago

Marked as closed. Feel free to re-open this issue if further questions arise.