mwray / Joint-Part-of-Speech-Embeddings

Code for the Joint Part-of-Speech Embedding model

About the video feature #2

NNNNAI closed this issue 2 years ago

NNNNAI commented 2 years ago

Hi Michael, thanks for sharing your wonderful work. I have a few questions about the video features.

  1. I notice that the video feature in "./data/video_features/EPIC_100retrieval{}_features_mean.pkl" has shape n x 3072. Is it obtained by concatenating the 'RGB', 'Flow' and 'Audio' features, each of size n x 25 x 1024, into n x 25 x 3072 and then averaging over the time dimension?
  2. Which model did you use to extract the 'RGB', 'Flow' and 'Audio' features? Is it the TBN model trained for action recognition on EPIC-Kitchens-100 or on EPIC-Kitchens-55?

mwray commented 2 years ago

Hi, thanks for your interest in the code.

  1. Yes, that's exactly how the features were calculated: a concatenation followed by a mean across the second (temporal) dimension.
  2. This was a TBN model trained on the EPIC-Kitchens-55 training set.
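
For anyone reproducing point 1, the computation can be sketched roughly as below. This is only an illustration under stated assumptions, not the repository's extraction code: the arrays are random placeholders, `fuse_and_pool` is a hypothetical helper name, and the RGB, Flow, Audio concatenation order is assumed from the question rather than confirmed.

```python
import numpy as np


def fuse_and_pool(rgb, flow, audio):
    """Concatenate per-modality features on the last axis, then average over time.

    Each input is assumed to have shape (n, 25, 1024).
    """
    # (n, 25, 1024) x 3 -> (n, 25, 3072); the modality order is an assumption
    fused = np.concatenate([rgb, flow, audio], axis=-1)
    # Mean over the second (temporal) dimension -> (n, 3072)
    return fused.mean(axis=1)


# Random placeholder features standing in for real TBN outputs
n = 4  # hypothetical number of clips
rng = np.random.default_rng(0)
rgb = rng.standard_normal((n, 25, 1024))
flow = rng.standard_normal((n, 25, 1024))
audio = rng.standard_normal((n, 25, 1024))

video_features = fuse_and_pool(rgb, flow, audio)
print(video_features.shape)
```

Because the mean is taken independently per channel, averaging each modality over time first and concatenating afterwards would give the same n x 3072 result.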