yabufarha / ms-tcn

feature modality #13

Closed cmhungsteve closed 5 years ago

cmhungsteve commented 5 years ago

From the paper, you mentioned you extracted features from RGB frames using I3D. Did you include other kinds of modalities (e.g. optical flow, MHI) in your features?

I am a little confused because most of the methods you compared against use RGB + MHI (Motion History Image). It would be really impressive if you beat them using RGB only.

yuanzhedong commented 5 years ago

After chatting with the author in this thread, https://github.com/yabufarha/ms-tcn/issues/12, I think the input to the model has shape (T, 2048), where T is the length of the video and the 2048 dimensions are made up of 1024 RGB features and 1024 optical flow features from I3D. I also downloaded the feature data listed in the README and can verify that.
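For anyone who wants to run the same check, here is a minimal sketch of splitting a (T, 2048) array into its two 1024-dimensional halves. It uses a synthetic array as a stand-in for a downloaded feature file, and `split_modalities` is a hypothetical helper, not part of the repo. The column order (RGB first, then optical flow) is an assumption; the thread does not state which half is which.

```python
import numpy as np


def split_modalities(features, rgb_dim=1024):
    """Split a (T, D) feature array into two column blocks.

    Assumption: the first `rgb_dim` columns are the RGB features and the
    remaining columns are the optical flow features. The order is not
    confirmed anywhere in this thread.
    """
    if features.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {features.shape}")
    if features.shape[1] <= rgb_dim:
        raise ValueError("feature dimension too small to split")
    return features[:, :rgb_dim], features[:, rgb_dim:]


# Synthetic stand-in for one video's features: T time steps, 2048 dims each.
T = 300
features = np.random.rand(T, 2048).astype(np.float32)

rgb, flow = split_modalities(features)
print(features.shape, rgb.shape, flow.shape)
```

With a real file you would replace the synthetic array with something like `np.load(path)`, and transpose first if the array is stored as (2048, T) rather than (T, 2048).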

yabufarha commented 5 years ago

Yes, we used both RGB and optical flow.

cmhungsteve commented 5 years ago

Got it. Thank you.