martinwholtmon / IT3920-2024-Master-MSIT

Master project for MSIT 2024 - Towards Efficient Human Action Recognition: The Role of Keyframe Selection in Video Processing
MIT License

Model: CNN + RNN #86

Closed: martinwholtmon closed this issue 2 weeks ago

martinwholtmon commented 2 months ago

Implement the last model: use a pre-trained CNN to extract per-frame features (spatial information), then feed the resulting feature sequence to an RNN to capture temporal information.
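The CNN + RNN data flow can be illustrated with a minimal, dependency-free Python sketch. It is not the project's implementation, and several parts are assumptions: `frame_features` is a hypothetical stand-in for the pre-trained CNN (it only average-pools each channel of a frame, loosely like the global pooling at the end of a CNN), the recurrent part is a plain Elman RNN rather than an LSTM, and all weights are random and untrained. A real version would use a deep learning framework with the pre-trained backbone weights.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]
Frame = List[List[List[float]]]  # indexed as [channel][row][col]


def frame_features(frame: Frame) -> Vector:
    """Hypothetical stand-in for the pre-trained CNN: global average pooling
    of each channel, yielding one feature value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in frame]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_encode(features: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Vanilla (Elman) RNN over the per-frame features:
    h_t = tanh(W_x x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    The last hidden state is returned as the clip-level representation."""
    h = [0.0] * len(b)
    for x in features:
        pre = [a + c + d for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(p) for p in pre]
    return h


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Untrained weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    channels, height, width, num_frames, hidden = 3, 4, 4, 8, 5
    # A toy "video": num_frames frames of shape [channels][height][width].
    video = [[[[rng.random() for _ in range(width)] for _ in range(height)]
              for _ in range(channels)] for _ in range(num_frames)]
    feats = [frame_features(f) for f in video]  # spatial step: one vector per frame
    h = rnn_encode(feats,
                   random_matrix(hidden, channels, rng),
                   random_matrix(hidden, hidden, rng),
                   [0.0] * hidden)              # temporal step over the sequence
    print(len(feats), len(feats[0]), len(h))    # 8 3 5
```

Returning the last hidden state is one common way to summarize a clip; averaging the hidden states over all frames is another option worth comparing.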

CNN: BNInception or Inception v3 (pre-trained on K100 RGB): https://yjxiong.me/others/kinetics_action/

RNN: probably an LSTM; needs further investigation.
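Assuming an LSTM is chosen, the sketch below shows one standard formulation of the LSTM cell (input, forget, and output gates plus a candidate cell state) run over a sequence of per-frame feature vectors. It is a hypothetical, framework-free illustration with random, untrained weights; the names `lstm_step`, `lstm_encode`, and `random_params` are made up for this example, and the feature size (16) and hidden size (8) are arbitrary.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W, U, b) for one gate


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gate_preact(w: Matrix, u: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """W x + U h + b for a single gate."""
    return [sum(wi * xi for wi, xi in zip(w_row, x))
            + sum(ui * hi for ui, hi in zip(u_row, h)) + bj
            for w_row, u_row, bj in zip(w, u, b)]


def lstm_step(x: Vector, h: Vector, c: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One LSTM time step; params maps 'i', 'f', 'g', 'o' to (W, U, b)."""
    i = [sigmoid(z) for z in gate_preact(*params["i"], x, h)]    # input gate
    f = [sigmoid(z) for z in gate_preact(*params["f"], x, h)]    # forget gate
    g = [math.tanh(z) for z in gate_preact(*params["g"], x, h)]  # candidate cell values
    o = [sigmoid(z) for z in gate_preact(*params["o"], x, h)]    # output gate
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def lstm_encode(features: List[Vector], params: Dict[str, GateParams],
                hidden_size: int) -> Vector:
    """Run the LSTM over per-frame features and return the final hidden state."""
    h, c = [0.0] * hidden_size, [0.0] * hidden_size
    for x in features:
        h, c = lstm_step(x, h, c, params)
    return h


def random_params(input_size: int, hidden_size: int,
                  rng: random.Random) -> Dict[str, GateParams]:
    """Untrained weights, for illustration only."""
    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size) for gate in "ifgo"}


if __name__ == "__main__":
    rng = random.Random(0)
    feats = [[rng.random() for _ in range(16)] for _ in range(10)]  # 10 frames, 16-dim features
    h = lstm_encode(feats, random_params(16, 8, rng), hidden_size=8)
    print(len(h))  # 8
```

The forget gate is what lets an LSTM carry information across many frames: when `f` is close to 1 and `i` is close to 0, the cell state passes through a step almost unchanged, which a plain RNN cannot do as easily.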