thswodnjs3 / CSTA

The official code of "CSTA: CNN-based Spatiotemporal Attention for Video Summarization"
MIT License

How to generate image features #1

Open wenxie18 opened 1 month ago

wenxie18 commented 1 month ago

Hi, I want to apply your pretrained model to my own videos. Could you please let me know how to convert the video to proper features for the importance score prediction?

I see that your dataset files already contain the features, but I couldn't find the script that generates them.

Thanks.

thswodnjs3 commented 1 month ago

To obtain frame features, run the pre-trained GoogleNet on frames sampled from your videos at 2 fps. You can see the feature extraction process in my repository: https://github.com/thswodnjs3/CSTA/blob/4a584a94861061f4e0cd1997fef4376ac62ee944/video_helper.py#L39
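In outline, the pipeline looks roughly like this. This is a minimal sketch, not the repo's `video_helper.py` itself; it assumes torchvision's GoogLeNet as the backbone and OpenCV for decoding, and the function name `extract_features` is illustrative:

```python
import cv2
import numpy as np
import torch
import torchvision.models as models
import torchvision.transforms as transforms

# GoogLeNet backbone with the classifier removed, so the forward pass
# returns the 1024-d pooled activations used as frame features.
backbone = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_features(video_path, target_fps=2):
    """Sample frames at roughly `target_fps` and return (n_steps, 1024) features."""
    cap = cv2.VideoCapture(video_path)
    step = max(1, round(cap.get(cv2.CAP_PROP_FPS) / target_fps))
    feats, picks, idx = [], [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # keep every `step`-th frame (~2 fps)
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                feats.append(backbone(preprocess(rgb).unsqueeze(0)).squeeze(0).numpy())
            picks.append(idx)  # original frame indices, useful for the h5 file later
        idx += 1
    cap.release()
    return np.stack(feats), np.asarray(picks)
```

Note that stride-based sampling only approximates 2 fps when the native frame rate is not an even multiple of the target; the repo's own helper is the authoritative reference for the exact sampling logic.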

wenxie18 commented 3 weeks ago

Thank you.

Is the process the same if I want to fine-tune your model on my own videos? First, I extract the features and prepare the dataset using video_helper.py; then I train the model using train.py.

Appreciate it if you could let me know more details. Thanks.

thswodnjs3 commented 3 weeks ago

Yes, the process is the same when fine-tuning my model on your own videos, except that you also need to prepare an h5py file.

Since my code is designed to use an h5py file, you'll need to create one manually (step 2 below).

  1. Extract frame features from 2 fps videos using the pre-trained GoogleNet. (Refer to video_helper.py; a sketch of this step appears above.)
  2. Manually create an h5py dataset containing the frame features, target scores, etc. (You can reference the h5py file structure here: https://github.com/e-apostolidis/PGL-SUM?tab=readme-ov-file#data; see the sketch after this list.)
  3. Use the extracted features to fine-tune my models. (Refer to train.py)
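To make step 2 concrete, here is a minimal sketch of writing such a file with h5py, following the group/key layout documented in the PGL-SUM README linked above. The helper name `add_video`, the toy shapes, and the random data are assumptions for illustration; fill in your real features and annotations:

```python
import h5py
import numpy as np

def add_video(h5file, key, features, gtscore, change_points,
              n_frames, picks, user_summary, video_name):
    """Write one video under a group named `key` (e.g. 'video_1')."""
    g = h5file.create_group(key)
    g.create_dataset('features', data=features.astype(np.float32))  # (n_steps, 1024)
    g.create_dataset('gtscore', data=gtscore.astype(np.float32))    # (n_steps,) importance targets
    g.create_dataset('change_points', data=change_points)           # (n_segments, 2) shot boundaries
    # Assumes inclusive [start, end] boundaries, as in the TVSum/SumMe h5 files.
    g.create_dataset('n_frame_per_seg',
                     data=change_points[:, 1] - change_points[:, 0] + 1)
    g.create_dataset('n_frames', data=n_frames)                     # total frames in the video
    g.create_dataset('picks', data=picks)                           # indices of the sampled frames
    g.create_dataset('user_summary', data=user_summary)             # (n_users, n_frames) 0/1 labels
    g.create_dataset('video_name', data=video_name)

# Toy usage with random data, just to show the expected shapes.
with h5py.File('my_dataset.h5', 'w') as f:
    n_frames, n_steps = 3000, 200
    add_video(
        f, 'video_1',
        features=np.random.rand(n_steps, 1024),
        gtscore=np.random.rand(n_steps),
        change_points=np.array([[0, 1499], [1500, 2999]]),
        n_frames=n_frames,
        picks=np.arange(0, n_frames, 15),  # every 15th frame of a 30 fps video ~ 2 fps
        user_summary=np.random.randint(0, 2, size=(5, n_frames)),
        video_name='my_video',
    )
```

The `picks` array should record the original indices of the frames whose features you extracted, so the predicted importance scores can be mapped back to the full-rate video.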