Feature extraction for a video dataset

zrguo / MPLMM

[ACL 2024 Main] Official PyTorch implementation of the paper "Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition"

MIT License


Feature extraction for a video dataset #1

Open gak97 opened 3 months ago

gak97 commented 3 months ago

Thank you for your work on multimodal prompt learning for missing modalities.

I have a video dataset that is not meant for sentiment analysis or emotion recognition, but I would like to use your architecture for video classification. Which models should I use to extract features from this dataset so that I can use them with your code?

Also, could you please tell me what changes I would need to make to your code for it to work with this video dataset?

zrguo commented 2 months ago

Hi, for feature extraction, you can refer to https://github.com/thuiar/MMSA-FET. After extracting the features, you need to define your own Dataset class for them. If you use MulT as the backbone, you only need to modify the prediction head and the hyperparameters for your task. If you want to use a different backbone, you will need to define a prompt model based on that backbone.
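As a rough illustration of the "define your Dataset class" step, here is a minimal sketch of a map-style dataset over pre-extracted features. Everything in it is an assumption for illustration and not taken from the MPLMM code: the class name `VideoFeatureDataset`, the modality keys, and the array layout `(num_clips, seq_len, feat_dim)` are hypothetical, and the repository's training loop may expect a different batch format. It only uses NumPy; in a PyTorch project you would typically subclass `torch.utils.data.Dataset`, which relies on the same `__len__` / `__getitem__` protocol.

```python
import numpy as np


class VideoFeatureDataset:
    """Map-style dataset over pre-extracted features (hypothetical layout).

    features: dict mapping a modality name (e.g. "audio", "vision") to an
              array of shape (num_clips, seq_len, feat_dim).
    labels:   integer class ids, one per clip.
    """

    def __init__(self, features, labels):
        if not features:
            raise ValueError("at least one modality is required")
        self.labels = np.asarray(labels, dtype=np.int64)
        self.features = {}
        for name, arr in features.items():
            arr = np.asarray(arr, dtype=np.float32)
            # Every modality must describe the same set of clips.
            if len(arr) != len(self.labels):
                raise ValueError(
                    f"{name} has {len(arr)} clips, expected {len(self.labels)}"
                )
            self.features[name] = arr

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        # One sample: a dict with one entry per modality plus the class label.
        sample = {name: arr[idx] for name, arr in self.features.items()}
        sample["label"] = self.labels[idx]
        return sample


# Usage with random stand-in features (real ones would come from an extractor).
rng = np.random.default_rng(0)
dataset = VideoFeatureDataset(
    {
        "audio": rng.normal(size=(4, 10, 5)),
        "vision": rng.normal(size=(4, 10, 8)),
    },
    labels=[0, 1, 2, 1],
)
first = dataset[0]
```

For a classification task, the remaining change would be on the model side: the prediction head's output size has to match the number of classes in the new dataset.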