Open gak97 opened 3 months ago
Hi, for feature extraction, you can refer to https://github.com/thuiar/MMSA-FET. After extracting the features, you need to define your own Dataset class. If you use MulT as the backbone, you only need to modify the prediction head and hyperparameters for your task. If you want to use another backbone, you will need to define a prompt model based on that backbone.
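As a rough illustration of the Dataset step, here is a minimal sketch of a map-style dataset over pre-extracted features. It is not the Dataset class from this repository. The class name `VideoFeatureDataset`, the dictionary keys (`vision`, `audio`, `text`, `label`), and the toy values are all hypothetical, so adapt them to whatever your feature-extraction step actually produces. The class only implements `__len__` and `__getitem__`, which is the protocol PyTorch's `DataLoader` expects from a map-style dataset.

```python
from typing import Any, Dict, Sequence


class VideoFeatureDataset:
    """Map-style dataset over pre-extracted per-video features.

    Assumed (hypothetical) layout: a dict of parallel sequences, one entry
    per video, under the keys "vision", "audio", "text" and "label".
    Modalities that are absent from the dict are simply skipped.
    """

    MODALITIES = ("vision", "audio", "text")

    def __init__(self, data: Dict[str, Sequence[Any]]) -> None:
        self.labels = list(data["label"])
        # Keep only the modalities that were actually extracted.
        self.features = {
            name: list(data[name]) for name in self.MODALITIES if name in data
        }
        # Every modality must provide exactly one feature sequence per label.
        for name, sequences in self.features.items():
            if len(sequences) != len(self.labels):
                raise ValueError(
                    f"{name}: {len(sequences)} samples, "
                    f"but {len(self.labels)} labels"
                )

    def __len__(self) -> int:
        return len(self.labels)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        sample = {name: seqs[index] for name, seqs in self.features.items()}
        sample["label"] = self.labels[index]
        return sample


# Toy data: two videos with vision and audio features, no text modality.
toy_data = {
    "vision": [[[0.1, 0.2]], [[0.3, 0.4]]],
    "audio": [[[1.0]], [[2.0]]],
    "label": [0, 1],
}
dataset = VideoFeatureDataset(toy_data)
```

For video classification, `label` would hold the class index of each video, and the output size of the prediction head would need to match the number of classes.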
Thank you for your work on multimodal prompt learning for missing modalities.
I have a video dataset that is not for sentiment analysis or emotion recognition, but I want to use your architecture for video classification. Which models should I use to extract features from this dataset so that I can use them with your code?
Also, could you please tell me what changes I have to make to your code to make it work with the above video dataset?