Closed: sklee2014 closed this issue 3 years ago
Hi, thanks for your interest in our work.
Our experiments use cross-modal interactions to reduce computation and increase interpretability (e.g., via the visualizations). If the focus is on improving performance, I guess adding interactions after obtaining the feature sequences of each modality would be more effective.
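To make the suggestion concrete, here is a minimal sketch of what "adding interactions after obtaining the feature sequences of each modality" could look like: one modality's features act as queries attending over another modality's feature sequence. This is a generic cross-modal attention illustration, not the paper's actual implementation; all names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_seq, key_value_seq):
    """One modality's feature sequence (queries) attends over another
    modality's feature sequence (keys/values).
    Shapes: query_seq (T_q, d), key_value_seq (T_kv, d)."""
    d = query_seq.shape[-1]
    scores = query_seq @ key_value_seq.T / np.sqrt(d)  # (T_q, T_kv)
    attn = softmax(scores, axis=-1)                    # rows sum to 1
    return attn @ key_value_seq                        # (T_q, d)

# Hypothetical per-modality feature sequences: 4 text tokens,
# 6 audio frames, feature dim 8.
rng = np.random.default_rng(0)
text_feats = rng.standard_normal((4, 8))
audio_feats = rng.standard_normal((6, 8))

# Text features enriched with audio context.
fused = cross_modal_attention(text_feats, audio_feats)
print(fused.shape)  # (4, 8)
```

In practice such a block would sit between the unimodal encoders and the final classifier, so both modalities interact before any pooling or fusion.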
Hi, thank you for the great work! I have a question about the model names in the paper versus your implementation: do MME2E and MME2E_Sparse in the code correspond to FE2E and MESM, respectively? If so, I also wonder why FE2E works better than MESM (Tables 3 and 4 in the paper) despite having less cross-modal interaction, since MME2E has no cross-modal operation other than the multimodal fusion in the final layer. Is it perhaps because of the FLOPs? Thank you!
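For contrast with the cross-modal case, the "fusion only in the final layer" design mentioned above can be sketched as simple late fusion: each modality's feature sequence is pooled independently and the pooled vectors are concatenated before the classifier. This is an illustrative sketch with hypothetical names and shapes, not the exact MME2E code.

```python
import numpy as np

def late_fusion(modality_feats):
    """Pool each modality's feature sequence independently, then
    concatenate -- the only point where modalities interact in a
    late-fusion design (a sketch, not the repository's exact model)."""
    pooled = [feats.mean(axis=0) for feats in modality_feats]  # (d,) each
    return np.concatenate(pooled)                              # (sum of d's,)

# Hypothetical feature sequences for three modalities, feature dim 8.
rng = np.random.default_rng(1)
text = rng.standard_normal((4, 8))
audio = rng.standard_normal((6, 8))
video = rng.standard_normal((5, 8))

joint = late_fusion([text, audio, video])
print(joint.shape)  # (24,)
```

Because no modality sees another before this final concatenation, the per-modality encoders can be computed independently, which keeps FLOPs low relative to designs with repeated cross-modal attention.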