Open kkontras opened 2 months ago
A small update here: I used a learning rate of 5e-6, a cosine annealing scheduler, and a batch size of 8, and I kept the best model based on validation accuracy. These are the only differences I spotted. Meanwhile, I get similar results with the CME model using the default training settings (test ~88.6).
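To be concrete about the schedule and model-selection logic I mean, here is a minimal plain-Python sketch (framework-agnostic; the function names are mine, not the repo's):

```python
import math

def cosine_annealed_lr(step, total_steps, lr_max=5e-6, lr_min=0.0):
    """Cosine annealing: decay the learning rate from lr_max to lr_min
    over total_steps (same shape as PyTorch's CosineAnnealingLR)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

def select_best(epoch_val_accs):
    """Return (epoch_index, accuracy) of the checkpoint with the
    highest validation accuracy -- the 'keep the best model' rule."""
    best_epoch = max(range(len(epoch_val_accs)), key=lambda i: epoch_val_accs[i])
    return best_epoch, epoch_val_accs[best_epoch]
```

So the learning rate starts at 5e-6 and decays to 0 over training, and the reported test number comes from the epoch with the best validation accuracy, not the last epoch.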
Hi,
Thank you for your contribution and the clear code in this repo. I wanted to ask about the unimodal performance. I am training only the text encoder on its own on MOSEI, and I already get somewhat higher accuracy than the multimodal model (val 89.6 and test 88.6). I can see in the paper that those numbers are significantly lower. Am I missing something?
For the record, I trained Rob_d2v_cme_context, keeping only A_output or T_output respectively for each unimodal case. If needed, I can share the exact model.
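By "keeping A_output and T_output" I mean selecting a single modality's output from the multimodal forward pass, roughly like this (the dict shape is my assumption for illustration; only the key names come from the model):

```python
def unimodal_output(model_outputs, modality="T"):
    """Pick one modality's output from a multimodal forward pass.
    `model_outputs` is assumed to map names like 'T_output' (text) and
    'A_output' (audio) to their logits; the dict layout is hypothetical."""
    return model_outputs[f"{modality}_output"]
```

For the text-only run I train and evaluate using only the `T_output` branch, and analogously `A_output` for audio.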