zehuiwu / MMML

Multi-Modality Multi-Loss Fusion Network

Unimodal Performance #7

Open kkontras opened 2 months ago

kkontras commented 2 months ago

Hi,

Thank you for your contribution and the clear code in this repo. I wanted to ask about the unimodal performance. When I train the text encoder on its own on MOSEI, I already get somewhat higher accuracy than the multimodal model (val 89.6 and test 88.6). I can see in the paper that those numbers are significantly lower. Am I missing something?

For the record, I trained Rob_d2v_cme_context, keeping only A_output or T_output for the respective unimodal case. If needed, I can share the exact model.

kkontras commented 1 month ago

A small update here: I used a learning rate of 5e-6, a cosine annealing scheduler, and a batch size of 8, and I kept the best model based on validation accuracy. These are the only differences I spotted. Meanwhile, with the CME model and the default training settings, I get similar results (test ~88.6).
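For reference, a minimal sketch of the learning-rate schedule described above (cosine annealing from 5e-6, matching the formula used by `torch.optim.lr_scheduler.CosineAnnealingLR`). The epoch count is an assumption, not something stated in this thread:

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=5e-6, lr_min=0.0):
    """Cosine-annealed learning rate: decays from lr_max at step 0
    to lr_min at total_steps, following
    lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / total_steps)).
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))

# Hypothetical 40-epoch run (epoch count is an assumption):
schedule = [cosine_annealing_lr(e, 40) for e in range(41)]
# Starts at 5e-6, reaches 2.5e-6 at the midpoint, and decays to 0.
```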