zehuiwu / MMML

Multi-Modality Multi-Loss Fusion Network

Accuracy and ACC2 for audio_only inputs are critically wrong #5

Open 3234677361bhw opened 1 week ago

3234677361bhw commented 1 week ago

When I trained the model on the MOSI dataset with audio features only, the results were far from those in your paper. The paper reports an ACC2 of 0.7099, but my run gave an ACC2 of 0.4475, and my accuracy was only -0.0009, which is obviously wrong. I did not adjust any model parameters. Could you tell me where the problem might be? Thank you for taking the time to answer my questions!

zehuiwu commented 1 week ago

Hi! Did you use our code to extract the audio? And did you use 16000 as the audio sample rate?
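One way to double-check the sample rate before training is to read it from the header of each extracted file. The following is a minimal sketch, assuming the extracted audio is stored as uncompressed PCM WAV files; the helper names `wav_sample_rate` and `find_mismatched` are hypothetical and are not part of this repository.

```python
import wave

EXPECTED_RATE = 16000  # sample rate asked about in this thread


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def find_mismatched(paths, expected=EXPECTED_RATE):
    """Return (path, rate) pairs for files whose sample rate differs from `expected`."""
    mismatched = []
    for path in paths:
        rate = wav_sample_rate(path)
        if rate != expected:
            mismatched.append((path, rate))
    return mismatched
```

Calling `find_mismatched` on the list of extracted files should return an empty list if every file really is at 16000. Any entries it does return are files that would need to be re-extracted or resampled.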

zehuiwu commented 1 week ago

To get the 70% accuracy, you also need to set the parameter --feature raw. The 70% accuracy reported in the paper uses the pre-trained audio models, not mel spectrogram or openSMILE features.

3234677361bhw commented 1 week ago

Yes, I used your code to extract the audio, the audio sample rate is 16000, and the feature parameter is set to raw. I have made sure all of these are already set.

zehuiwu commented 1 week ago

Hi, I am able to reproduce the 70%+ accuracy consistently on either an RTX 4090 or an A6000. You should get the same results by using a lower learning rate of 1e-5 and different seeds, since 1e-4 is too high and unstable.
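To tell whether a single bad run is due to instability rather than a setup problem, it can help to repeat training over several seeds at the lower learning rate and compare the spread. The following is a minimal sketch of such a sweep. It assumes a hypothetical callable `train_and_eval(seed, lr)` that trains once and returns ACC2; that function is not part of this repository, and `fake_train_and_eval` below is only a placeholder that returns made-up numbers so the sketch runs on its own.

```python
import random
import statistics


def seed_sweep(train_and_eval, seeds, lr=1e-5):
    """Call train_and_eval(seed=..., lr=...) once per seed and summarize the scores."""
    scores = [train_and_eval(seed=seed, lr=lr) for seed in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        # stdev needs at least two values, so fall back to 0.0 for a single seed
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }


def fake_train_and_eval(seed, lr):
    """Placeholder for a real training run; the returned value is NOT a real result."""
    rng = random.Random(seed)
    return 0.5 + 0.1 * rng.random()


if __name__ == "__main__":
    summary = seed_sweep(fake_train_and_eval, seeds=[0, 1, 2, 3, 4], lr=1e-5)
    print(summary)
```

In practice `fake_train_and_eval` would be replaced by a wrapper around the actual training entry point. A large standard deviation across seeds would point to instability, while a consistently low mean would point to a data or configuration issue.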