yizhilll / MERT

Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".
Apache License 2.0

Music Descriptor prediction (especially for EMO task) #16

Open uu95 opened 4 months ago

uu95 commented 4 months ago

First of all, impressive work.

However, while testing the Music Descriptor, I noticed that it predicts valence and arousal for different music samples, but the predictions are always positive, even for sad and depressing songs.

According to the dataset paper, valence and arousal should be in the range of [−0.5,0.5]. Could you explain how to convert the predictions to this range? This would allow me to map them to the nearest emotion using Russell's model.
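To make the question concrete, here is a minimal sketch of what I have in mind. It assumes the regression head outputs values in [0, 1] (I have not confirmed this against the training code), in which case subtracting 0.5 would give the [−0.5, 0.5] range; the emotion labels and their circumplex angles below are purely illustrative, not from the paper:

```python
import math

def to_signed_range(pred):
    """Shift a [0, 1] prediction into [-0.5, 0.5].

    Assumption: the model head outputs values normalized to [0, 1];
    this needs to be checked against the actual head/training setup.
    """
    return pred - 0.5

# Illustrative subset of Russell's circumplex as (label, angle in degrees),
# measured counter-clockwise from the positive-valence axis. The exact
# angles here are placeholders, not from the MERT paper.
EMOTIONS = [("happy", 20.0), ("excited", 70.0), ("angry", 110.0),
            ("sad", 200.0), ("calm", 340.0)]

def nearest_emotion(valence, arousal):
    """Map a (valence, arousal) point in [-0.5, 0.5]^2 to the closest
    emotion label by angular distance on the circumplex."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0

    def angular_dist(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(EMOTIONS, key=lambda e: angular_dist(angle, e[1]))[0]

# Example: raw predictions of 0.9 (valence) and 0.8 (arousal)
v, a = to_signed_range(0.9), to_signed_range(0.8)
print(nearest_emotion(v, a))  # high valence + high arousal -> "happy"
```

If the head's output range is something else (e.g. unbounded logits), the shift above would of course be wrong, which is exactly what I am hoping you can clarify.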

Thank you!