yizhilll / MERT

Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".
Apache License 2.0

Music Descriptor prediction (especially for EMO task) #16

Open uu95 opened 4 months ago

uu95 commented 4 months ago

First of all, impressive work.

However, while testing the Music Descriptor, I noticed that it predicts valence and arousal for different music samples, but the predictions are always positive, even for sad and depressing songs.

According to the dataset paper, valence and arousal should be in the range of [−0.5,0.5]. Could you explain how to convert the predictions to this range? This would allow me to map them to the nearest emotion using Russell's model.
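To make the question concrete, here is a minimal sketch of what I have in mind. It assumes the regression head outputs values in [0, 1] (I have not confirmed this against the training code), in which case subtracting 0.5 would give the [−0.5, 0.5] range; the emotion labels and their circumplex angles below are purely illustrative, not from the paper:

```python
import math

def to_signed_range(pred):
    """Shift a [0, 1] prediction into [-0.5, 0.5].

    Assumption: the model head outputs values normalized to [0, 1];
    this needs to be checked against the actual head/training setup.
    """
    return pred - 0.5

# Illustrative subset of Russell's circumplex as (label, angle in degrees),
# measured counter-clockwise from the positive-valence axis. The exact
# angles here are placeholders, not from the MERT paper.
EMOTIONS = [("happy", 20.0), ("excited", 70.0), ("angry", 110.0),
            ("sad", 200.0), ("calm", 340.0)]

def nearest_emotion(valence, arousal):
    """Map a (valence, arousal) point in [-0.5, 0.5]^2 to the closest
    emotion label by angular distance on the circumplex."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0

    def angular_dist(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    return min(EMOTIONS, key=lambda e: angular_dist(angle, e[1]))[0]

# Example: raw predictions of 0.9 (valence) and 0.8 (arousal)
v, a = to_signed_range(0.9), to_signed_range(0.8)
print(nearest_emotion(v, a))  # high valence + high arousal -> "happy"
```

If the head's output range is something else (e.g. unbounded logits), the shift above would of course be wrong, which is exactly what I am hoping you can clarify.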

Thank you!