yizhilll / MERT

Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".
Apache License 2.0
301 stars, 18 forks

Reproducing the numbers #13

Open · Laubeee opened this issue 10 months ago

Laubeee commented 10 months ago

Hi, I am interested in reproducing the numbers you reported on NSynth. With the models from HuggingFace I get close, but not quite to what you report: my results are 0.4 to 0.8 lower for the models I tried (330M, 95M-public, and data2vec). Did you use the settings in the MARBLE-Benchmark repository to get these numbers? That is: train one hidden layer of 512 units with 128 outputs for at most 50 epochs, with early stopping and learning-rate reduction, batch size 64, and 5 runs with different learning rates.
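To make the question concrete, here is a minimal stdlib-only sketch of the training bookkeeping described above: up to 50 epochs, early stopping, learning-rate reduction on a validation plateau, and one run per learning rate with the best run kept. It is not the MARBLE-Benchmark code. The patience values, reduction factor, learning-rate grid, and the names `ProbeConfig`, `train_with_plateau`, and `lr_sweep` are all hypothetical. The actual probe (512 hidden units, 128 outputs, batch size 64) is abstracted as a callback that trains one epoch and returns a validation score.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# run_epoch(epoch, current_lr) -> validation score (higher is better).
# Hypothetical stand-in for "train the probe one epoch, then evaluate".
RunEpoch = Callable[[int, float], float]


@dataclass
class ProbeConfig:
    # Values paraphrased from the issue; the probe itself would be built elsewhere.
    hidden_units: int = 512
    num_outputs: int = 128
    batch_size: int = 64
    max_epochs: int = 50
    # Everything below is a hypothetical assumption, not taken from MARBLE.
    learning_rates: Tuple[float, ...] = (1e-4, 5e-4, 1e-3, 5e-3, 1e-2)
    early_stop_patience: int = 10  # epochs without improvement before stopping
    lr_patience: int = 5           # epochs without improvement before cutting the LR
    lr_factor: float = 0.1         # multiplier applied to the LR on each cut


def train_with_plateau(run_epoch: RunEpoch, lr: float,
                       cfg: ProbeConfig) -> Tuple[float, int, int]:
    """Return (best validation score, epoch it was reached, epochs actually run)."""
    best, best_epoch = float("-inf"), -1
    since_best = 0  # epochs since the last improvement (drives early stopping)
    since_cut = 0   # epochs since the last improvement or LR cut (drives LR reduction)
    epochs_run = 0
    for epoch in range(cfg.max_epochs):
        score = run_epoch(epoch, lr)
        epochs_run += 1
        if score > best:
            best, best_epoch = score, epoch
            since_best = since_cut = 0
        else:
            since_best += 1
            since_cut += 1
            if since_cut >= cfg.lr_patience:
                lr *= cfg.lr_factor  # reduce LR on plateau
                since_cut = 0
            if since_best >= cfg.early_stop_patience:
                break  # early stopping
    return best, best_epoch, epochs_run


def lr_sweep(make_run: Callable[[float], RunEpoch],
             cfg: ProbeConfig) -> Tuple[float, Dict[float, Tuple[float, int, int]]]:
    """One fresh run per initial LR; return the LR with the best validation score."""
    results = {lr: train_with_plateau(make_run(lr), lr, cfg) for lr in cfg.learning_rates}
    best_lr = max(results, key=lambda lr: results[lr][0])
    return best_lr, results
```

With a toy validation curve that improves until epoch 5 and then stays flat, a run stops after 16 epochs (the best epoch plus the 10-epoch patience). In a PyTorch setup, `torch.optim.lr_scheduler.ReduceLROnPlateau` would normally handle the learning-rate cut. Small differences in patience, reduction factor, or the learning-rate grid could plausibly account for gaps like the ones reported here.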

annabeth97c commented 3 months ago

Hello! I have also been struggling to replicate the performance reported on the MARBLE Benchmark, but on the MTG tasks (Mood, Genre, and Instrument). I also tried the same setup as the MARBLE repository, which is very similar to the one @Laubeee described.