keunwoochoi / music-auto_tagging-keras

Music auto-tagging models and trained weights in keras/theano
MIT License

Low AUC for MTT #18

Closed Rashmeet09 closed 6 years ago

Rashmeet09 commented 7 years ago

I trained this CNN model on the MTT dataset and got the results given in the attached files.
The accuracy is 90.69%, but the AUC is quite low at 0.58. Could you take a look and suggest any changes for improvement?

Code.txt output_cnn.txt

as641651 commented 7 years ago

Yes, I fine-tuned this model on MTT with a different set of labels, and I also got low AUC scores. For most of the tags, it was in the range 0.65 to 0.75.

keunwoochoi commented 7 years ago

First, the gap between accuracy and AUC is normal. There are so many zeros in the true Y that predicting all zeros alone would give an accuracy above 80% or 90% on MTT.
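A minimal numpy/scikit-learn sketch of this point (the 5% tag density and matrix shape are made up, not the real MTT label statistics): on sparse multi-label targets, an all-zero predictor gets high accuracy but only chance-level AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Hypothetical sparse tag matrix: ~5% positives, loosely mimicking MTT tags
y_true = (rng.random((1000, 50)) < 0.05).astype(int)
y_score = np.zeros_like(y_true, dtype=float)  # predict "no tag" everywhere

accuracy = ((y_score > 0.5) == y_true).mean()
auc = roc_auc_score(y_true.ravel(), y_score.ravel())
print(accuracy)  # around 0.95: just reflects label sparsity
print(auc)       # 0.5: a constant score carries no ranking information
```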

@Rashmeet09 The code seems alright; sorry, but I have no idea. Especially since you're not using the pre-trained weights, I can't think of any reason. As a side note, I'd use val_loss rather than val_acc as the metric. Did you randomise the training data? How is it processed?
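On the randomisation point, the detail that matters is shuffling X and Y with one shared permutation so rows stay aligned. A small sketch with dummy arrays standing in for the real spectrogram/tag data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Dummy stand-ins for the mel-spectrogram inputs and the tag matrix
x_train = np.arange(10)[:, None].repeat(3, axis=1).astype(float)
y_train = np.arange(10)

idx = rng.permutation(len(x_train))            # one shared permutation
x_train, y_train = x_train[idx], y_train[idx]  # X/Y rows stay aligned

assert (x_train[:, 0] == y_train).all()
```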

@as641651 I assume you used the 'msd' weights. How did you prepare the mel-spectrogram? The weights of MusicTaggerCNN were trained with the power of the power-melspectrogram, i.e. melgram**4, just because of my mistake (see here). It may have affected your experiment.
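For reference, the extra square is easy to reason about in the log domain: a numpy sketch (with arbitrary example values) showing that log-compressing melgram**4 instead of melgram**2 simply doubles every dB value, shifting the input distribution the pretrained weights expect.

```python
import numpy as np

# Arbitrary stand-in values for a power mel-spectrogram, i.e. melgram(...)**2
power_mel = np.array([1e-3, 1e-1, 1.0, 10.0])

db_intended = 10 * np.log10(power_mel)       # log-amplitude of melgram**2
db_trained = 10 * np.log10(power_mel ** 2)   # log-amplitude of melgram**4

# The accidental extra square exactly doubles the dB values
assert np.allclose(db_trained, 2 * db_intended)
```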

as641651 commented 7 years ago

I just use the audio processor from your repo: `logam(melgram(y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)**2, ref_power=1.0)`

So you used `(melgram(y=src, sr=12000, hop_length=256, n_fft=512, n_mels=96)**2)**2`? No log amplitude?

keunwoochoi commented 7 years ago

Oh, then it's correct, never mind. Hm...

Rashmeet09 commented 7 years ago

I processed and split the MTT (folders 0-11 for train.h5, 12-13 for valid.h5, and 14-15 for test.h5) as in the attached file (I referred to the UrbanSound dataset pre-processing and the audio_processor from your repo): split.txt
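For comparison, MTT's audio archive ships its clips in 16 top-level folders named 0-9 and a-f; a sketch of the 12/2/2 split described above, assuming that hexadecimal folder naming:

```python
# MTT ships its clips in 16 folders named '0'..'9', 'a'..'f'
folders = [format(i, 'x') for i in range(16)]

train_dirs = folders[:12]    # '0'..'9', 'a', 'b'  -> train.h5
valid_dirs = folders[12:14]  # 'c', 'd'            -> valid.h5
test_dirs = folders[14:]     # 'e', 'f'            -> test.h5
```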

How did you pre-process MTT? Should I use the last.fm dataset from MSD to reproduce better results with this model?

as641651 commented 7 years ago

I was filtering out labels with prob < 0.2 while testing. I removed this and re-evaluated, and I got a weighted average AUC of 0.80 (fine-tuned on a new set of labels for 40k iterations). That seems reasonable, right? Furthermore, I had removed the max-pooling in the final layer of the CNN.
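The score filtering plausibly explains the earlier low numbers: AUC is a ranking metric, and zeroing every score below a threshold ties those examples together, discarding their relative ordering. A synthetic sketch (random data, not the actual MTT results):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = (rng.random((2000, 8)) < 0.3).astype(int)
# Mildly informative scores: positives sit a bit higher on average
y_score = y_true * 0.1 + rng.random(y_true.shape) * 0.4

auc_raw = roc_auc_score(y_true, y_score, average='weighted')

# Zeroing scores below a threshold, as in the original evaluation,
# collapses their ranking into one big tie and lowers the AUC
y_filtered = np.where(y_score < 0.2, 0.0, y_score)
auc_filtered = roc_auc_score(y_true, y_filtered, average='weighted')

assert auc_raw > auc_filtered
```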