seungheondoh / EMOPIA_cls

MIDI, WAV domain music emotion recognition [ISMIR 2021]

Tested on test_split and got very low accuracy using pretrained weights #4

Open yen52205 opened 3 years ago

yen52205 commented 3 years ago

I tested inference_batch.py on dataset/split/test.csv and got 0.57, 0.744, and 0.744 accuracy on AV/A/V respectively. The models I used were downloaded from https://drive.google.com/u/0/uc?id=1L_NOVKCElwcYUEAKp1-FZj_G6Hcq2g2c&export=download (the link provided in README.md).

seungheondoh commented 3 years ago

Oh, I will check again. What type of classifier did you use (audio, remi, or magenta)?

yen52205 commented 3 years ago

Thanks, it's the magenta type. The results I mentioned (0.57/0.744/0.744) were computed as (correctly classified clips) / (total test clips). Is this the same way you compute accuracy?
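(Editor's sketch, for clarity: the accuracy described above, as a minimal standalone function. The quadrant labels "Q1"–"Q4" are illustrative placeholders for EMOPIA's four AV classes.)

```python
def clip_accuracy(y_true, y_pred):
    """Fraction of clips whose predicted class matches the ground-truth label:
    correctly classified clips / total test clips."""
    assert len(y_true) == len(y_pred), "one prediction per clip"
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# e.g. three clips, two classified correctly -> accuracy 2/3
print(clip_accuracy(["Q1", "Q2", "Q3"], ["Q1", "Q2", "Q4"]))
```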

seungheondoh commented 3 years ago

image

I re-checked the performance, but there is no performance decrease. I think the difference comes from the global seed! Please check & run https://github.com/SeungHeonDoh/EMOPIA_cls/blob/main/midi_cls/train_test.py with the best hparams.yaml,

or simply add a global seed in your script:

from pytorch_lightning import seed_everything

if args.reproduce:
    seed_everything(42)
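(Editor's sketch: roughly what `seed_everything(42)` buys you, using only `random` and NumPy so the illustration stays dependency-light. The real `pytorch_lightning.seed_everything` also seeds torch and CUDA; `seed_everything_sketch` is a hypothetical stand-in, not the library function.)

```python
import random

import numpy as np


def seed_everything_sketch(seed: int):
    """Rough analogue of pytorch_lightning.seed_everything: fix every RNG."""
    random.seed(seed)
    np.random.seed(seed)
    # The real function also runs torch.manual_seed(seed) and
    # torch.cuda.manual_seed_all(seed); omitted here.


seed_everything_sketch(42)
a = np.random.rand(3)
seed_everything_sketch(42)
b = np.random.rand(3)
print(np.allclose(a, b))  # True: same seed, same draws
```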
yen52205 commented 3 years ago

> I re-checked the performance, but there is no performance decrease. I think the difference comes from the global seed! Please check & run https://github.com/SeungHeonDoh/EMOPIA_cls/blob/main/midi_cls/train_test.py with the best hparams.yaml,
>
> or simply add a global seed in your script:
>
>     from pytorch_lightning import seed_everything
>
>     if args.reproduce:
>         seed_everything(42)

thanks!

I didn't set a global seed. Will the global seed setting influence the inference result, or does it only affect reproducing the training?

I added a global seed to both inference_batch.py and inference.py, but still got strange results. I ran inference_batch.py with the best weights (from README.md) on all the .mid clips, matched the CSV it produced against dataset/split/test.csv, and computed how many clips were correctly classified. I still got 0.57, 0.744, and 0.744 on AV/A/V respectively.

Here are dataset/split/test.csv and the CSVs produced by inference_batch.py. Could you please check whether there is anything I missed? 1029_seed_arousal_all.csv 1029_seed_arva_all.csv 1029_seed_valence_all.csv test.csv
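(Editor's sketch of the matching step described above: join a predictions CSV against the test-split CSV by clip ID, then score accuracy. The column names `ID`, `label`, and `pred` are hypothetical; the real files from inference_batch.py may use different headers, so adjust accordingly.)

```python
import csv
import io


def accuracy_from_csvs(test_csv, pred_csv, id_col="ID", true_col="label", pred_col="pred"):
    """Match predictions to ground truth by clip ID and return accuracy."""
    truth = {row[id_col]: row[true_col] for row in csv.DictReader(test_csv)}
    preds = {row[id_col]: row[pred_col] for row in csv.DictReader(pred_csv)}
    common = truth.keys() & preds.keys()  # only clips present in both files
    correct = sum(truth[k] == preds[k] for k in common)
    return correct / len(common)


# Tiny in-memory example standing in for test.csv and the inference output:
test_csv = io.StringIO("ID,label\na,Q1\nb,Q2\nc,Q3\n")
pred_csv = io.StringIO("ID,pred\na,Q1\nb,Q4\nc,Q3\n")
print(accuracy_from_csvs(test_csv, pred_csv))  # 2 of 3 clips correct
```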

seungheondoh commented 3 years ago

That's very strange. Could you follow the Training from scratch steps instead of using inference_batch.py?

preprocessing.py, then train_test.py

yen52205 commented 3 years ago

I used inference_batch.py because I wanted to test the best weights you provided on the EMOPIA dataset. Could I use train_test.py to do the same thing (testing only, no training)?

seungheondoh commented 3 years ago

I just want to double-check the result. It is strange that the results differ even when there are no other factors. I will check my inference code as well!

seungheondoh commented 3 years ago

tain_test1030.csv inference1030.csv

With the best weights, I found that the train_test.py and inference.py results were different. Batch inference and zero padding seem to have affected the performance. There are only 87 test samples, so small differences have a big effect on the results.

There is no problem with the best weights. I will modify the inference code to the train_test style soon.
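(Editor's sketch, not the repo's actual code: one way zero padding in batch inference can shift predictions for variable-length inputs. If a model pools over time without masking, the padded zero timesteps dilute the pooled features, so a clip scores differently alone vs. in a padded batch.)

```python
import numpy as np

def mean_pool(x):
    """Unmasked temporal mean pooling: (time, feat) -> (feat,)."""
    return x.mean(axis=0)

rng = np.random.default_rng(0)
clip = rng.normal(size=(5, 3))                 # a short clip: 5 timesteps
padded = np.vstack([clip, np.zeros((3, 3))])   # zero-padded to batch length 8

solo = mean_pool(clip)      # what per-clip inference sees
batched = mean_pool(padded) # what unmasked batch inference sees

# The padding scales the pooled features by 5/8, which can flip a
# borderline classification even though the clip content is identical.
print(np.allclose(solo, batched))  # False
```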

yen52205 commented 3 years ago

> With the best weights, I found that the train_test.py and inference.py results were different. Batch inference and zero padding seem to have affected the performance. There are only 87 test samples, so small differences have a big effect on the results.
>
> There is no problem with the best weights. I will modify the inference code to the train_test style soon.

Thanks a lot!! Could you explain the difference between the two results further once you've modified this?

yen52205 commented 2 years ago

Hi, sorry to disturb you. Did you find the cause of the difference? Was zero padding interfering with the results in inference_batch.py?