Is the given result in the README of track1_AVSD based on the oracle vad?

mispchallenge / misp2022_baseline


Issue #13

Open Achronferry opened 1 year ago

mispchallenge commented 1 year ago

Yes, we use the oracle VAD.

Achronferry commented 1 year ago

I successfully ran the baseline system. For the VSD task, the pretrained model gives a similar result, but for AVSD the DER is higher than the given result (about 16.19%). Do you have any idea why?

Achronferry commented 1 year ago

Also, it seems that run.sh is missing the last training step (unfreezing all parameters)?

wzhuangx commented 1 year ago

I also ran into this problem: the result of my own run is much worse than the baseline result. How can I solve it? Thanks.

slwu0209 commented 1 year ago

Hello, for AVSD, did you get the bad result with the pretrained model or with a model you trained yourself?

liushenme commented 1 year ago

Me too. I got 100% DER using our own trained model, and I cannot find the error in my code.

slwu0209 commented 1 year ago

Hello, I would like to know the result of decoding with the pretrained model we provided, so that I can tell whether the problem lies in training or in decoding. I retrained and decoded myself, and the result was similar to the baseline.

liushenme commented 1 year ago

Hi, using the pretrained model you provided, I got 25% DER. With our own trained VSD model I got 54% DER, and with our own trained AVSD model I got 100% DER. So I think the problem is in training. @slwu0209

slwu0209 commented 1 year ago

But 25% DER is also incorrect, and even if the model is poorly trained, the DER should not be 100%. You can check the log file for the training loss and accuracy (ACC), so that we can determine whether the problem is in training. @liushenme
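Since the thread compares DER values and flags an exact 100% as suspicious, a small frame-level DER check may help sanity-check decoded output. This is a minimal sketch, not the challenge's official scoring tool: it assumes binary per-frame speaker-activity matrices whose hypothesis speaker columns are already aligned with the reference (no permutation search, no collar), and the function name `frame_der` is hypothetical.

```python
from typing import Sequence


def frame_der(ref: Sequence[Sequence[int]], hyp: Sequence[Sequence[int]]) -> float:
    """Frame-level DER = (miss + false alarm + confusion) / reference speech.

    ref and hyp are lists of frames; each frame holds one 0/1 flag per speaker.
    Assumption: hyp speaker columns already correspond to ref speaker columns.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")
    miss = fa = conf = total = 0
    for r, h in zip(ref, hyp):
        n_ref = sum(r)                                       # active reference speakers
        n_sys = sum(h)                                       # active hypothesis speakers
        n_correct = sum(1 for a, b in zip(r, h) if a and b)  # matched speakers
        miss += max(0, n_ref - n_sys)
        fa += max(0, n_sys - n_ref)
        conf += min(n_ref, n_sys) - n_correct
        total += n_ref
    if total == 0:
        raise ValueError("reference contains no speech")
    return (miss + fa + conf) / total
```

Under this formula a hypothesis that contains no speech at all scores exactly 1.0 (every reference frame is missed), so an exact 100% DER may be worth checking against an empty or all-zero output.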

liushenme commented 1 year ago

The training log of my AVSD model is shown in the figure. Does it look correct? @slwu0209

slwu0209 commented 1 year ago

This is the log from training my own AVSD model. Normally you only need to train for this many epochs, not as many as in your picture. @liushenme

Achronferry commented 1 year ago

Both the pretrained model and my own trained model give a similar result (DER = 17%). But when I changed frame_size in decode_AVSD.py from 800 to 200, the result improved. I hope this helps.

liushenme commented 1 year ago

Hi @Achronferry, I followed your approach, changing max_utt_durance from 800 to 100 and frame_shift from 600 to 100, but the results got worse than before. Are the parameters I changed the same as yours?

Achronferry commented 1 year ago

@liushenme max_utt_durance should be 200, not 100 (and frame_shift is 150). In fact, I just made them consistent with the settings used when training the VSD model. If it still gets worse, maybe we are running into different problems.
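The suggestion above amounts to decoding with the same window size the model saw during training. A minimal sketch of sliding-window decoding follows, assuming `max_utt_durance` is the window length in frames and `frame_shift` is the hop between windows; how decode_AVSD.py actually uses these parameters is not shown in this thread. `window_spans`, `decode_sliding`, the `Model` interface, and the averaging of overlapping frame posteriors are all hypothetical illustrations.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: takes a chunk of frames and returns one row
# of per-speaker activity probabilities for each input frame.
Model = Callable[[Sequence], List[List[float]]]


def window_spans(num_frames: int, win: int, shift: int) -> List[Tuple[int, int]]:
    """(start, end) frame ranges of length `win`, moved by `shift`.

    The last window is clipped at num_frames. Requires 0 < shift <= win so
    that every frame is covered by at least one window.
    """
    if not 0 < shift <= win:
        raise ValueError("need 0 < shift <= win")
    spans = []
    start = 0
    while True:
        end = min(start + win, num_frames)
        spans.append((start, end))
        if end >= num_frames:
            return spans
        start += shift


def decode_sliding(frames: Sequence, win: int, shift: int, model: Model) -> List[List[float]]:
    """Decode each window independently, then average overlapping frames."""
    if len(frames) == 0:
        return []
    sums: List[List[float]] = []
    counts = [0] * len(frames)
    for start, end in window_spans(len(frames), win, shift):
        probs = model(frames[start:end])
        if not sums:
            # Allocate accumulators once the number of speakers is known.
            sums = [[0.0] * len(probs[0]) for _ in frames]
        for offset, row in enumerate(probs):
            for spk, p in enumerate(row):
                sums[start + offset][spk] += p
            counts[start + offset] += 1
    return [[v / counts[t] for v in sums[t]] for t in range(len(frames))]
```

For example, with 1000 frames, a 200-frame window and a 150-frame shift give 7 windows, the last covering frames 900 to 1000, whereas an 800/600 setting gives only 2 windows. If the model was trained on short segments, the long windows may be far outside the lengths it saw during training.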

kaen2891 commented 1 year ago

Hi @Achronferry, @liushenme and @slwu0209. Although this challenge has finished, I would like to research this task. At the time, I could not participate in the challenge because I failed to download the dataset, and it is still hard to download because of the slow download speed. Could you share the MISP2022 dataset?