Achronferry opened this issue 1 year ago
I successfully ran the baseline system. For the VSD task, the pretrained model gets results similar to the reported ones, but for AVSD the DER is higher than the given result (about 16.19%). Do you have any idea why?
Also, it seems that run.sh is missing the last training step (unfreezing all parameters)?
I also encountered this problem: the results of my own run are much worse than the baseline results. How can I solve this? Thanks.
Hello, for AVSD, did you get the bad result with the pretrained model or with a model you trained yourself?
Me too. I got 100% DER with a model we trained ourselves, and I cannot find the error in my code.
Hello, I would like to know the result of decoding with the pretrained model we provided, so that I can tell whether the problem is in training or in decoding. I tried retraining and decoding myself, and the result was similar to the baseline.
Hi, with the pretrained model you provided I got 25% DER. With our own trained VSD model I got 54% DER, and with our own trained AVSD model I got 100% DER. So I think the problem is in training. @slwu0209
But 25% DER is also incorrect, and even if the model is poorly trained, it should not give 100% DER. You can check the log file for the training loss and accuracy (ACC), so we can determine whether the problem is in training. @liushenme
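For reference, a minimal sketch of how frame-level DER is usually computed (missed speech + false alarm + speaker confusion, divided by total reference speech) shows how a 100% figure can arise: a system that outputs no speech for any target speaker misses every reference frame. This is not the challenge's scoring script; it assumes binary per-speaker activity matrices whose speaker rows are already aligned between reference and hypothesis, with no forgiveness collar, and the function name `frame_der` is hypothetical.

```python
from typing import Sequence


def frame_der(ref: Sequence[Sequence[int]], hyp: Sequence[Sequence[int]]) -> float:
    """Frame-level DER for speaker-aligned binary activity matrices.

    ref and hyp have shape (num_speakers, num_frames); row s of hyp is assumed
    to correspond to row s of ref (no permutation search, no collar).
    """
    num_frames = len(ref[0])
    miss = false_alarm = confusion = total_ref = 0
    for t in range(num_frames):
        # Sets of speakers active at frame t in reference and hypothesis.
        ref_spk = {s for s in range(len(ref)) if ref[s][t]}
        hyp_spk = {s for s in range(len(hyp)) if hyp[s][t]}
        n_ref, n_hyp = len(ref_spk), len(hyp_spk)
        n_correct = len(ref_spk & hyp_spk)
        miss += max(n_ref - n_hyp, 0)                # reference speech not covered
        false_alarm += max(n_hyp - n_ref, 0)         # extra hypothesised speech
        confusion += min(n_ref, n_hyp) - n_correct   # covered, but by the wrong speaker
        total_ref += n_ref
    if total_ref == 0:
        raise ValueError("reference contains no speech; DER is undefined")
    return (miss + false_alarm + confusion) / total_ref


ref = [[1, 1, 0, 0],
       [0, 1, 1, 0]]
print(frame_der(ref, [[0, 0, 0, 0], [0, 0, 0, 0]]))  # 1.0 (everything missed)
print(frame_der(ref, ref))                           # 0.0
print(frame_der(ref, [[0, 1, 1, 0], [1, 1, 0, 0]]))  # 0.5 (speakers swapped)
```

An all-zero hypothesis gives exactly 100% DER, all of it missed speech, while swapping the two speakers' rows yields 50%, all of it speaker confusion. So a DER of exactly 100% usually points to empty or unusable output rather than to a merely weak model.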
The training log of my AVSD model is shown in the figure. Does it look correct? @slwu0209
This is the log from training my own AVSD model. Normally you only need to train for this many epochs, not as many as in your picture. @liushenme
Both the pretrained model and my own trained model got similar results (DER = 17%). But when I changed frame_size in decode_AVSD.py from 800 to 200, the result improved. I hope this helps.
Hi @Achronferry, I followed your approach and changed max_utt_durance from 800 to 100 and frame_shift from 600 to 100, but the results became worse than before. Are the parameters I changed consistent with yours?
@liushenme max_utt_durance should be 200 instead of 100 (and frame_shift 150). In fact, I just made these consistent with the settings used when training the VSD model. If the result still gets worse, maybe we are hitting different problems.
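The window settings discussed above can be illustrated with a small sketch. It assumes, without checking decode_AVSD.py, that `max_utt_durance` is the decoding window length in frames and `frame_shift` is the hop between consecutive windows; the helper `split_into_windows` is hypothetical and only shows how these two numbers would partition an utterance.

```python
def split_into_windows(num_frames: int, max_utt_durance: int,
                       frame_shift: int) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges of possibly overlapping decoding windows.

    Hypothetical helper: assumes max_utt_durance is the window length and
    frame_shift the hop, both measured in frames.
    """
    if not 0 < frame_shift <= max_utt_durance:
        raise ValueError("need 0 < frame_shift <= max_utt_durance to avoid gaps")
    if num_frames <= 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + max_utt_durance, num_frames)
        windows.append((start, end))
        if end >= num_frames:  # this window reaches the end of the utterance
            break
        # Full-length consecutive windows overlap by max_utt_durance - frame_shift frames.
        start += frame_shift
    return windows


print(split_into_windows(500, 200, 150))  # [(0, 200), (150, 350), (300, 500)]
print(split_into_windows(500, 800, 600))  # [(0, 500)]
```

Under these assumptions, a 500-frame utterance is decoded as three overlapping 200-frame windows with the 200/150 setting, matching the window length the VSD model was trained on, whereas the original 800/600 setting feeds the whole utterance to the model as a single longer window.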
Hi @Achronferry, @liushenme and @slwu0209. Although this challenge has finished, I would like to research this task. At the time I could not participate because I failed to download the dataset, and it is still hard to download because the download speed is very slow. Could you share the MISP2022 dataset?
Yes, we use the oracle VAD.