junwenxiong / diff_sal

Official implementation of the paper DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

about the performance on DHF1k #4

Open yunlong10 opened 1 month ago

yunlong10 commented 1 month ago

Thank you for the excellent work! But I'm having difficulty reproducing the results on DHF1k using diff-sal.

I’ve downloaded the pre-trained checkpoint on DHF1k provided in this repository, but I’m unable to achieve the scores reported in the paper. I’ve tried training from scratch using the provided configurations but still haven't succeeded.

Could you kindly offer some guidance on how to proceed?

junwenxiong commented 3 weeks ago

Sorry for the late reply.

We use the DHF1k dataset only for pre-training. We did not focus on its performance: we trained for about 20 epochs and then stopped.

Training is then performed on the audio-visual dataset. You must remember to load the DHF1k pre-trained weights before this stage; otherwise the reported performance will not be reached.
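As a rough illustration of that loading step, here is a minimal sketch of how pre-trained weights could be matched against a model that has extra (for example audio) parameters. It is not taken from this repository: the function name `split_pretrained_keys`, the `module.` prefix handling, and the key names in the example are assumptions, and the actual checkpoint layout may differ.

```python
def split_pretrained_keys(pretrained, target):
    """Partition pre-trained entries by whether the target model can accept them.

    Both arguments are name -> value mappings, such as state dicts.
    Hypothetical helper for illustration, not part of the DiffSal code base.
    """
    loadable, skipped = {}, []
    for name, value in pretrained.items():
        # Assumption: checkpoints saved from a wrapped model may prefix keys with "module."
        key = name[len("module."):] if name.startswith("module.") else name
        # Accept an entry only if the name exists and the shapes agree
        # (values without a .shape attribute are compared as None == None).
        if key in target and getattr(value, "shape", None) == getattr(target[key], "shape", None):
            loadable[key] = value
        else:
            skipped.append(name)
    # Target parameters that the checkpoint does not cover keep their initial values.
    missing = [key for key in target if key not in loadable]
    return loadable, skipped, missing
```

With PyTorch, one would typically obtain `pretrained` from `torch.load(path, map_location="cpu")` and `target` from `model.state_dict()`, then call `model.load_state_dict(loadable, strict=False)`. Printing `skipped` and `missing` before fine-tuning is a quick way to confirm that the DHF1k weights were actually picked up.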