jaeyeonkim99 / EnCLAP

Official Implementation of EnCLAP (ICASSP 2024)
MIT License

Few questions about the paper #9

Open MoayedHajiAli opened 2 months ago

MoayedHajiAli commented 2 months ago

Hello,

Thank you very much for this great work. I have a few questions about the paper and code.

  1. Have you tried training with WavCaps or another larger dataset? The WavCaps paper suggests that using more data significantly improves the results.

  2. Do the results reported in the paper use the checkpoint with the highest validation score?

  3. From the ablations, it seems that MCM does not contribute much to the results (CIDEr drops by only 0.02 points). Have you performed any ablations on the AudioCaps dataset, especially for the main components (MCM and CLAP)?

Thank you very much for your help.

jaeyeonkim99 commented 1 month ago
  1. The results are not included in the paper, but pretraining with WavCaps significantly improved the results. When we pretrained EnCLAP-large on WavCaps + AudioCaps and then fine-tuned on the target dataset, it achieved SPIDEr scores of 0.316 on Clotho and 0.571 on AudioCaps.

  2. As mentioned in README.md, we evaluated the 5 best checkpoints, i.e. those with the highest validation scores.

  3. We haven't conducted ablations on the AudioCaps dataset. I may be able to share ablation results on AudioCaps for MCM (and possibly CLAP) in a few weeks (I am an undergraduate student and currently in the middle of the semester).
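
For readers following the checkpoint discussion, here is a minimal sketch of how one might pick the 5 checkpoints with the highest validation score. It is not taken from the EnCLAP repository: the function names, the checkpoint names, and all score values are hypothetical, and ranking by SPIDEr is an assumption, since the thread does not say which validation metric was used. SPIDEr itself is the mean of CIDEr and SPICE.

```python
from typing import Dict, List, Tuple


def spider(cider: float, spice: float) -> float:
    """SPIDEr is defined as the arithmetic mean of CIDEr and SPICE."""
    return (cider + spice) / 2.0


def top_k_checkpoints(
    val_scores: Dict[str, Tuple[float, float]], k: int = 5
) -> List[str]:
    """Return the names of the k checkpoints with the highest validation SPIDEr.

    val_scores maps a checkpoint name to its (CIDEr, SPICE) pair.
    Ranking by SPIDEr is an assumption made for this illustration.
    """
    ranked = sorted(
        val_scores,
        key=lambda name: spider(*val_scores[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical validation scores, invented purely for illustration.
example_scores = {
    "ckpt_epoch_1": (0.60, 0.14),
    "ckpt_epoch_2": (0.68, 0.16),
    "ckpt_epoch_3": (0.72, 0.17),
    "ckpt_epoch_4": (0.75, 0.18),
    "ckpt_epoch_5": (0.74, 0.18),
    "ckpt_epoch_6": (0.70, 0.17),
    "ckpt_epoch_7": (0.77, 0.18),
}

best_five = top_k_checkpoints(example_scores, k=5)
print(best_five)
```

Each selected checkpoint would then be evaluated on the test split; how the five results are combined or reported is described in the repository's README.md rather than in this sketch.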