Closed: zxf-icpc closed this issue 2 months ago
Hello! Thank you for trying it out.
The `both` models are trained on both datasets, which differs from the setup in the paper (they score lower on each dataset than models trained on a single dataset, but perform better for the demo). You should try `audiocaps/large` to get the result in Table 1.
Additionally, if you run evaluation directly on raw waveforms, the scores will differ slightly from the reported ones (they are higher in the large case). You can get exactly the same scores by running inference on the preprocessed data (except for `clotho/base`, whose checkpoint we lost and had to reproduce ourselves). The discrepancy is likely due to the different audio resampling process used when inferring from raw waveforms, which we modified for the Gradio demo.
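To illustrate why the raw-waveform path can shift scores: two resampling algorithms targeting the same rate produce slightly different waveforms, and those small differences propagate into the extracted features and metrics. This is a hypothetical sketch (not the repo's actual preprocessing code) comparing FFT-based and polyphase resampling from SciPy on a test tone:

```python
import numpy as np
from scipy.signal import resample, resample_poly

# Hypothetical illustration: resampling 44.1 kHz -> 16 kHz with two
# different algorithms yields slightly different waveforms, which can
# nudge evaluation scores computed from raw audio.
sr_in, sr_out = 44100, 16000
t = np.arange(sr_in) / sr_in                      # 1 second of audio
wave = np.sin(2 * np.pi * 440.0 * t)              # 440 Hz test tone

fft_based = resample(wave, sr_out)                # FFT-domain resampling
poly_based = resample_poly(wave, sr_out, sr_in)   # polyphase FIR resampling

# The two outputs have the same length but are not sample-identical.
diff = np.max(np.abs(fft_based - poly_based))
print(f"max abs difference: {diff:.2e}")
```

This is why inference on the preprocessed data reproduces the reported numbers exactly: it bypasses the (modified) resampling step entirely.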
I will also share the predicted captions for the AudioCaps large model in a few days (I am in the middle of midterms right now).
I will try `audiocaps/large`. Thank you very much!
@zxf-icpc Here are the predictions for our original result.
Thank you very much for your help and response! The information you provided is very helpful to me.
Hi, I would like to express my admiration for the excellent work presented in your paper. After downloading the repository and attempting to reproduce the results from `both/large/pytorch_model.bin`, I noticed that my outcomes are slightly lower than those reported in Table 1 of the paper. Could this discrepancy be attributed to sampling in BART's decoding?
To help verify my reproduction process, would it be possible for you to share the predicted captions obtained in your study? Your assistance would be greatly appreciated.
Thank you for your time and consideration.
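A note on the sampling question raised above: if the decoder samples from the token distribution, scores vary from run to run, whereas greedy (or beam) decoding is deterministic and reproduces identically. This toy sketch (hypothetical distributions, not the actual BART decoder) shows the difference:

```python
import numpy as np

# Hypothetical next-token distributions for a 3-step generation
# over a vocabulary of 4 tokens (each row sums to 1).
probs = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.30, 0.30, 0.30, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

def greedy_decode(p):
    # Deterministic: pick the argmax token at every step.
    return [int(np.argmax(step)) for step in p]

def sample_decode(p, rng):
    # Stochastic: draw each token from that step's distribution.
    return [int(rng.choice(len(step), p=step)) for step in p]

greedy = greedy_decode(probs)        # identical on every run
rng = np.random.default_rng(0)
sampled = sample_decode(probs, rng)  # depends on the seed
print(greedy)                        # -> [1, 0, 3]
```

If the repo's evaluation uses greedy or beam search (e.g. `do_sample=False` in Hugging Face's `generate`), sampling cannot explain the gap; checking the decoding configuration is a quick way to rule it out.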