open-mmlab / FoleyCrafter

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley artist that adds vivid, synchronized sound effects to your silent videos 😝
https://foleycrafter.github.io/
Apache License 2.0

Cannot Reproduce Results. Release Evaluation Code #15

Open wjc2830 opened 1 month ago

wjc2830 commented 1 month ago

Thank you for your work on this paper. However, I am unable to reproduce the main results reported in your paper, including the FID, onset detection accuracy, and AP. For evaluating FID, I used the SpecVQGAN code, and for onset performance, I used the CondFoleyGen code. Despite using these resources, the results obtained from this repository's inference code do not match those reported in your paper. Could you please release your evaluation scripts to facilitate further investigation and ensure reproducibility?

ymzhang0319 commented 1 month ago

Hi @wjc2830, thanks for your interest.

We use the same evaluation tools. Could you please provide more details of your evaluation settings? (In our experiments we use semantic weight 1.0, temporal weight 0.2, ....) Then I can try to help you figure it out.

wjc2830 commented 1 month ago

Yes, ip_adapter_weight is set to 1.0 and controlnet_conditioning_scale is set to 0.2. I opted not to use a class prompt such as "machine gun shooting" because I observed that the results generated without it were superior to those with it. With these settings, on AVSync15 (1500 samples) I got: onset acc 0.1213, detection acc 0.1347, detection AP 0.6893, FID 47.498411865917966, MKL 5.17888476451238, KID [0.046363522010469276-1.8900277153235862e-07]. Regarding the FID computation, I want to clarify whether the reported FID score is an average across all 15 classes, or whether class is not considered at all.
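As a sanity check on the onset metric itself, CondFoleyGen-style onset accuracy counts detected onsets in the generated audio that land near ground-truth onsets. A toy sketch with an illustrative ±tolerance matching rule (the exact matching rule in CondFoleyGen's script may differ):

```python
def onset_accuracy(pred_onsets, gt_onsets, tol=0.1):
    """Fraction of ground-truth onsets matched by a predicted onset
    within +/- tol seconds. Greedy one-to-one matching (illustrative rule,
    not necessarily CondFoleyGen's exact implementation)."""
    preds = sorted(pred_onsets)
    hits = 0
    for t in sorted(gt_onsets):
        for i, p in enumerate(preds):
            if abs(p - t) <= tol:
                hits += 1
                del preds[i]  # each prediction may match only one onset
                break
    return hits / len(gt_onsets) if gt_onsets else 1.0

# Onset times in seconds; 2 of 3 ground-truth onsets are matched.
print(onset_accuracy([0.50, 1.32, 2.00], [0.45, 1.30, 2.80]))
```

Differences in the tolerance window or the matching rule (one-to-one vs. many-to-one) can easily move onset accuracy by several points, which is one reason releasing the exact script matters.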

ymzhang0319 commented 1 month ago

Thanks for the information! The evaluation experiments are conducted on the AVSync15 test set (150 samples). You can refer to the official link of AVSync15. If you have any other questions, please feel free to contact us.

wjc2830 commented 1 month ago

Thank you for the prompt reply. I have re-implemented the evaluation and obtained the following results: FID 33.87400189673342, MKL 5.159568889935811, KID [0.053455384736237746-1.3165596698642787e-07], onset acc 0.1007, detection acc 0.1209, detection AP 0.6936. There are still discrepancies in metrics such as MKL and detection accuracy, so to ensure consistency I recommend releasing your evaluation script.
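Since the KID numbers also disagree, it may help to pin down the estimator: KID is the unbiased MMD² with a cubic polynomial kernel, usually reported as a mean and variance over random subsets. A generic numpy sketch of the core estimator (not SpecVQGAN's exact script, and without the subset averaging):

```python
import numpy as np

def poly_kernel(x, y):
    """Cubic polynomial kernel commonly used for KID: (x.y/d + 1)^3."""
    d = x.shape[1]
    return (x @ y.T / d + 1.0) ** 3

def kid(real, fake):
    """Unbiased MMD^2 estimate between two feature sets."""
    m, n = len(real), len(fake)
    k_rr = poly_kernel(real, real)
    k_ff = poly_kernel(fake, fake)
    k_rf = poly_kernel(real, fake)
    # Exclude diagonal (self-similarity) terms for the unbiased estimate.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()

rng = np.random.default_rng(1)
a = rng.normal(size=(200, 8))  # placeholder feature embeddings
print(kid(a, a))       # near zero for identical sets
print(kid(a, a + 1.0)) # clearly larger for shifted features
```

If two implementations agree on this core estimator, remaining gaps usually come from the feature extractor, the number/size of subsets, or sample-set size (150 vs. 1500 samples changes both bias and variance).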