sukun1045 / video-physics-sound-diffusion

Apache License 2.0

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos (CVPR 2023)

Project Page

Paper Link

Code

Code is available now under the folder video-physics-sound-diffusion!

Pre-processed data and Pre-trained Weights Links

For questions or help, please open an issue or contact suk4 at uw dot edu

Requirements

Prepare Data

Training and Inference for Sound Physics and Residual Prediction

Training for Physics-driven video to Impact Sound Diffusion

Generating Samples

Citation

If you find this repo useful for your research, please consider citing the paper:

@inproceedings{su2023physics,
  title={Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos},
  author={Su, Kun and Qian, Kaizhi and Shlizerman, Eli and Torralba, Antonio and Gan, Chuang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9749--9759},
  year={2023}
}

Acknowledgements

Part of the code is borrowed from the following repo and we would like to thank the authors for their contribution.

We would like to thank the authors of the Greatest Hits dataset for making this dataset possible. We would like to thank Vinayak Agarwal for his suggestions on estimating physics mode parameters from raw audio. We would like to thank the authors of DiffImpact for inspiring us to use physics-based sound synthesis to design physics priors as a conditional signal that guides the deep generative model in synthesizing impact sounds from videos.