v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)
https://v-iashin.github.io/SpecVQGAN
MIT License

cpu inference colab #3

Closed by AK391 3 years ago

AK391 commented 3 years ago

Is it possible to run inference on CPU in Colab for the demo?

v-iashin commented 3 years ago

Yes, thanks for pointing it out. There was a small bug in the model-loading code: it ignored the device setting and assumed a GPU by default.
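
For readers hitting the same problem, a fix of this kind usually comes down to passing the target device to the checkpoint loader. The sketch below is not the repository's actual code: `resolve_device` and `load_checkpoint` are hypothetical helper names, and the only library behavior it relies on is PyTorch's `torch.load(..., map_location=...)`.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return the device string to use, falling back to CPU when CUDA is missing."""
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


def load_checkpoint(path: str, requested: str = "cuda"):
    """Load a checkpoint onto the resolved device (hypothetical helper)."""
    import torch  # imported lazily so resolve_device works without PyTorch installed

    device = torch.device(resolve_device(requested, torch.cuda.is_available()))
    # map_location remaps tensors that were saved on a GPU onto the chosen device,
    # so a GPU-trained checkpoint can be opened on a CPU-only machine.
    state = torch.load(path, map_location=device)
    return state, device
```

On a CPU-only Colab runtime, `resolve_device("cuda", False)` yields `"cpu"`, so the checkpoint would be mapped to CPU instead of failing at load time.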

v-iashin commented 3 years ago

On my local machine, it took 2 minutes to generate a sample on CPU (12 times slower than on a GPU). On Colab, however, it took an incredible 16:20 (min:sec).
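
To put these timings side by side, the short calculation below derives the implied GPU time and the relative slowdowns. It uses only the figures quoted above; the variable names are illustrative, and the GPU time is inferred from the "12 times slower" remark rather than measured.

```python
# Timings quoted in the thread, converted to seconds.
local_cpu_seconds = 2 * 60          # 2 minutes on the local machine
colab_cpu_seconds = 16 * 60 + 20    # 16:20 on Colab

# "12 times slower than on a GPU" implies this GPU time (inferred, not measured).
gpu_seconds = local_cpu_seconds / 12

# Relative slowdowns of the Colab CPU run.
colab_vs_local = colab_cpu_seconds / local_cpu_seconds
colab_vs_gpu = colab_cpu_seconds / gpu_seconds

print(f"implied GPU time: {gpu_seconds:.0f} s")
print(f"Colab CPU vs local CPU: {colab_vs_local:.1f}x slower")
print(f"Colab CPU vs GPU: {colab_vs_gpu:.0f}x slower")
```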

AK391 commented 3 years ago

> On my local machine, it took 2 minutes to generate a sample (12 times slower than on a GPU). Whereas, on Colab it took an incredible 16:20.

For me it also took 15 minutes in Colab. Was the Colab notebook updated?

v-iashin commented 3 years ago

I made some adjustments to the Colab notebook: it now checks whether the video has audio. This gives a smoother experience when a video is silent. Please see the commit log for ./generation_demo.ipynb. The change should not affect the speed or the results.
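
The thread does not say how the notebook performs this check. One common way to test a video for an audio track, shown here purely as an assumption, is to ask `ffprobe` for its audio streams. `parse_has_audio` and `video_has_audio` are hypothetical names, and the sketch assumes `ffprobe` is installed and on the `PATH`.

```python
import subprocess


def parse_has_audio(probe_stdout: str) -> bool:
    """Return True if ffprobe's output lists at least one audio stream."""
    return any(line.strip().startswith("audio") for line in probe_stdout.splitlines())


def video_has_audio(path: str) -> bool:
    """Ask ffprobe for the codec type of every audio stream in the file."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "a",              # consider audio streams only
            "-show_entries", "stream=codec_type",
            "-of", "csv=p=0",                    # one bare value per line
            path,
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_has_audio(result.stdout)
```

A silent video produces empty output from this command, so `video_has_audio` returns `False` and a demo could skip any step that needs the original soundtrack.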