sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

Performance on RTX-3090 #22

Closed: gfh-dev closed 1 year ago

gfh-dev commented 1 year ago

Hi, do you have any estimate of the algorithm's performance on a modern CUDA-enabled GPU, e.g. a 3090? I am seeing very slow inference performance (1 minute of audio takes about 4 minutes to render). Is there any way to speed this up?

Thanks for the great work! Adeel

cobalamin commented 1 year ago

Hi Adeel, thanks for your interest in our work! We don't have measurements on an RTX-3090, but we list real-time factors (RTFs) on a 2080 Ti in Table II here: https://arxiv.org/abs/2208.05830. We tested this on fairly short audio files, so your mileage may vary for audio of 1-minute length. In particular, a difference in RTF for longer sequences may be due to the attention layers, whose runtime scales quadratically with sequence length. You might want to have a look at our follow-up work StoRM: https://arxiv.org/abs/2212.11851, where we found that a simplified DNN architecture can perform similarly. This simplified architecture has most of the costly attention layers removed, which should hopefully help runtime performance for longer audio files as well. In the StoRM paper you'll also find additional RTF results, as well as an overall reduction of model runtime due to the new ideas presented there.
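For reference, here is a minimal sketch of the two quantities discussed above, not taken from the sgmse code base. It assumes the usual definition of the real-time factor (processing time divided by audio duration) and an idealized, purely quadratic cost model for the attention layers. The function names `real_time_factor` and `attention_cost_ratio` are hypothetical, and the 10-second clip length is only an illustrative stand-in for a "fairly short" file.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration (assumed standard definition).

    Values above 1 mean the model runs slower than real time.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def attention_cost_ratio(short_seconds: float, long_seconds: float) -> float:
    """Relative attention cost of a long clip versus a short one.

    Assumes an idealized model in which attention runtime grows with the
    square of the sequence length; other layers are ignored here.
    """
    if short_seconds <= 0:
        raise ValueError("short_seconds must be positive")
    return (long_seconds / short_seconds) ** 2


if __name__ == "__main__":
    # Figures from the question: 1 minute of audio takes about 4 minutes.
    print(real_time_factor(240.0, 60.0))
    # Illustrative only: a 60 s clip compared with a 10 s clip.
    print(attention_cost_ratio(10.0, 60.0))
```

Under these assumptions, the figures reported in the question correspond to an RTF of about 4. The quadratic model also suggests why RTFs measured on short files may not carry over to minute-long inputs, since the attention cost per second of audio grows with clip length.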