Performance on RTX-3090

Hi Adeel, thanks for your interest in our work! We don't have measurements on an RTX-3090, but we list real-time factors (RTFs) on a 2080 Ti in Table II here: https://arxiv.org/abs/2208.05830. We tested this on fairly short audio files, so your mileage may vary for audio of 1-minute length. In particular, a difference in RTF for longer sequences may be due to the attention layers, whose runtime scales quadratically with sequence length. You might want to have a look at our follow-up work StoRM: https://arxiv.org/abs/2212.11851, where we found that a simplified DNN architecture can perform similarly. This simplified architecture has most costly attention layers removed, which should hopefully help runtime performance for longer audio files as well. In the StoRM paper you'll also find additional RTF results, as well as an overall reduction of model runtime due to the new ideas presented there.
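If you want to check this on your own GPU, here is a minimal sketch of how an RTF could be measured, along with a rough estimate of how much more the attention layers alone might cost on longer audio. It is not code from the sgmse repository: `enhance_fn` is a hypothetical stand-in for whatever enhancement call you use, and `attention_cost_ratio` assumes purely quadratic scaling, which ignores every other layer in the network.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(enhance_fn, audio, sample_rate):
    """Time one call of enhance_fn(audio) and return its RTF.

    enhance_fn is a hypothetical placeholder, not part of any library.
    """
    start = time.perf_counter()
    enhance_fn(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(audio) / sample_rate)


def attention_cost_ratio(short_seconds, long_seconds):
    """Relative self-attention cost, assuming purely quadratic scaling in length."""
    return (long_seconds / short_seconds) ** 2


# Going from a 10 s clip to a 60 s clip under that assumption:
print(attention_cost_ratio(10, 60))  # 6x the length, squared
```

When timing on a GPU, make sure the work has actually finished before reading the timer (in PyTorch, for example, by calling `torch.cuda.synchronize()`), otherwise the measured RTF will be too optimistic.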

sp-uhh / sgmse

Performance on RTX-3090 #22