sp-uhh / sgmse

Score-based Generative Models (Diffusion Models) for Speech Enhancement and Dereverberation
MIT License

I have reduced the model size by half, but the RTF has doubled. What happened? #20

Open Vuducbao913 opened 1 year ago

Vuducbao913 commented 1 year ago

I'm trying to shrink the model to cut inference time, since the RTF is quite high right now. I did this by reducing the number of ResnetBlocks: num resolutions is 7 in the default model, and I reduced it to 3. The resulting model has about 27M parameters. I then trained it from scratch on the WSJ data and, surprisingly, it gave better results than the pre-trained model on some of our real datasets. You can listen to the audio here to hear some of the differences: https://drive.google.com/drive/folders/1aU_3btAczyzLeecAQwzNFrwiZWzotgaG?usp=share_link
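For reference, RTF (real-time factor) here means processing time divided by the duration of the audio processed. Below is a minimal sketch of how it could be measured consistently for both models. The helper name `real_time_factor` and the `process` callable are hypothetical stand-ins for whatever enhancement call is used, not part of sgmse.

```python
import time


def real_time_factor(process, audio, sample_rate, n_runs=3):
    """Return processing time divided by audio duration (RTF).

    `process` is a hypothetical callable that enhances `audio`;
    it stands in for the real inference call.
    """
    duration_s = len(audio) / sample_rate
    process(audio)  # warm-up run, excluded from timing
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        process(audio)
        # Note: for GPU inference, call torch.cuda.synchronize() here,
        # otherwise asynchronous kernels are not included in the timing.
        timings.append(time.perf_counter() - start)
    # The fastest run is least affected by background load.
    return min(timings) / duration_s
```

An RTF below 1 means the audio is processed faster than real time. Timing both models on the same file, device, and sampler settings keeps the comparison fair.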

More importantly, however, the RTF has doubled, even though the current model is much smaller than the pre-trained model (27M parameters compared to 65M). I have spent a lot of time on this problem. What could be causing it? Have you ever encountered a similar case?
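One thing worth checking when comparing 27M against 65M: parameter count and compute are different quantities. A convolution's parameter count does not depend on the size of the feature map it runs on, but its multiply-accumulate (MAC) count does. The sketch below illustrates this with made-up channel counts and feature-map sizes. They are assumptions for illustration, not values from the sgmse configuration, and whether this effect explains the RTF observed here would need to be confirmed by profiling.

```python
def conv2d_params(c_in, c_out, kernel):
    """Weights plus biases of a 2-D convolution; independent of input size."""
    return c_in * c_out * kernel * kernel + c_out


def conv2d_macs(c_in, c_out, kernel, height, width):
    """MACs for stride 1 with 'same' padding (output is height x width)."""
    return c_in * c_out * kernel * kernel * height * width


# Hypothetical layer and feature-map sizes, chosen only for illustration.
params = conv2d_params(128, 128, 3)
macs_shallow = conv2d_macs(128, 128, 3, 64, 64)  # 256x256 input halved twice
macs_deep = conv2d_macs(128, 128, 3, 4, 4)       # 256x256 input halved six times

print(f"parameters (same in both cases): {params}")
print(f"MACs ratio shallow/deep: {macs_shallow // macs_deep}")
```

The same layer costs far more at a high-resolution stage than at a heavily downsampled one, so a network with fewer resolution levels can have fewer parameters yet do more work per layer.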