sp-uhh / storm

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
MIT License
172 stars 23 forks

Questions on training details and enhancement process #19

Open TianyuCao opened 3 months ago

TianyuCao commented 3 months ago

Hi,

Thanks for your great work. I have several questions and hope you can clarify them.

For the StoRM model in the paper, what batch size was used (I saw 8 by default in the code)? And for how many epochs was it trained (I also saw the early-stopping setting in the code, but I wonder whether it was trained to the maximum of 1000 epochs or stopped by early stopping after a patience of 50)?

Besides, I saw that "For training, sequences of 256 STFT frames (≈2 s) are randomly extracted from the full-length utterances". In that case, at enhancement time, does the model segment the whole input into ~2 s chunks, enhance each chunk, and concatenate the results into the output? Or does it enhance the whole utterance at once?
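For clarity, the chunked variant I'm asking about would look roughly like this (a minimal sketch, not the repository's actual inference code; `enhance_fn` is a placeholder for a trained model, and the naive concatenation ignores any overlap-add smoothing a real pipeline might use):

```python
import numpy as np

def enhance_in_chunks(audio, enhance_fn, chunk_len):
    """Split a waveform into fixed-length chunks, enhance each chunk
    independently, and concatenate the results back together.
    The final chunk may be shorter than chunk_len."""
    out = []
    for start in range(0, len(audio), chunk_len):
        chunk = audio[start:start + chunk_len]
        out.append(enhance_fn(chunk))
    return np.concatenate(out)

# Toy pass-through "enhancer": with it, the reassembled output must
# equal the input, which checks the chunking/concatenation logic.
identity = lambda x: x
x = np.random.randn(5 * 16000)                 # 5 s of audio at 16 kHz
y = enhance_in_chunks(x, identity, 2 * 16000)  # ~2 s chunks
assert len(y) == len(x)
assert np.allclose(x, y)
```

The alternative interpretation would simply be `enhance_fn(audio)` on the full utterance, with the 256-frame cropping applied only during training.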

Also, I generated the data based on your code and used the WSJ0+CHiME3 checkpoint to denoise it. However, the pretrained checkpoint gives lower scores than those reported in the paper. I wonder whether the default parameters in the code are exactly the same as those used to obtain the paper's results, for both data generation (create_data.py) and model training.

Sorry for so many questions. Thanks for your clarifications in advance.

MichaelChen147 commented 3 months ago

Good question! I would also like to know. I am a second-year graduate student, and I am also researching speech enhancement based on diffusion models. Could we have a chat? Hello