yxlu-0102 / MP-SENet

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra
MIT License

question about inference #10

Open peifang-lee opened 9 months ago

peifang-lee commented 9 months ago

Thank you very much for releasing the code for your work.

I want to confirm: with split=True in dataset.py and the default segment_size of 32000, does the loader randomly select a two-second segment from each audio file for training?

I set split=True for both training and validation, and training then runs successfully. However, when I run inference.py, my GPU does not have enough memory to process the audio.

Is there any way I can resolve this problem?

yxlu-0102 commented 9 months ago

Yes, our code randomly selects 2-second segments for training, while validation uses the complete audio.
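The cropping behavior described in this reply can be sketched as follows. This is a minimal pure-Python illustration, not the repository's dataset.py. The function name `random_segment` and the zero-padding of clips shorter than `segment_size` are assumptions made for the sketch. At a 16 kHz sampling rate, `segment_size = 32000` corresponds to 2 seconds.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_segment(
    clean: Sequence[float],
    noisy: Sequence[float],
    segment_size: int = 32000,  # 2 s at 16 kHz
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Crop the same random window from a clean/noisy pair (hypothetical helper).

    Clips shorter than segment_size are zero-padded at the end (an assumption).
    """
    if len(clean) != len(noisy):
        raise ValueError("clean and noisy must have the same length")
    if rng is None:
        rng = random.Random()
    n = len(clean)
    if n >= segment_size:
        # randint is inclusive on both ends, so the window always fits.
        start = rng.randint(0, n - segment_size)
        end = start + segment_size
        # Use one shared start so the clean target stays aligned with the noisy input.
        return list(clean[start:end]), list(noisy[start:end])
    pad = [0.0] * (segment_size - n)
    return list(clean) + pad, list(noisy) + pad
```

Using a single random start for both signals is the important design point: the clean and noisy crops must cover the same time span, or the training target would not match the input.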

Slicing the audio during validation as well does not affect training itself. However, the validation metrics will then be computed on segments rather than full utterances, so they may not reflect the metrics on the complete audio.

For inference, we recommend using the complete audio. If your GPU memory is insufficient, you can split long audio into segments, run inference on each segment, and then concatenate the enhanced segments. However, we cannot guarantee that the final metrics will be exactly the same as with full-length inference.
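The split-and-concatenate workaround can be sketched as below. This is a hedged illustration, not code from the repository. `enhance_in_chunks` is a hypothetical helper, and `enhance` stands for whatever callable runs the model on one chunk. In a real setup it would probably wrap the model's forward pass under `torch.no_grad()`, which is an assumption about your pipeline. The default chunk length (10 s at 16 kHz) and the rule that merges a very short final piece into the previous chunk are also assumptions; tune both to your GPU memory.

```python
from typing import Callable, List, Sequence


def enhance_in_chunks(
    noisy: Sequence[float],
    enhance: Callable[[List[float]], List[float]],
    chunk_size: int = 160000,  # 10 s at 16 kHz (assumed; reduce if memory is still short)
    min_tail: int = 16000,     # final pieces shorter than 1 s are merged into the previous chunk
) -> List[float]:
    """Enhance a long waveform chunk by chunk and concatenate the results (hypothetical)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    n = len(noisy)
    starts = list(range(0, n, chunk_size))
    ends = starts[1:] + [n]
    # A tiny last chunk gives the model almost no context, so fold it into the previous one.
    # This means the largest chunk can be up to chunk_size + min_tail - 1 samples long.
    if len(starts) > 1 and ends[-1] - starts[-1] < min_tail:
        starts.pop()
        ends.pop()
        ends[-1] = n
    enhanced: List[float] = []
    for s, e in zip(starts, ends):
        piece = list(enhance(list(noisy[s:e])))
        expected = e - s
        # STFT/iSTFT round trips may change the length by a few samples;
        # trim or zero-pad so the concatenated output matches the input length.
        piece = piece[:expected] + [0.0] * max(0, expected - len(piece))
        enhanced.extend(piece)
    return enhanced
```

Each chunk is enhanced without any context from its neighbors, so small discontinuities can appear at the boundaries. This is one plausible reason the metrics may differ from full-length inference. Overlapping the chunks and cross-fading them would be a possible refinement, but it is not something the maintainers described.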