vbelz / Speech-enhancement

Deep learning for audio denoising
MIT License

General questions #5

Open betegon opened 4 years ago

betegon commented 4 years ago

Hi @vbelz,

First of all, thank you for your work. I tried to denoise some audio and it worked really well, but I have a few questions.

Quoted from README:

> Specify how many frames you want to create as nb_samples in args.py (or pass it as argument from the terminal). I let nb_samples=50 by default for the demo but for production I would recommend having 40 000 or more.

1. What exactly is nb_samples?

2. Were the weights you provide trained with nb_samples=50?

3. Should I resample audio to 8 kHz for denoising, or is that done inside the network? And should I do the same for training?

4. I want to tweak it to be a better denoiser for background noise rather than specific sounds. What are your thoughts on this? I have a dataset with clean samples and background noise samples. Will it work if I train on it? Which hyperparameters should I tune?

Thank you so much and sorry for bothering you!

vbelz commented 4 years ago

Hi Miguel,

Thanks, good to hear it was useful for you!

1) In creation mode, the audio is split into several time windows (slightly above one second each). Each window is converted to a 2D spectrogram and becomes one training sample. nb_samples is simply the number of windows used (see the sketch after this list).

2) No! As I described in the README/article, training used 10 hours of sound. It required a GPU and was done on Colab. nb_samples=50 was only for demo purposes, so you can run that with CPU only.

3) When the audio is read, it is resampled to 8 kHz (see data creation or data prediction), so yes, it happens before the data goes to the network. Have a look, for example, at the function audio_files_to_numpy.

4) Sure, you should try to train it. My recommendation would be to gather enough data to train on. Be aware as well that it will require a GPU (see Google Colab or other cloud alternatives). Additionally, the global scaling to apply for input and output might differ (the values are expected to be distributed between -1 and 1). In terms of hyperparameters, try playing with the U-Net parameters such as size_filter_in, kernel_init, activation_layer.
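To make 1) and 3) more concrete, here is a rough sketch of the idea (simplified, with illustrative names, not the exact code from the repo): the audio is loaded directly at 8 kHz and then cut into consecutive windows of 8064 samples (slightly above one second), each of which later becomes one spectrogram sample.

```python
import librosa
import numpy as np

def load_and_split(audio_path, sample_rate=8000, frame_length=8064):
    # librosa resamples to the requested rate on load, so the
    # resampling happens before anything reaches the network.
    y, _ = librosa.load(audio_path, sr=sample_rate)

    # Cut the signal into consecutive ~1 s windows; each window
    # is later turned into a 2D spectrogram and counts as one
    # of the nb_samples training samples.
    nb_windows = len(y) // frame_length
    return np.stack([y[i * frame_length:(i + 1) * frame_length]
                     for i in range(nb_windows)])
```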

Kind regards,

Vincent


betegon commented 4 years ago

Hi @vbelz, thank you for your kind and quick response.

I have been working on creating the data necessary to train it, and I have a few more questions (sorry for bothering you).

I have approx. 10 hours of audio, and when I try to create the dataset I end up with the following error, raised in the function numpy_audio_to_matrix_spectrogram:

```
m_mag_db = np.zeros((nb_audio, dim_square_spec, dim_square_spec))
MemoryError: Unable to allocate array with shape (37028, 257, 257) and data type float64
```
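If my math is right, that allocation alone needs about 19.6 GB in float64 (roughly half that in float32), which would explain why it fails on my machine:

```python
>>> 37028 * 257 * 257 * 8 / 1e9  # float64: bytes to GB
19.565298976
>>> 37028 * 257 * 257 * 4 / 1e9  # float32 would halve it
9.782649488
```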

Also, I have the following questions:

1.) I have used dimensions of 256x256, as I downsampled the audios to 16 kHz instead of 8 kHz. The window I used is 16128 samples, which is slightly more than one second. Do you think this is a correct approach? I mean, your window was 64 samples more than a second at 8 kHz, so I scaled it up to 16 kHz. The problem I am facing is that the spectrograms I get when preparing the dataset are 256x257 (the dimensions that librosa.stft returns), and I don't know why it isn't 256x256. My parameters are: hop_length_fft = 63, n_fft = 510, frame_length = 16128 and hop_length_frame = 16128, which gives 16128 / 63 = 256, so I don't know where the 257 columns come from (see the quick shape check after this list).

2.) Why should the window be around one second? Would it improve performance if it were smaller or bigger?

3.) Do you think there will be any major loss of performance from decreasing precision to 32-bit (i.e. numpy dtype 'float32')?

4.) It looks like you are cropping all the audios, as you don't include their last (incomplete) window. Therefore, I have added zero padding at the end of each audio to fill out the window size. What do you think about this?

5.) I have concatenated the audios one after another so they keep their original structure. Is there a special reason to use a random order? You use the function blend_noise_randomly to do this.

6.) What's the difference between frame_length and hop_length_frame? I think they refer to the same parameter: the sliding window size for the STFT, which is by definition the frame length.
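Regarding 1.), here is the quick shape check I ran. If I understand librosa correctly, the extra column seems to come from the default center=True padding in librosa.stft, which adds one frame (1 + 16128 // 63 = 257), while the 256 rows are the 1 + n_fft // 2 frequency bins:

```python
import numpy as np
import librosa

frame = np.zeros(16128, dtype=np.float32)
spec = librosa.stft(frame, n_fft=510, hop_length=63)
print(spec.shape)  # (256, 257)
```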

Thanks a lot for your time and effort

Cheers.

Vishesh813 commented 4 years ago

@vbelz please help on this.

vaishalibhardwaj commented 3 years ago

> @vbelz please help on this.

Hello Vishesh, as I am new to this project, could you guide me on how to get to the denoised output? Note: I do not have a GPU on my computer.

Vishesh813 commented 3 years ago

Hi @vbelz, how can I help her?
