Can we change the sampling frequency of audio waves for training?

santi-pdp / segan

Speech Enhancement Generative Adversarial Network in TensorFlow

MIT License


Can we change the sampling frequency of audio waves for training? #52

Open imran7778 opened 6 years ago

imran7778 commented 6 years ago

Dear @santi-pdp

This model down-samples any input waveform to 16 kHz. In my case, I want to train it on a dataset of waveforms sampled at 100 kHz. Please guide me on how to change the code for this.

My training and test waveforms are sampled at 100 kHz, and I don't want to down-sample my dataset to 16 kHz as this model does before training.

Please guide.

Regards Imran Ahmed

qianqian0116 commented 6 years ago

I have the same problem. My dataset is sampled at 8 kHz; is it necessary to resample it to 16 kHz? Can someone explain the theory in detail? Please guide. qian

santi-pdp commented 6 years ago

You should change the make_tfrecords script, which cuts the waveforms into chunks under the assumption of a 16 kHz sampling rate (https://github.com/santi-pdp/segan/blob/master/make_tfrecords.py#L43). I have never tried sampling rates above 16 kHz, because the chunks would need to be larger to cover the same receptive field, but the same training scripts should work once you change the "chunk-maker" script that generates the tfrecords.
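For illustration, here is a minimal sketch of what a "chunk-maker" step could look like. It is not the code from make_tfrecords.py: the function name chunk_waveform, the 50% overlap, and the dummy input are assumptions made for this example. The point is that the chunk length is fixed in samples, so the sampling rate of the input decides how much time each chunk covers.

```python
import numpy as np

def chunk_waveform(wave, canvas_size=2**14, stride=0.5):
    """Split a 1-D waveform into overlapping chunks of canvas_size samples.

    Hypothetical helper for illustration; trailing samples that do not
    fill a whole chunk are dropped.
    """
    hop = int(canvas_size * stride)  # step between chunk starts, in samples
    chunks = [wave[start:start + canvas_size]
              for start in range(0, len(wave) - canvas_size + 1, hop)]
    return np.array(chunks)

# One second of dummy audio at an assumed 100 kHz sampling rate.
wave = np.zeros(100_000, dtype=np.float32)
chunks = chunk_waveform(wave)
print(chunks.shape)  # (number of chunks, canvas_size)
```

Nothing in this sketch depends on the sampling rate itself, which matches the suggestion above that only the chunking script needs to change.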

qianqian0116 commented 6 years ago

Thank you for your answer! I will try to train the model on the 8 kHz dataset with the sample-rate check in make_tfrecords.py removed. The purpose of my training is to improve the performance of speaker recognition in real-world environments. I hope it works!

qian

qianqian0116 commented 6 years ago

Hi @santi-pdp, I have another question. Before testing, the wav format is: sample rate: 16000 Hz, precision: 16 bit, sample encoding: 16-bit signed integer PCM.

After testing, the format is: sample rate: 16000 Hz, precision: 25 bit, sample encoding: 32-bit floating point PCM. What caused these changes? Will they make a difference to the wav?

Please guide! qian

santi-pdp commented 6 years ago

This is because the wavfile.write function writes the normalized [-1, 1] waveform with this encoding instead of re-scaling it to 16-bit precision. If you want, you can instead use the soundfile library, which can write 16-bit PCM directly, or run sox <infile.wav> -r 16k -b 16 <outfile.wav> to convert the already written wav.
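As a rough sketch of the re-scaling step mentioned above, the snippet below converts a float waveform in [-1, 1] to 16-bit integers and writes it with Python's standard wave module. This is a swapped-in approach chosen to keep the example dependency-light, not the repository's code; the helper names float_to_pcm16 and write_pcm16_wav and the 32767 scaling factor are assumptions for this example.

```python
import wave
import numpy as np

def float_to_pcm16(x):
    """Map a float waveform in [-1, 1] to 16-bit signed integers (clipping outliers)."""
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    return np.round(x * 32767.0).astype(np.int16)

def write_pcm16_wav(path, x, sample_rate):
    """Write a mono float waveform as a 16-bit signed integer PCM wav file."""
    pcm = float_to_pcm16(x)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)            # 2 bytes = 16 bits per sample
        f.setframerate(sample_rate)
        f.writeframes(pcm.astype("<i2").tobytes())  # little-endian int16
```

The same int16 array can also be passed to scipy.io.wavfile.write, which picks the wav encoding from the array dtype, or the float data can be written through soundfile.write with subtype='PCM_16'.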

qianqian0116 commented 6 years ago

Hi @santi-pdp, thank you very much for your answer! Following your suggestion, I can now train an 8 kHz model.

I found that the 8 kHz model trains faster than the 16 kHz model. Is that because canvas_size is 2**14? Can I make canvas_size smaller to train a better 8 kHz model?

Thanks a lot for your guidance! Best wishes! qian
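The thread does not say whether a smaller canvas_size gives a better 8 kHz model, but the arithmetic behind the question can be sketched as follows. The helper names chunk_seconds and equivalent_canvas_size are hypothetical, and the 16 kHz / 2**14 reference values are taken from the discussion above. Whether the network accepts a different canvas_size without further changes is an assumption that would need to be checked against the model code.

```python
def chunk_seconds(canvas_size, sample_rate):
    """Duration in seconds covered by one chunk of canvas_size samples."""
    return canvas_size / sample_rate

def equivalent_canvas_size(new_rate, ref_canvas=2**14, ref_rate=16000):
    """Canvas size that covers the same duration as the reference setup."""
    return round(ref_canvas * new_rate / ref_rate)

print(chunk_seconds(2**14, 16000))     # 1.024 s per chunk at 16 kHz
print(chunk_seconds(2**14, 8000))      # 2.048 s per chunk at 8 kHz
print(equivalent_canvas_size(8000))    # 8192, i.e. 2**13
print(equivalent_canvas_size(100000))  # 102400
```

With canvas_size left at 2**14, each 8 kHz chunk spans twice as much audio as a 16 kHz chunk, so the same recordings yield roughly half as many chunks, which is one plausible reason for the faster training.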