philayres / babble-rnn

babble-rnn is a research project in the use of machine learning to generate new speech by modelling human speech audio, without any intermediate text or word representations. The idea is to learn to speak through imitation, much like a baby might.
http://babble-rnn.consected.com
Apache License 2.0

Codec2 c2enc bitrates must be 3200 or lower #1

Open adamhrv opened 7 years ago

adamhrv commented 7 years ago

Impressive results on your tech post. I'm trying to reimplement your experiment and make a babble generator, but am having an issue with what seems to be the Codec2 library used in your workflow, namely c2enc and c2dec.

Following the instructions in your tech post, I've installed the Codec2 library (http://www.rowetel.com/?page_id=452). But when generating the encodings (or decodings) with mp32c2.sh, c2enc/c2dec throw an error:

"Error in mode: ~/datasets/audio/dickens/mp3/TaleOfTwoCities_pt01-8k.raw. Must be 3200, 2400, 1600, 1400, 1300, 1200 or 450"

Following the instructions at https://github.com/freedv/codec2, I fixed this by adding the mode 3200 in front of the filenames:

/path/to/c2enc 3200 $fn-8k.raw $fn.c2cb charbits

But now the audio conversion from c2towav.sh doesn't seem to produce the correct output, possibly because the audio is downsampled to 3200 and then back up to 8000?

/path/to/c2dec 3200 $fn $fn.raw charbits

Which version of Codec2 are you using to encode/decode at 8000 bitrate?

If the audio needs to be downsampled to a 3200 bitrate for training, how much would that affect the quality of the output?
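A note on the numbers involved, offered as a rough sketch rather than a statement about this project's modified codec: the values in the error message are Codec 2 bit rates in bits per second, while the 8k in the -8k.raw filename appears to refer to an 8 kHz sample rate. Assuming 16-bit mono input at 8 kHz and a 20 ms frame in 3200 mode (both are assumptions here, not taken from this thread), the per-frame arithmetic looks like this. The function name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # assumed input sample rate (the "-8k" in the filename)
BITS_PER_SAMPLE = 16    # assumed raw PCM sample width


def frame_stats(bitrate_bps, frame_ms):
    """Return (samples per frame, encoded bits per frame, compression ratio)."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    encoded_bits = bitrate_bps * frame_ms // 1000
    raw_bits = samples * BITS_PER_SAMPLE
    return samples, encoded_bits, raw_bits / encoded_bits


# 3200 bit/s with an assumed 20 ms frame
print(frame_stats(3200, 20))
```

Under those assumptions the audio stays at 8 kHz on both sides of the codec; only the encoded stream between c2enc and c2dec shrinks.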

I tried:

All throw the same error when trying to use 8000: "Must be 3200, 2400, 1600, 1400, 1300, 1200 or 450"

Also, I had to change utils.output_file.write(self.sample(frame)) to utils.output_file.write(str(self.sample(frame))) to fix an error when writing a numpy array as text.

philayres commented 7 years ago

Adam, a very quick response to get started. I'll get back with more detail.

I reworked the Codec 2 code for both the 3200 and 1300 bit rates. Currently 3200 is working better. I have not posted the code yet, as the code base is in Subversion. It may be easiest if I post the binaries, though they are Linux only.

Will the 3200 rate codec work for you? If so, I'll try and get to it tomorrow or early next week.

(Reply sent by email to Adam Harvey's comment of Sat, 5 Aug 2017, 2:10 pm.)

adamhrv commented 7 years ago

Also using Linux. Ok. I'll try the workflow again with modified Codec 2 code. Thanks.

Or would using c2enc/c2dec with the 3200 (or 1300) mode argument work?

/path/to/c2enc 3200 $fn-8k.raw $fn.c2cb charbits

/path/to/c2dec 3200 $fn $fn.raw charbits

This assumes I also run all the conversions in mp32c2.sh with the same bitrate.
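One way to keep the mode consistent in both directions is to build the two command lines from a single value. The sketch below only assembles the argument lists quoted in this thread. The helper names are hypothetical, the list of valid modes is copied from the error message above, and charbits is passed through unchanged on the assumption that the c2enc/c2dec build in use accepts it.

```python
# Modes copied from the c2enc/c2dec error message quoted in this thread.
VALID_MODES = (3200, 2400, 1600, 1400, 1300, 1200, 450)


def _check_mode(mode):
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode}")


def c2enc_cmd(c2enc, mode, fn):
    """Argument list for encoding <fn>-8k.raw to <fn>.c2cb (hypothetical helper)."""
    _check_mode(mode)
    return [c2enc, str(mode), f"{fn}-8k.raw", f"{fn}.c2cb", "charbits"]


def c2dec_cmd(c2dec, mode, fn):
    """Argument list for decoding <fn> to <fn>.raw (hypothetical helper)."""
    _check_mode(mode)
    return [c2dec, str(mode), fn, f"{fn}.raw", "charbits"]


mode = 3200  # one value shared by the encoder and the decoder
print(c2enc_cmd("/path/to/c2enc", mode, "TaleOfTwoCities_pt01"))
print(c2dec_cmd("/path/to/c2dec", mode, "TaleOfTwoCities_pt01.c2cb"))
```

Once the binaries are in place, each list can be handed to subprocess.run(cmd, check=True) from the standard library.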

Would it be possible to share the scripts (or generator settings) you used to create the samples in your tech post? Those sound great. Or is that what's already described in generate_audio.ipynb?

philayres commented 7 years ago

I have added a codec2 directory to the v2 branch. This contains codec2 binaries that should run in 3200 bit rate mode. I just tested it here, and generated a 3200 rate file: https://github.com/philayres/babble-rnn/blob/v2/generated/d2-3200-v1-1-1-3200.wav

The configuration for this is in https://github.com/philayres/babble-rnn/tree/v2/out/d2-3200-v1-1-1

You can restart where I left off if you load model-1910.h5 by editing config.json; just change this entry:

  "start_iteration": 1910

Then run

./learn.sh d2-3200-v1-1-1
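The config edit described above can also be scripted. This is a minimal sketch: the helper name is hypothetical, the location of config.json under out/d2-3200-v1-1-1/ is an assumption based on the linked directory, and only the start_iteration key comes from this thread.

```python
import json


def set_start_iteration(config_path, iteration):
    """Set "start_iteration" in a JSON config file, keeping the other keys."""
    with open(config_path) as f:
        config = json.load(f)
    config["start_iteration"] = iteration
    with open(config_path, "w") as f:
        json.dump(config, f, indent=2)
    return config


# Example call (the path is an assumption, not confirmed in this thread):
# set_start_iteration("out/d2-3200-v1-1-1/config.json", 1910)
```

Note that json.dump rewrites the whole file, so hand-made whitespace in config.json will be normalised.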

If you look at the Jupyter notebook, the model definition it shows will give you an idea of what is actually being trained.

I'm not sure if this makes sense. Feel free to ask questions.