Open adamhrv opened 7 years ago
Adam, A very quick response to get started. I'll get back with more detail.
I reworked the Codec 2 code for both the 3200 and 1300 bit rates. Currently 3200 is working better. I have not posted the code yet, as the code base is in Subversion. It may be easiest if I post the binaries, though they are Linux only.
Will the 3200 rate codec work for you? If so, I'll try and get to it tomorrow or early next week.
On Sat, 5 Aug 2017, 2:10 pm Adam Harvey, notifications@github.com wrote:
Impressive results on your tech post. I'm trying to reimplement your experiment and make a babble generator, but am having an issue with what seems to be the Codec2 library used in your workflow, namely c2enc and c2dec.
Following the instructions in your tech post, I've installed the Codec2 library (http://www.rowetel.com/?page_id=452). But when generating the encodings (or decodings) with mp32c2.sh, c2enc/dec throw an error:
"Error in mode: ~/datasets/audio/dickens/mp3/TaleOfTwoCities_pt01-8k.raw. Must be 3200, 2400, 1600, 1400, 1300, 1200 or 450"
Following the instructions at https://github.com/freedv/codec2, this was fixed by adding 3200 in front of the filenames: /path/to/c2enc 3200 $fn-8k.raw $fn.c2cb charbits
But now the audio output conversion from c2towav.sh doesn't seem to produce the correct output, possibly because it's downsampled to 3200 and then back up to 8000?
/path/to/c2dec 3200 $fn $fn.raw charbits
Which version of Codec2 are you using to encode/decode at 8000 bitrate?
If the audio needs to be downsampled to a 3200 bitrate for training, how much would that affect the quality of the output?
I tried:
- Codec2-0.6 library from http://www.rowetel.com/?page_id=452
- the codec-dev branch
- https://github.com/freedv/codec2
All throw the same error when trying to use 8000: "Must be 3200, 2400, 1600, 1400, 1300, 1200 or 450"
Also, I had to change utils.output_file.write(self.sample(frame)) to utils.output_file.write(str(self.sample(frame))) to fix an error when trying to write a numpy array as text.
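The write() change described above can be illustrated with a minimal sketch. It assumes the output file is opened in text mode, where write() accepts only str. A plain list stands in for the numpy array returned by self.sample(frame) so the sketch needs only the standard library; write_frame and the sample values are hypothetical names, not part of babble-rnn.

```python
import io


def write_frame(out, frame):
    """Write one sampled frame to a text-mode file object."""
    # Text-mode write() accepts only str, so convert the frame first.
    out.write(str(frame))


# Hypothetical stand-in for the array returned by self.sample(frame).
frame = [12, 0, 255]
buf = io.StringIO()  # behaves like a file opened in text mode

# Writing the object directly fails, which is the error reported above.
try:
    buf.write(frame)
    raw_write_failed = False
except TypeError:
    raw_write_failed = True

# Wrapping the frame in str() succeeds.
write_frame(buf, frame)
written = buf.getvalue()
```

The same TypeError occurs for any non-str object passed to a text-mode write(), which is why wrapping the sample in str() makes the call succeed.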
View it on GitHub: https://github.com/philayres/babble-rnn/issues/1
Also using Linux. Ok. I'll try the workflow again with modified Codec 2 code. Thanks.
Or, would using c2enc/dec prefixed with 3200 (or 1300) work?
/path/to/c2enc 3200 $fn-8k.raw $fn.c2cb charbits
and
/path/to/c2dec 3200 $fn $fn.raw charbits
Assuming I also run all the conversions in mp32c2.sh with the same bitrate.
Would it be possible to share the scripts (or generator settings) you used to create the samples on your tech post? Those sound great. Or is that what's already described in generate_audio.ipynb?
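A small sketch of keeping the encode and decode steps on the same mode, under a few assumptions: the argument order and the trailing charbits argument are copied from the commands quoted in this thread, the list of valid modes is taken from the error message above, and c2_command is a hypothetical helper rather than anything in mp32c2.sh. The mode is Codec 2's bit rate in bits per second, which is separate from the 8000 Hz sample rate of the -8k.raw input.

```python
# Modes listed in the c2enc/c2dec error message quoted in this thread.
VALID_MODES = ("3200", "2400", "1600", "1400", "1300", "1200", "450")


def c2_command(tool, mode, src, dst):
    """Build an argument list for c2enc or c2dec (hypothetical helper)."""
    mode = str(mode)
    if mode not in VALID_MODES:
        raise ValueError(
            "mode must be one of %s, got %s" % (", ".join(VALID_MODES), mode)
        )
    # Argument order follows the commands quoted in the thread:
    #   c2enc <mode> <input> <output> charbits
    return [tool, mode, src, dst, "charbits"]


MODE = 3200  # use one value for both steps so the decoder matches the encoder

encode_cmd = c2_command("/path/to/c2enc", MODE, "sample-8k.raw", "sample.c2cb")
decode_cmd = c2_command("/path/to/c2dec", MODE, "sample.c2cb", "sample.raw")

# Passing the sample rate as the mode is rejected, as in the reported error.
try:
    c2_command("/path/to/c2enc", 8000, "sample-8k.raw", "sample.c2cb")
    rejected_8000 = False
except ValueError:
    rejected_8000 = True
```

The lists are only built here, not executed; they could be handed to subprocess.run once the paths point at real binaries.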
I have added a codec2 directory to the v2 branch. This contains codec2 binaries that should run in 3200 bit rate mode. I just tested it here, and generated a 3200 rate file: https://github.com/philayres/babble-rnn/blob/v2/generated/d2-3200-v1-1-1-3200.wav
The configuration for this is in https://github.com/philayres/babble-rnn/tree/v2/out/d2-3200-v1-1-1
You can restart where I left off if you load model-1910.h5 by editing config.json; just change this entry:
"start_iteration": 1910
Then run
./learn.sh d2-3200-v1-1-1
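The config edit can also be scripted. This is a rough sketch that assumes config.json is a JSON object with "start_iteration" as a top-level key; set_start_iteration is a hypothetical helper, and the demo runs against a temporary file with made-up contents rather than the real configuration.

```python
import json
import tempfile
from pathlib import Path


def set_start_iteration(config_path, iteration):
    """Set "start_iteration" in a JSON config file and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["start_iteration"] = iteration  # all other entries are left untouched
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo on a throwaway file; the contents here are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "config.json"
    demo.write_text(json.dumps({"start_iteration": 0, "other_setting": "kept"}))
    updated = set_start_iteration(demo, 1910)
    reloaded = json.loads(demo.read_text())
```

For the real run, the helper would be pointed at the config.json for d2-3200-v1-1-1 before calling ./learn.sh.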
If you look at the Jupyter notebook, the model definition it shows will give you an idea of what is actually being trained.
I'm not sure if this makes sense. Feel free to ask questions.