ChamMoradi opened this issue 2 years ago.
Hi ChamMoradi, there is an additional decoder flag you can use: --sample_rate_hz.
bazel-bin/decoder_main --encoded_path=dir/to/bs_output/.lyra --output_dir=dir/to/output --bitrate=6000 --sample_rate_hz=48000
Note that regardless of bitrate, the model internally operates at a 16 kHz sample rate and then upsamples if the requested output sample rate is higher than 16 kHz.
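To illustrate what that upsampling step does and does not change, here is a minimal Python sketch of integer-factor upsampling by linear interpolation. It is a stand-in chosen for brevity, not the resampler Lyra actually uses, and the function name upsample_linear is hypothetical. Because the model runs at 16 kHz, the decoded speech carries information only up to 8 kHz (the Nyquist frequency of 16 kHz); requesting 48 kHz output raises the sample rate of the file, not its audio bandwidth.

```python
def upsample_linear(samples, in_rate_hz, out_rate_hz):
    """Upsample by an integer factor using linear interpolation.

    Illustration only (not Lyra's resampler): production resamplers apply a
    proper anti-imaging low-pass filter. In either case no new speech
    information above in_rate_hz / 2 is recovered; linear interpolation only
    leaves attenuated spectral images in that region.
    """
    if out_rate_hz % in_rate_hz != 0:
        raise ValueError("this sketch only handles integer upsampling factors")
    factor = out_rate_hz // in_rate_hz
    out = []
    for i, cur in enumerate(samples):
        # Hold the last input sample at the end of the buffer.
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        # Emit `factor` evenly spaced points between cur and nxt.
        for k in range(factor):
            out.append(cur + (nxt - cur) * k / factor)
    return out
```

For example, assuming 20 ms frames, a 320-sample frame at 16 kHz becomes 960 samples at 48 kHz, with the same 8 kHz content limit.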
Hello Astorus,
Thanks for your prompt response. I have a few other similar questions that I couldn't find answers to in the repo. For example, which sample rates are supported for the input audio? Is there a document listing all available flags and their descriptions?
Another note: this post, https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html, seems fairly old, and its demo audio items use a bitrate that is no longer supported. I would like a sanity check to confirm that I get the right output from the code.
The supported bitrates and sample rates are documented in the lyra_encoder header file: https://github.com/google/lyra/blob/main/lyra_encoder.h#L48 The supported sample rates alone are also listed in the lyra_decoder header: https://github.com/google/lyra/blob/main/lyra_decoder.h#L45
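A small Python sketch that rejects unsupported flag values before invoking the binaries and shows the per-packet arithmetic. The supported values below are how I read the linked headers at the time of writing (3200, 6000, and 9200 bps; 8, 16, 32, and 48 kHz), and the 50 Hz packet rate (one packet per 20 ms frame) is an assumption; if the headers say otherwise, they are the source of truth. check_flags is a hypothetical helper, not part of Lyra.

```python
# Assumed values, as read from lyra_encoder.h / lyra_decoder.h; verify there.
SUPPORTED_BITRATES_BPS = (3200, 6000, 9200)
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000, 32000, 48000)
FRAME_RATE_HZ = 50  # assumption: one packet per 20 ms frame


def check_flags(bitrate, sample_rate_hz):
    """Validate --bitrate / --sample_rate_hz values (hypothetical helper).

    Returns (bits per packet, samples per frame at sample_rate_hz).
    """
    if bitrate not in SUPPORTED_BITRATES_BPS:
        raise ValueError(
            f"unsupported bitrate {bitrate}; expected one of {SUPPORTED_BITRATES_BPS}")
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        raise ValueError(
            f"unsupported sample rate {sample_rate_hz}; "
            f"expected one of {SUPPORTED_SAMPLE_RATES_HZ}")
    bits_per_packet = bitrate // FRAME_RATE_HZ
    samples_per_frame = sample_rate_hz // FRAME_RATE_HZ
    return bits_per_packet, samples_per_frame
```

For example, 6000 bps at 50 packets per second is 120 bits (15 bytes) per packet. Since the model runs at 16 kHz internally, the packet size should depend only on the bitrate, not on the requested sample rate.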
Thanks! Could you also point me to some uncoded audio items and the corresponding outputs from your codec, so I can be sure I am running it correctly?
Happy to help! For validation files, see the examples in our open source blog post under the 'Higher Quality' heading: https://opensource.googleblog.com/2022/09/lyra-v2-a-better-faster-and-more-versatile-speech-codec.html
Note that the output from the current repository won't be numerically identical to the output in the blog post, since we have updated the model since then, but the two should be perceptually equivalent.
Great! I am planning a listening test that includes your codec, the codec recently released by Meta, and perhaps EVS. Is your codec more robust in a noisy-in/clean-out setting or in a clean-in/clean-out setting? Also, is it possible to disable the denoiser so I can observe the coding artifacts more clearly?
encoder_main includes a flag to enable preprocessing, but it is disabled by default: https://github.com/google/lyra/blob/main/encoder_main.cc#L37 The preprocessing is the WebRTCPreprocessor module, not one we custom built.
When you say 'noisy in clean out', what exactly do you mean? Currently the model does not perform any enhancement.
Sorry for the late reply. From older documents I understood that SoundStream (Lyra V2) is a codec that can also perform noise removal; that is, you can feed it noisy speech and expect clean speech out of the codec. Please correct me if I am wrong.
That was something we explored in the research version described in the SoundStream blog post. However, it has not been included in Lyra V2, as jointly compressing and enhancing speech is difficult for the lower-capacity model used in Lyra V2.
Thanks for the clarification.
Running the codec with full-band input (48 kHz sample rate):
bazel-bin/encoder_main --input_path=path/to/fullband-audio/.wav --output_dir=dir/to/bs_output --bitrate=6000
bazel-bin/decoder_main --encoded_path=dir/to/bs_output/.lyra --output_dir=dir/to/output --bitrate=6000
This gives me wideband (WB) output at a 16 kHz sample rate. Is this due to the bitrate? Do lower bitrates reduce the bandwidth? If so, which bitrates give super-wideband (SWB) and full-band (FB) output? Thanks!
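A minimal Python sketch for rerunning this setup with the decoder's --sample_rate_hz flag from the earlier reply, and for checking which sample rate a decoded WAV actually has. Assumptions: the helper names lyra_commands, wav_sample_rate, and write_silence_wav are hypothetical; the encoder is assumed to name its output <input stem>.lyra inside --output_dir (check encoder_main if your file lands elsewhere). Building each argument as one list element also avoids the "--flag= value" spacing mistake, where the shell splits the value off the flag. Keep in mind that a 48 kHz header only reports the container rate; with the model running at 16 kHz, the content is still limited to about 8 kHz.

```python
import os
import tempfile
import wave


def lyra_commands(wav_path, work_dir, bitrate=6000, sample_rate_hz=48000):
    """Build encoder/decoder argv lists (hypothetical helper)."""
    stem = os.path.splitext(os.path.basename(wav_path))[0]
    encode = [
        "bazel-bin/encoder_main",
        f"--input_path={wav_path}",
        f"--output_dir={work_dir}",
        f"--bitrate={bitrate}",
    ]
    decode = [
        "bazel-bin/decoder_main",
        # Assumption: the encoder writes <stem>.lyra into --output_dir.
        f"--encoded_path={os.path.join(work_dir, stem + '.lyra')}",
        f"--output_dir={work_dir}",
        f"--bitrate={bitrate}",
        # Per the thread, omitting this flag yielded 16 kHz output.
        f"--sample_rate_hz={sample_rate_hz}",
    ]
    return encode, decode


def wav_sample_rate(path):
    """Return the sample rate stored in a WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def write_silence_wav(sample_rate_hz, seconds=0.1, path=None):
    """Write a short 16-bit mono silent WAV, handy for trying the checker."""
    if path is None:
        fd, path = tempfile.mkstemp(suffix=".wav")
        os.close(fd)
    n_frames = int(sample_rate_hz * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * n_frames)
    return path
```

To use it, pass each list to subprocess.run(cmd, check=True) from the Lyra checkout, then call wav_sample_rate on the decoded WAV that decoder_main writes to --output_dir.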