Feeding FB audio outputs WB

ChamMoradi commented 2 years ago

Running the codec with full-band input (48 kHz sample rate):

bazel-bin/encoder_main --input_path=path/to/fullband-audio/.wav --output_dir=dir/to/bs_output --bitrate=6000
bazel-bin/decoder_main --encoded_path=dir/to/bs_output/.lyra --output_dir=dir/to/output --bitrate=6000

This gives me wideband (WB) output at a 16 kHz sample rate. Is that related to the bitrate, i.e., do lower bitrates reduce the bandwidth? If so, which bitrates give super-wideband (SWB) and full-band (FB) output? Thanks!

astorus-goog commented 2 years ago

Hi ChamMoradi, there is an additional decoder flag you can use: sample_rate_hz.

bazel-bin/decoder_main --encoded_path=dir/to/bs_output/.lyra --output_dir=dir/to/output --bitrate=6000 --sample_rate_hz=48000

Note that regardless of bitrate, the model internally operates at a 16 kHz sample rate and then upsamples if the requested output sample rate is greater than 16 kHz.
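To illustrate what upsampling from the model's 16 kHz rate to 48 kHz does, here is a minimal sketch using FFT zero-padding (band-limited interpolation) with NumPy. This technique is an assumption chosen for illustration, not Lyra's actual resampler, and the helper name `upsample_fft` is hypothetical. The point it shows: resampling raises the sample rate but creates no energy above the original 8 kHz Nyquist limit. If the decoder's upsampler behaves like a standard resampler, a 48 kHz output file would therefore still carry only wideband content.

```python
import numpy as np


def upsample_fft(x: np.ndarray, in_rate: int, out_rate: int) -> np.ndarray:
    """Band-limited upsampling by an integer factor via FFT zero-padding.

    Hypothetical helper for illustration; NOT Lyra's resampler.
    """
    if out_rate % in_rate != 0:
        raise ValueError("out_rate must be an integer multiple of in_rate")
    factor = out_rate // in_rate
    n_in = len(x)
    n_out = n_in * factor

    spectrum = np.fft.rfft(x)
    # Zero-pad the spectrum: all new bins above the old Nyquist stay empty,
    # so no content above in_rate / 2 is created.
    padded = np.zeros(n_out // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    if n_in % 2 == 0:
        # The old Nyquist bin is split between positive and negative
        # frequencies in the longer spectrum, so halve it.
        padded[n_in // 2] *= 0.5
    # Scale by the factor to preserve amplitude after the longer inverse FFT.
    return np.fft.irfft(padded, n=n_out) * factor


# Usage: a 1-second, 440 Hz tone at 16 kHz, upsampled to 48 kHz.
in_rate, out_rate = 16_000, 48_000
t_in = np.arange(in_rate) / in_rate
x = np.sin(2 * np.pi * 440 * t_in)
y = upsample_fft(x, in_rate, out_rate)

# For a 1-second signal, rfft bin k corresponds to k Hz.
spectrum_out = np.abs(np.fft.rfft(y))
print(len(y))                                        # → 48000
print(spectrum_out[in_rate // 2 + 1:].max() < 1e-6)  # → True (nothing above 8 kHz)
```

The output has three times as many samples, but its spectrum above 8 kHz is empty, which is why a higher output sample rate alone does not turn WB audio into FB audio.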

ChamMoradi commented 2 years ago

Hello Astorus,

Thanks for your prompt response. I have a few other similar questions that I couldn't find answers to in the repo. For example, which sample rates are supported for the input audio? Is there a document that lists all the available flags and what they do?

Another note: this blog post, https://ai.googleblog.com/2021/08/soundstream-end-to-end-neural-audio.html, seems fairly old, and its demo audio items use a bitrate that is no longer supported. I would like a sanity check to confirm that I get the right output from the code.

astorus-goog commented 2 years ago

The supported bitrates and sample rates are documented in the lyra_encoder header file: https://github.com/google/lyra/blob/main/lyra_encoder.h#L48 The lyra_decoder header documents only the sample rates: https://github.com/google/lyra/blob/main/lyra_decoder.h#L45
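As a quick pre-flight check before calling the binaries, the sketch below validates command-line values. The value sets reflect our reading of the Lyra v2 release (bitrates of 3200, 6000 and 9200 bps; sample rates of 8, 16, 32 and 48 kHz). Treat them as assumptions and defer to the lyra_encoder.h and lyra_decoder.h headers linked above. The helper `check_lyra_args` is hypothetical, and the NB/WB/SWB/FB labels are just the conventional names for those sample rates. Per the earlier reply, the label describes the output sample rate, not the model's internal 16 kHz processing.

```python
# ASSUMED values; verify against lyra_encoder.h / lyra_decoder.h.
SUPPORTED_BITRATES_BPS = (3200, 6000, 9200)
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000, 32000, 48000)

# Conventional band names for each sample rate (illustrative labels only).
BAND_LABELS = {8000: "NB", 16000: "WB", 32000: "SWB", 48000: "FB"}


def check_lyra_args(bitrate: int, sample_rate_hz: int) -> str:
    """Hypothetical check; returns the band label of the requested output rate."""
    if bitrate not in SUPPORTED_BITRATES_BPS:
        raise ValueError(
            f"unsupported bitrate {bitrate}; expected one of {SUPPORTED_BITRATES_BPS}"
        )
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        raise ValueError(
            f"unsupported sample rate {sample_rate_hz}; "
            f"expected one of {SUPPORTED_SAMPLE_RATES_HZ}"
        )
    return BAND_LABELS[sample_rate_hz]


print(check_lyra_args(6000, 48000))  # → FB
```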

ChamMoradi commented 2 years ago

Thanks! Could you also point me to where I can find uncoded audio items and the corresponding outputs from your codec? That way I can be sure I am running your codec correctly.

astorus-goog commented 2 years ago

Happy to help! For validation files, you can see the examples on our open source blogpost under the 'Higher Quality' header: https://opensource.googleblog.com/2022/09/lyra-v2-a-better-faster-and-more-versatile-speech-codec.html

Note that the output from the current repository won't be numerically equivalent to the output from the blogpost since we have updated the model since that time, but they should be equivalent in a perceptual sense.
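Because outputs won't match the blog samples numerically, a cheap first sanity check is to confirm that the decoded file has the format you requested (for example, 48 kHz after passing --sample_rate_hz=48000) and a duration close to the input's. Below is a minimal sketch using Python's standard `wave` module. It assumes both files are uncompressed PCM WAV. The helper names, the file paths in the usage line, and the 0.1 s duration tolerance (to allow for frame padding or trimming) are all hypothetical. It checks only the header, not perceptual quality.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic header info for a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def check_decoded(input_path: str, decoded_path: str,
                  expected_rate_hz: int, tolerance_s: float = 0.1) -> None:
    """Raise ValueError if the decoded file has the wrong rate or duration.

    tolerance_s is an ASSUMED allowance for codec frame padding/trimming.
    """
    src = describe_wav(input_path)
    out = describe_wav(decoded_path)
    if out["sample_rate_hz"] != expected_rate_hz:
        raise ValueError(
            f"decoded rate {out['sample_rate_hz']} Hz, expected {expected_rate_hz} Hz"
        )
    if abs(out["duration_s"] - src["duration_s"]) > tolerance_s:
        raise ValueError(
            f"duration mismatch: {out['duration_s']:.3f} s vs {src['duration_s']:.3f} s"
        )


# Usage (hypothetical paths):
# check_decoded("path/to/input.wav", "dir/to/output/decoded.wav", 48000)
```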

ChamMoradi commented 2 years ago

Great! I am thinking of running a listening test that includes your codec, the codec recently released by Meta, and perhaps EVS. I wonder whether your codec is more robust in a noisy-in/clean-out or a clean-in/clean-out setting. Also, is it possible to disable the denoiser to get a clearer view of the coding artifacts?

astorus-goog commented 2 years ago

encoder_main includes a flag to enable preprocessing, but it is disabled by default: https://github.com/google/lyra/blob/main/encoder_main.cc#L37 This is the WebRTCPreprocessor module, not a custom one we built ourselves.

When you reference 'noisy in clean out' what exactly are you referring to? Currently the model does not do any enhancement.

ChamMoradi commented 1 year ago

Sorry for the late reply. From older documents, I understood that SoundStream (Lyra V2) is a codec that can also perform noise removal; that is, you can feed it noisy speech and expect clean speech at the output. Please correct me if I am wrong.

astorus-goog commented 1 year ago

That was something we explored in the research version described in the SoundStream blog post. However, it has not been included in LyraV2, because jointly compressing and enhancing speech is difficult for the lower-capacity model used in LyraV2.

ChamMoradi commented 1 year ago

Thanks for the clarification.

google / lyra

Feeding FB audio outputs WB #106