Open yd11111 opened 1 year ago
The encoder automatically resamples the input to 16 kHz by default, and for some reason the decoder resamples the output from 16 kHz as well. I modified it to work with 24 kHz and got different results than at 16 kHz. The bitrate increased as well, maybe because of the way I did it.
You're right, the TFLite models only support 16kHz. The Lyra API supports 8kHz, 16kHz, 32kHz and 48kHz, resampling to 16kHz at the encoder and from 16kHz at the decoder if needed. The desired bitrate can be set completely independently of the sample rate. The supported bitrates are 3.2kbps, 6kbps and 9.2kbps.
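To make the constraints in this reply concrete, here is a minimal sketch that checks a requested configuration against the values listed above. `check_config` and the constant names are hypothetical helpers written for illustration, not part of the Lyra API; only the numbers come from the reply.

```python
# Hypothetical helper, NOT part of the Lyra API.
# The values below are the ones quoted in the reply above.
SUPPORTED_SAMPLE_RATES_HZ = (8000, 16000, 32000, 48000)
SUPPORTED_BITRATES_BPS = (3200, 6000, 9200)
MODEL_SAMPLE_RATE_HZ = 16000  # the TFLite models only run at 16 kHz


def check_config(sample_rate_hz, bitrate_bps):
    """Validate a (sample rate, bitrate) pair.

    Returns True when the audio would have to be resampled to or from
    16 kHz, False when it already matches the model rate.
    Raises ValueError for values outside the lists above.
    """
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        raise ValueError(f"unsupported sample rate: {sample_rate_hz} Hz")
    if bitrate_bps not in SUPPORTED_BITRATES_BPS:
        raise ValueError(f"unsupported bitrate: {bitrate_bps} bps")
    # Bitrate is chosen independently of the sample rate, so only the
    # sample rate decides whether resampling happens.
    return sample_rate_hz != MODEL_SAMPLE_RATE_HZ
```

Under these assumptions, `check_config(48000, 3200)` reports that resampling is needed, `check_config(16000, 9200)` reports that it is not, and a 24 kHz input is rejected outright.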
Thanks for your answer, I've figured it out now.
@aluebs but is it possible for Lyra to support stereo audio as well?
It is, you can set the number of channels in lyra_config.cc I think. Doing so doubles the file size (and encoding time?).
@pinilpypinilpy I was able to find the required variable, kNumChannels; its value is set to 1 in lyra_config.cc. However, in lyra_config.h these values are declared as extern int in the codec namespace, with the following comment:
This file is reserved for non-configurable values needed by both the decoder
and encoder. What those non-configurable values are depends on which project
is chosen to be compiled. As a result, a struct holding the configuration
data is defined to ensure each new target added and each new configuration
element is explicitly defined.
So I am not sure if one should directly change the parameters in that file. There are other parameters as well (kNumFeatures, kNumMelBins, kFrameRate, kOverlapFactor); I do not think the others need to be changed, right?
At this time, Lyra doesn't support stereo.
@berserker1 you only need to change the other values if you're using a different sampling rate. If you change kNumChannels to 2 and recompile, your input file will have to be stereo, and the decoded file will also be.
It isn't technically supported though. If you want to play around, I forked Lyra and added support for other sampling rates and bitrate presets, as well as stereo without needing to recompile: https://github.com/pinilpypinilpy/lyra-variable
However, the devs disabled those things for a reason, so YMMV
@pinilpypinilpy Yes, I followed exactly what you said: changing the variable and inputting a stereo file worked fine (it encoded and decoded smoothly). Thanks for sharing your forked repo!
As a novice myself, I do not quite get why this small feature is missing and why it is disabled.
According to the original SoundStream article, SoundStream is trained on 24kHz data. I would like to know what sample rate the models released in Lyra V2 (soundstream_encoder.tflite; quantizer.tflite; lyragan.tflite) were trained on. Can these models also process 24kHz wavs? Could they be used on 24kHz wavs for some interesting experiments similar to AudioLM, another Google work?
I found that the existing models seem to process 16kHz wavs. However, on line 48 of lyra_encoder.h the supported sample rates are not only 16000 but also 8000, 32000, and 48000, which confuses me. A different sample rate means that the fixed 320 samples cover a different time span, so I'm not sure whether this fixed soundstream_encoder can directly handle data at different sample rates: given 46 4-bit quantizers, the encoded data would not come out at the supported bitrate (9.2kbps) mentioned in the API doc. In practice, I used the three released models to encode, quantize and decode a 16kHz and a 24kHz wav with the same content, and the two decoded waves sound the same. Since I only tested a few examples, I am not sure about the recovery quality. Can anyone explain this? Many thanks.
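The arithmetic behind this question can be checked with a short sketch. It assumes the model consumes fixed 320-sample frames and emits 46 quantizer indices of 4 bits each per frame, as stated above, and that the audio is fed to the models without being resampled first. The function and constant names are made up for illustration and do not come from the Lyra code.

```python
# Illustrative arithmetic only; names are hypothetical, figures are
# the ones quoted in the post above.
FRAME_SAMPLES = 320        # fixed number of samples per model frame
NUM_QUANTIZERS = 46        # quantizers used at the highest bitrate
BITS_PER_QUANTIZER = 4     # each quantizer index is 4 bits


def frames_per_second(sample_rate_hz):
    """How many 320-sample frames fit into one second of audio."""
    return sample_rate_hz / FRAME_SAMPLES


def bitrate_bps(sample_rate_hz, num_quantizers=NUM_QUANTIZERS):
    """Bits per second if every frame carries num_quantizers indices."""
    bits_per_frame = num_quantizers * BITS_PER_QUANTIZER
    return frames_per_second(sample_rate_hz) * bits_per_frame


for rate in (16000, 24000):
    print(rate, frames_per_second(rate), bitrate_bps(rate))
# 16000 50.0 9200.0
# 24000 75.0 13800.0
```

At 16kHz a frame lasts 20 ms, giving 50 frames per second and exactly 9.2kbps. Feeding 24kHz audio straight into the same models would shorten each frame to about 13.3 ms and raise the rate to 13.8kbps, which is consistent with the bitrate increase reported at the top of the thread. When the audio goes through the Lyra API instead, it is resampled to 16kHz first, so the frame rate and bitrate would not change.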