google / lyra

A Very Low-Bitrate Codec for Speech Compression
Apache License 2.0
3.84k stars 355 forks source link

What is the principle behind getting 8 bytes per frame of audio? #154

Closed TangYuFan closed 2 weeks ago

TangYuFan commented 3 weeks ago

I have reviewed the input and output of three tflite models, such as: soundstream_encoder.tflite: tensor: float32[1,320] => tensor: float32[1,1,64] quantizer.tflite encode: tensor: float32[1,1,64] => tensor: int32[46,1,1] quantizer.tflite decode: tensor: int32[46,1,1] => tensor: float32[1,1,64] lyragan.tflite: tensor: float32[1,1,64] => tensor: float32[1,320] I have reviewed the code in the residual-vector_quantizer.cc section, but I don't understand how int32 [46,1,1] is converted into 8-byte encoded data. Does anyone know the principle behind it?