xiph / LPCNet

Efficient neural speech synthesis
BSD 3-Clause "New" or "Revised" License
1.12k stars 295 forks

4x8 block computation in vec_avx.h #185

Open Jo0o0Hyung opened 2 years ago

Jo0o0Hyung commented 2 years ago

@jmvalin , Thank you for sharing your code in Github. I have a question about 8x4 block computation in vec_avx.h (My question is based on lpcnet_efficiency branch after reading the paper, NEURAL SPEECH SYNTHESIS ON A SHOESTRING: IMPROVING THE EFFICIENCY OF LPCNET)

https://github.com/xiph/LPCNet/blob/lpcnet_efficiency/src/vec_avx.h#L788 In the linked code, the vector_ps_to_epi8 function is called; it converts the float-typed state (_x) into an unsigned char state (x). To map signed char to unsigned char, the scalar value 127 is added. I understand that this operation is needed so that _mm256_maddubs_epi16 can be applied to vxj and vw.

However, in my opinion, there should be some kind of 'compensation' step after this operation that subtracts the added value, 127, because the GRU state is not in the unsigned char range during training. For example, I think the following line should be added after the aforementioned operation:

// Before
tmp = _mm256_maddubs_epi16(vxj, vw);
tmp = _mm256_madd_epi16(tmp, ones);

// After
tmp = _mm256_maddubs_epi16(vxj, vw);
tmp = _mm256_sub_epi16(tmp, _mm256_maddubs_epi16(const_127i, vw));
tmp = _mm256_madd_epi16(tmp, ones);

In summary, I reckon that 127 should be subtracted during the 8x4 sgemv operation to compensate for the 127 added in vector_ps_to_epi8, and I wonder if I have missed anything.
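The arithmetic behind this concern can be sketched in NumPy (names and shapes here are illustrative, not LPCNet's actual code): the +127 offset contributes exactly 127 * sum(w) to each unsigned*signed dot product, which is what the proposed _mm256_sub_epi16 line would remove.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.integers(-127, 128, size=8).astype(np.int32)  # signed 8-bit weights
x = rng.uniform(-1.0, 1.0, size=8)                    # float GRU state

xs = np.round(127 * x).astype(np.int32)  # signed quantized state
xu = xs + 127                            # unsigned state, as in vector_ps_to_epi8

# Unsigned*signed accumulation, as _mm256_maddubs_epi16 would compute:
acc = int(np.dot(xu, w))

# The offset's contribution is exactly 127 * sum(w); subtracting it
# recovers the dot product with the signed quantized state:
assert acc - 127 * int(w.sum()) == int(np.dot(xs, w))
```

Because the correction 127 * sum(w) depends only on the weights, it can be computed once ahead of time rather than per frame.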

In addition, and unrelated to the question above, I wonder why Gaussian noise is added between GRU_A and GRU_B:

gru_out1, _ = rnn(rnn_in)
gru_out1 = GaussianNoise(.005)(gru_out1)
gru_out2, _ = rnn2(Concatenate()([gru_out1, rep(cfeat)]))

Thanks.

jmvalin commented 2 years ago

Regarding the 127 offset, you're right that there needs to be a compensation. That's done offline by changing the bias values we add at the end. See the "bias" vs "subias" values in the model ("su" is for "signed*unsigned" multiply). As for the GaussianNoise, it kinda simulates the quantization noise we add to the activation when going from float to int8.
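A quick sanity check on the GaussianNoise point (a hedged sketch, not the training code): rounding a [-1, 1] activation to int8 with a step of 1/127 introduces roughly uniform noise of width 1/127, whose standard deviation is (1/127)/sqrt(12) ≈ 0.0023, the same order of magnitude as the GaussianNoise(.005) layer.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=100_000)

# Round-trip through the int8 grid and measure the residual:
noise = np.round(127 * x) / 127 - x
print(round(float(noise.std()), 4))  # ≈ 0.0023, close to the GaussianNoise scale
```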

Jo0o0Hyung commented 2 years ago

Thank you for the reply. :-) Thanks to your comment, I now understand the role of "subias" in the dump_lpcnet.py and nnet.c scripts. I also checked that the wav is synthesized correctly without the line tmp = _mm256_sub_epi16(tmp, _mm256_maddubs_epi16(const_127i, vw)); when "subias" is used.
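The offline bias adjustment the thread settles on can be sketched as follows (the name "subias" follows the model dump; shapes and the exact formula here are illustrative assumptions, not the dump_lpcnet.py code):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.integers(-127, 128, size=(4, 8)).astype(np.int32)  # signed weight rows
bias = rng.integers(-1000, 1000, size=4).astype(np.int32)

# Fold the 127 offset into the bias once, offline:
subias = bias - 127 * W.sum(axis=1)

x = rng.integers(-127, 128, size=8).astype(np.int32)  # signed quantized state
xu = x + 127                                          # unsigned state at runtime

# signed*unsigned matvec plus subias equals the signed matvec plus the
# original bias, so no per-frame _mm256_sub_epi16 correction is needed:
assert np.array_equal(W @ xu + subias, W @ x + bias)
```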