xiph / LPCNet

Efficient neural speech synthesis
BSD 3-Clause "New" or "Revised" License

Why use "sigmoid" activation in MDense Layer? #164

Closed BridgetteSong closed 2 years ago

BridgetteSong commented 2 years ago

Why use a sigmoid activation rather than a softmax activation? If I don't use the end2end flag, the loss function is `sparse_categorical_crossentropy` without `from_logits=True`, so `y_pred` will not encode a probability distribution.
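
For illustration, a quick NumPy sketch of the distinction (toy values, not LPCNet code): softmax outputs sum to 1 and form a distribution over the output levels, while independent sigmoid outputs do not.

```python
import numpy as np

x = np.array([0.1, 0.2, 2.0, 0.3])  # toy pre-activations

softmax = np.exp(x) / np.exp(x).sum()
sigmoid = 1.0 / (1.0 + np.exp(-x))

print(softmax.sum())  # 1.0   -> a valid probability distribution
print(sigmoid.sum())  # ~2.53 -> independent per-unit probabilities, not a pdf
```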

BridgetteSong commented 2 years ago

I found the author committed the following change, but I don't understand it: "Saves on the MDense/softmax computation since we only need to compute 8 values instead of 256." https://github.com/mozilla/LPCNet/commit/d24f49e346b85b5cb2c89c6dadcc913ce004fe83

BridgetteSong commented 2 years ago

I understand it now.

vvfd87 commented 2 years ago

@BridgetteSong I'm really sorry to bother you, but could you explain the "Representing output pdf as binary probability tree" commit?

I applied this commit, but the synthesized speech quality is not good. (Synthesis speed is slightly improved.)

Using softmax and `sample_from_pdf()` instead of `sample_mdense()`, the synthesized speech quality is good.

Do you have any ideas?

BridgetteSong commented 2 years ago

> @BridgetteSong I'm really sorry to bother you, but could you explain the "Representing output pdf as binary probability tree" commit?
>
> I applied this commit, but the synthesized speech quality is not good. (Synthesis speed is slightly improved.)
>
> Using softmax and `sample_from_pdf()` instead of `sample_mdense()`, the synthesized speech quality is good.
>
> Do you have any ideas?

  1. The author uses hierarchical sampling, similar to bit bunching, so the 256 output levels can be split into 8 bits.
  2. Each bit has only two outcomes (0 or 1), so a sigmoid can be used as the activation function for each binary decision.
  3. The final sample value is reconstructed from the 8 bit decisions by walking the binary probability tree from root to leaf (see the sketch below).
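
A minimal NumPy sketch of the idea, assuming the 255 internal nodes of a 256-leaf binary tree are stored breadth-first and each entry holds the sigmoid output P(bit = 1 | bits chosen so far); this node layout is an assumption for illustration and may differ from the exact indexing in LPCNet's `sample_mdense()`:

```python
import numpy as np

def sample_binary_tree(bit_prob, rng):
    """Sample an 8-bit index by walking a binary probability tree.

    bit_prob is assumed to hold, in breadth-first order, the sigmoid
    outputs P(bit = 1 | bits chosen so far) for the 255 internal nodes
    of a 256-leaf tree (root at index 0, children of node n at
    2n+1 and 2n+2). This layout is illustrative, not LPCNet's exact one.
    """
    node = 0
    index = 0
    for _ in range(8):               # one binary decision per bit level
        bit = int(rng.random() < bit_prob[node])
        index = (index << 1) | bit   # append the sampled bit
        node = 2 * node + 1 + bit    # descend to the chosen child
    return index

rng = np.random.default_rng(0)
bit_prob = rng.random(255)           # stand-in for the network's sigmoid outputs
print(sample_binary_tree(bit_prob, rng))  # an index in 0..255
```

Only 8 of the 255 node probabilities are ever read per sample, which is where the commit's "compute 8 values instead of 256" saving comes from: at synthesis time the network only needs to evaluate the outputs along the sampled path.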