vskadandale / vocalist

Official repository for the paper VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices

model output range question #14

Open ketyi opened 11 months ago

ketyi commented 11 months ago

Hi @vskadandale,

The penultimate transformation in the model, https://github.com/vskadandale/vocalist/blob/d2d7d4fe2df03a9ad7b36d93cdf22dee1a6f0217/models/model.py#L83 , is a tanh function. It is followed by a learnt linear mapping, but I would like to understand what the tanh is for, given that the ground-truth labels are in the set {0, 1} and the classification threshold is set to 0.5: https://github.com/vskadandale/vocalist/blob/d2d7d4fe2df03a9ad7b36d93cdf22dee1a6f0217/train_vocalist_lrs2.py#L192 .

So why is tanh() applied to the hidden vector at that point?
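
To make the question concrete, here is a minimal sketch of the pattern I mean, written in plain Python rather than copied from the repository. The function names, the example weights, and the use of a sigmoid before the 0.5 threshold are my own assumptions for illustration; the actual code in `models/model.py` and `train_vocalist_lrs2.py` may differ. What it shows is that tanh bounds each hidden component to (-1, 1), so the scalar produced by the linear map can never exceed the sum of the absolute weights plus the absolute bias.

```python
import math


def tanh_then_linear(hidden, weights, bias):
    """Squash the hidden vector with tanh, then map it to one scalar (hypothetical head)."""
    squashed = [math.tanh(h) for h in hidden]  # each entry now lies in (-1, 1)
    return sum(w * s for w, s in zip(weights, squashed)) + bias


def sigmoid(x):
    """Logistic function; assumed here to be how the score becomes a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit_bound(weights, bias):
    """Largest magnitude the scalar can reach, because |tanh(h)| < 1 for every h."""
    return sum(abs(w) for w in weights) + abs(bias)


# Illustrative values only: two saturated hidden units and one small one.
hidden = [50.0, -50.0, 0.3]
weights = [2.0, -1.5, 0.5]
bias = 0.1

logit = tanh_then_linear(hidden, weights, bias)
prob = sigmoid(logit)
pred = 1 if prob > 0.5 else 0  # threshold of 0.5, as in the training script

print(f"logit={logit:.4f} bound={logit_bound(weights, bias):.4f} prob={prob:.4f} pred={pred}")
```

Under these assumptions the tanh does not restrict the final score to (-1, 1), since the linear layer rescales it; it only caps how large the score can become for a given set of weights. Whether that cap is the intended purpose is what I am asking.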