hitachi-speech / EEND

End-to-End Neural Diarization
MIT License

Smoothing the activations at the output of the transformer #42

Open zaouk opened 2 years ago

zaouk commented 2 years ago

Hey there, I was wondering if you encountered any issues with smoothing the speaker activations predicted by the Transformer model. An encoder-only Transformer tends to output speaker activations that are less smooth over time than those produced by recurrent models such as Bi-LSTMs. Did you resort to any tricks for smoothing the Transformer's output activations, or was this not an issue at all?
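To make the question concrete, here is a minimal sketch of the kind of smoothing meant: a median filter applied along the time axis to the thresholded per-frame activations. It is only an illustration of one common post-processing trick, not a statement of what this repository does. The function name `median_smooth`, the kernel size of 11 frames, and the 0.5 threshold are all assumptions chosen for the example.

```python
import numpy as np


def median_smooth(probs, kernel=11, threshold=0.5):
    """Threshold per-frame speaker activations and median-filter them over time.

    probs: array of shape (T, S), one activation per frame and speaker.
    kernel: odd window length in frames (assumed value, tune per dataset).
    Returns an integer array of shape (T, S) with values 0 or 1.
    """
    if kernel % 2 != 1:
        raise ValueError("kernel must be odd")
    # Hard decisions first, so the median is always exactly 0 or 1.
    decisions = (np.asarray(probs) > threshold).astype(np.int64)
    pad = kernel // 2
    # Zero-pad the time axis so the output keeps T frames.
    padded = np.pad(decisions, ((pad, pad), (0, 0)), mode="constant")
    # Shape (T, S, kernel): one window of neighbouring frames per output frame.
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel, axis=0)
    return np.median(windows, axis=-1).astype(np.int64)
```

With a window of `kernel` frames, isolated spikes or dropouts shorter than about half the window are removed, at the cost of blurring very short speaker turns and of trimming activity in the first and last few frames because of the zero padding.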