hitachi-speech / EEND

End-to-End Neural Diarization
MIT License
368 stars, 57 forks

Smoothing the activations at the output of the transformer #42

Open zaouk opened 1 year ago

zaouk commented 1 year ago

Hey there, I was wondering whether you ran into any issues with the smoothness of the speaker activations predicted by the Transformer model. An encoder-only Transformer tends to output speaker activations that are less smooth than those of recurrent models such as Bi-LSTMs. Did you use any tricks to smooth the Transformer's output activations, or was this not an issue at all?
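For context, one common post-processing step for frame-level diarization outputs is to threshold each speaker's activations and then median-filter the resulting decisions along the time axis. This removes isolated one-frame spikes and fills short gaps. The sketch below is only an illustration of that generic technique, not this repository's method. It assumes the activations are a `(frames, speakers)` NumPy array of sigmoid outputs. The helpers `median_smooth` and `binarize`, the 0.5 threshold and the 11-frame window are hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_smooth(activations: np.ndarray, window: int = 11) -> np.ndarray:
    """Median-filter a (T, S) activation matrix along time, independently per speaker.

    Hypothetical helper for illustration; `window` must be a positive odd number of frames.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the first/last frames so the output keeps exactly T frames.
    padded = np.pad(activations, ((half, half), (0, 0)), mode="edge")
    # Shape (T, S, window): one centred window per frame and speaker.
    windows = sliding_window_view(padded, window, axis=0)
    return np.median(windows, axis=-1)


def binarize(activations: np.ndarray, threshold: float = 0.5, window: int = 11) -> np.ndarray:
    """Threshold per-speaker activations, then smooth the 0/1 decisions by majority vote.

    Hypothetical helper; with an odd window, the median of 0/1 values is itself 0 or 1.
    """
    decisions = (activations > threshold).astype(np.float64)
    return median_smooth(decisions, window) > 0.5
```

With `window=3`, a single active frame surrounded by silence is dropped, and a single silent frame inside a speech region is filled in. Larger windows suppress longer glitches, but they also shift or erase genuinely short turns. So the window length trades smoothness against temporal resolution.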