Thank you for your interest!
The reasons are:
Reference:
[1] Zhang, Xiao-Lei, and DeLiang Wang. "Boosting contextual information for deep neural network based voice activity detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) 24.2 (2016): 252-264.
Thank you very much for sharing your insights! I'll also refer to that reference. Shall we close the issue? Of course, I won't mind if you'd prefer to leave it open.
Hi, recently I've been looking for deep-learning-based VAD models, and some googling brought me here. Thanks for open-sourcing your model! :)
My question is: why was MRCG (multi-resolution cochleagram) used as the input feature?
To the best of my knowledge, STFT-based mel-spectrograms (or linear-scale magnitude spectrograms) have been widely used as input features for recent deep-learning-based acoustic models. Does MRCG have any strengths for a VAD model compared to other acoustic features such as mel-spectrograms?
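For concreteness, this is the kind of mel-spectrogram feature extraction I have in mind. It's only a minimal sketch using librosa, and the parameter values (sample rate, window/hop sizes, number of mel bands) are illustrative defaults rather than anything taken from this repository:

```python
import librosa
import numpy as np

def log_mel_features(wav_path, sr=16000, n_fft=400, hop_length=160, n_mels=40):
    """Frame-level log-mel spectrogram features for a VAD front end.

    16 kHz audio, 25 ms windows, 10 ms hop, and 40 mel bands are common
    choices, not the settings used in this repo.
    """
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # Log compression; the small constant avoids log(0).
    return np.log(mel + 1e-6).T  # shape: (num_frames, n_mels)
```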