Open stanleyguan opened 6 years ago
Hm. I think it would make sense to merge consecutive segments as long as the combined segment would be less than, say, 7 seconds long. @vimalmanohar what do you think about this?
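A minimal sketch of what such a merge could look like, assuming the segments are available as plain `(start, end)` tuples in seconds. The function name `merge_consecutive` and the parameters `max_duration` and `max_gap` are hypothetical and not part of any Kaldi script; the 7-second cap is just the figure floated above.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_consecutive(segments: List[Segment],
                      max_duration: float = 7.0,
                      max_gap: float = 0.0) -> List[Segment]:
    """Merge neighbouring segments separated by at most `max_gap` seconds,
    as long as the merged segment stays shorter than `max_duration`.

    Hypothetical helper for illustration; not taken from Kaldi.
    """
    eps = 1e-6  # tolerance for floating-point boundaries
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end
            # Merge only if the segments touch (or nearly touch) and the
            # combined segment would still be under the duration cap.
            if gap <= max_gap + eps and end - prev_start < max_duration:
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


if __name__ == "__main__":
    segs = [(0.0, 2.5), (2.5, 4.0), (4.0, 9.0), (10.0, 11.0)]
    print(merge_consecutive(segs))
```

In this example the first two segments are merged, the third is left alone because merging it would exceed the cap, and the last one is kept separate because a real gap precedes it.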
I think this is an application-specific issue. We decided not to do any post-processing. Combining segments can also hurt performance, e.g. if they come from different sentences, although it might help to combine them in this particular utterance. You can get longer segments by tuning the graph_opts parameters.
I think it's still interesting to have a look at the effect of this simple post-processing step though, and maybe make it an option that's off by default. I think it may be useful for quite a few scenarios.
Would a PR be appropriate for this?
yes, that would be good.
May I ask which example recipes (egs) use the script to do SAD?
https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/run_asr_segmentation.sh
There are also similar recipes in Babel and aspire.
I have been using egs/wsj/s5/steps/segmentation/detect_speech_activity.sh and a pre-trained model to do SAD on recordings. One thing I noticed is that the resulting segments may be consecutive. As an example, an utterance might be
while the SAD may produce segments as
where there are no gaps between the segments and it makes more sense in this case to have just a single combined segment.
My understanding is that this has to do with the padding of the segments (which defaults to 0.2 seconds on both ends). Before padding, the segments probably had gaps between them. From an end-result point of view, though, I don't see much value in keeping them separate, since there is no way to go back to the original segments: you can't just subtract the padding, because the padding is clipped so that it does not overlap with adjacent segments. As currently implemented, the padding choice is also arbitrary in this situation: the earlier segment is padded first, and if that padding extends beyond the start of the next segment, the next segment won't be padded at the start. For ASR purposes it can also be beneficial to have a single segment so that the model can pool information across it. Therefore it might make sense to just merge the segments if there are no gaps between them, or at least have an option to do so.
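To illustrate the padding behaviour described above, here is a simplified model of it, assuming raw segments given as `(start, end)` tuples in seconds. This is only my reading of the behaviour, not the actual Kaldi implementation, and the function name `pad_segments` is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def pad_segments(segments: List[Segment], pad: float = 0.2) -> List[Segment]:
    """Pad each segment by `pad` seconds on both ends, left to right.

    Simplified model (an assumption, not Kaldi's code): the earlier segment
    is padded first, its end padding is clipped at the start of the next raw
    segment, and the next segment's start padding is clipped at whatever the
    earlier segment already claimed.
    """
    ordered = sorted(segments)
    padded: List[Segment] = []
    prev_end = 0.0  # also keeps the first start from going below zero
    for i, (start, end) in enumerate(ordered):
        new_start = max(start - pad, prev_end)
        new_end = end + pad
        if i + 1 < len(ordered):
            # Never pad into the speech of the following segment.
            new_end = min(new_end, ordered[i + 1][0])
        padded.append((new_start, new_end))
        prev_end = new_end
    return padded


if __name__ == "__main__":
    raw = [(1.0, 2.0), (2.3, 3.0)]  # 0.3 s gap before padding
    print(pad_segments(raw))
```

With a raw gap of 0.3 seconds and 0.2 seconds of padding, the first segment gets its full end padding and the second one starts exactly where the first ends, so the padded segments touch and the original boundaries cannot be recovered from the output alone.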
I'm happy to make a PR if this is deemed desirable.