kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

SAD produces consecutive segments #2707

Open stanleyguan opened 6 years ago

stanleyguan commented 6 years ago

I have been using egs/wsj/s5/steps/segmentation/detect_speech_activity.sh with a pre-trained model to run speech activity detection (SAD) on recordings. One thing I noticed is that the resulting segments may be consecutive, with no gap between them. As an example, an utterance might be

291.30 296.91 k e c g u at email dot com

while the SAD may produce segments as

291.22 292.39
       292.39 292.99
              292.99 293.85
                     293.85 294.66
                            294.66 295.40
                                   295.40 297.05

where there are no gaps between the segments and it makes more sense in this case to have just a single combined segment.

My understanding is that this comes from the padding of the segments (0.2 seconds on both ends by default). Before padding, the segments probably had gaps between them. From an end-result point of view, though, I don't see much value in keeping them separate, since there is no way to recover the original segments: you can't just subtract the padding, because the padding is clipped so that it does not overlap adjacent segments. As currently implemented, the padding choice is also arbitrary in this situation: the earlier segment is padded first, and if that padding extends to the next segment, the next segment is not padded at its start. For ASR purposes it can also be beneficial to have a single segment, so that the model can pool information across it. It might therefore make sense to merge segments that have no gaps between them, or at least to offer an option to do so.

I'm happy to make a PR if this is deemed desirable.
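
To make the padding behaviour described above concrete, here is a minimal Python sketch. It is not the code used by detect_speech_activity.sh: the function name `pad_segments` and its clipping rule are assumptions reconstructed from the description (pad 0.2 s on both sides, pad the earlier segment first, never overlap a neighbour).

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def pad_segments(segments: List[Segment], pad: float = 0.2) -> List[Segment]:
    """Hypothetical illustration of sequential padding with clipping.

    Segments are assumed sorted and non-overlapping. Each segment is padded
    in order; the earlier segment gets its padding first, and the later one
    only receives whatever room is left.
    """
    padded: List[Segment] = []
    for i, (start, end) in enumerate(segments):
        # The start cannot move before the (already padded) end of the
        # previous segment, nor before the beginning of the recording.
        lower = padded[-1][1] if padded else 0.0
        new_start = max(start - pad, lower)
        # The end cannot move past the unpadded start of the next segment.
        upper = segments[i + 1][0] if i + 1 < len(segments) else float("inf")
        new_end = min(end + pad, upper)
        padded.append((new_start, new_end))
    return padded


# Two segments separated by a 0.3 s gap, which is less than 2 * pad.
print(pad_segments([(1.0, 2.0), (2.3, 3.0)]))
```

Under these assumptions the two segments end up touching at about 2.2 s: the first keeps its full 0.2 s of trailing padding, while the second gets only about 0.1 s at its start. From the padded output alone, the original 0.3 s gap can no longer be recovered.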

danpovey commented 6 years ago

Hm. I think it would make sense to merge consecutive segments as long as the combined segment would be less than, say, 7 seconds long. @vimalmanohar what do you think about this?
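
A minimal Python sketch of this suggestion, for illustration only. The names `merge_consecutive_segments`, `max_duration` and `max_gap` are hypothetical and are not existing Kaldi options; the 7-second cap is simply the value floated above.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_consecutive_segments(
    segments: List[Segment],
    max_duration: float = 7.0,
    max_gap: float = 0.0,
    eps: float = 1e-6,
) -> List[Segment]:
    """Merge segments that touch (gap <= max_gap), as long as the combined
    segment stays within max_duration seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            touches = start - prev_end <= max_gap + eps
            short_enough = end - prev_start <= max_duration + eps
            if touches and short_enough:
                # Extend the previous segment instead of starting a new one.
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


# The segments from the example at the top of this issue.
sad_segments = [
    (291.22, 292.39),
    (292.39, 292.99),
    (292.99, 293.85),
    (293.85, 294.66),
    (294.66, 295.40),
    (295.40, 297.05),
]
print(merge_consecutive_segments(sad_segments))
```

With the default 7-second cap the six segments collapse into a single one spanning 291.22 to 297.05 (5.83 s). A smaller `max_duration` leaves several shorter merged segments, and segments separated by a real gap are left untouched.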

vimalmanohar commented 6 years ago

I think this is an application-specific issue. We decided not to do any post-processing. Combining segments can also hurt performance, e.g. if they come from different sentences, although it might help for this particular utterance. You can get longer segments by tuning the graph_opts parameters.

danpovey commented 6 years ago

I think it's still interesting to have a look at the effect of this simple post-processing step though, and maybe make it an option that's off by default. I think it may be useful for quite a few scenarios.

stanleyguan commented 6 years ago

Would a PR be appropriate for this?

danpovey commented 6 years ago

yes, that would be good.

chenfuouc commented 5 years ago

May I ask which example recipes (egs) use this script to do SAD?

vimalmanohar commented 5 years ago

https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/local/run_asr_segmentation.sh

There are also similar recipes in Babel and aspire.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.