kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

Is speaker diarization available now? #1710

Closed fa93hws closed 7 years ago

fa93hws commented 7 years ago

Hi, I need to distinguish between speaker A and speaker B in an audio recording. I searched extensively, but all I found was https://sourceforge.net/p/kaldi/discussion/1355348/thread/74e97aca/?limit=25, which says that Kaldi did not support this 3 years ago.

I am wondering whether such a function is available now. If it is, could anyone point me to a page, a hint, or anything else I can get started with?

Thanks

vince62s commented 7 years ago

Read this: https://groups.google.com/forum/#!searchin/kaldi-help/diarization%7Csort:date/kaldi-help/ROtSHHe3Z_I/BCDJXHbTAQAJ

fa93hws commented 7 years ago

Thanks @vince62s. Based on the thread, though, it seems the developers used to be interested in it but later changed their minds. Is my understanding correct?

danpovey commented 7 years ago

No, we are still working on segmentation and speaker diarization, but the diarization won't be ready for probably about 3 months.


fa93hws commented 7 years ago

@danpovey Hi, many thanks for your answer.

chananshgong commented 6 years ago

Any update on that?

danpovey commented 6 years ago

I checked about a week ago, and @david-ryan-snyder and @mmaciej2 told me we should be on track to merge this stuff by the end of this year. That's only a few days away now. Guys-- how is it looking on this? When @david-ryan-snyder tells me it's ready to merge, I should be able to do a very quick check and then merge it.

david-ryan-snyder commented 6 years ago

@chananshgong, there's a pull request under review at https://github.com/kaldi-asr/kaldi/pull/1894. It's functional, but we're still cleaning it up. @mmaciej2 is the main developer on this and can comment on the timeline.

mmaciej2 commented 6 years ago

@chananshgong, as David said, the current pull request is functional and only needs to be cleaned up at this point. Ideally it will be completed within a few days; I have been delayed a little by the holidays.

The only functional change that will be happening is the removal of the PLDA calibration code. The current method is somewhat questionable, so we would rather leave it out for now. A default threshold of 0.5 should be fine, and can be tuned by hand if it is not.
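
As a rough illustration of what tuning such a threshold by hand could look like, here is a minimal sketch in plain Python. It assumes the threshold acts as the stopping criterion for agglomerative clustering of pairwise segment scores, which this thread does not spell out. The function name `cluster_segments` and the score values are hypothetical toy choices, not Kaldi's API and not real PLDA scores.

```python
from itertools import combinations


def cluster_segments(scores, threshold=0.5):
    """Greedy agglomerative clustering over a symmetric score matrix.

    scores[i][j] is a similarity between segments i and j (higher means
    "more likely the same speaker"). Merging stops once the best remaining
    pair of clusters scores below `threshold`.
    Hypothetical helper for illustration only; not part of Kaldi.
    """
    clusters = [[i] for i in range(len(scores))]

    def avg_score(a, b):
        # Average linkage: mean pairwise score between two clusters.
        return sum(scores[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average score.
        (x, y), best = max(
            (((p, q), avg_score(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # No remaining pair is similar enough to merge.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # Safe: combinations() guarantees x < y.

    # Convert the cluster membership lists into one label per segment.
    labels = [0] * len(scores)
    for speaker, members in enumerate(clusters):
        for i in members:
            labels[i] = speaker
    return labels


# Invented toy scores: segments 0 and 1 are similar, as are 2 and 3.
toy_scores = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]

for t in (0.05, 0.5, 0.95):
    print(t, cluster_segments(toy_scores, threshold=t))
```

On this toy matrix, a very low threshold merges everything into one speaker, a very high one leaves every segment on its own, and 0.5 yields two speakers. Tuning by hand amounts to the same kind of sweep on held-out recordings where the number of speakers is known.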

chananshgong commented 6 years ago

Thanks. Does it come with a model, or does it have to be trained from scratch?


danpovey commented 6 years ago

The scripts we're checking in will be for training from scratch. If Matt and David want to release a model I certainly wouldn't object. Typically, of course, it will be best to train these things on in-domain data. Speaker-id systems can be sensitive to what data you train the PLDA matrix on, and this thing uses such a matrix, so one can expect that having some in-domain data to estimate it on would be helpful.


chananshgong commented 6 years ago

Thanks. Can you share a link on the Variational Bayes method for resegmentation mentioned in the pull request?


david-ryan-snyder commented 6 years ago

@chananshgong: http://speech.fit.vutbr.cz/software/vb-diarization-eigenvoice-and-hmm-priors