idnavid / spkr_diarization

Alveo Speaker Diarization project - Navid's version

error while resegmentation #2

Closed: mkingupta closed this 7 years ago

mkingupta commented 7 years ago

sgmestim error -- no training data available
sgmestim error -- no training data available
sgmestim error -- no training data available
sgmestim error -- no training data available
/home/manish/spkr_diarization/backup/code/resegment.py:83: RuntimeWarning: invalid value encountered in divide
  trans_mat = trans_mat/float(N)
[[ nan nan nan nan]
 [ nan nan nan nan]
 [ nan nan nan nan]
 [ nan nan nan nan]]
gmm_read() -- cannot open file ./out_dir/C3
hmm_read() -- cannot load GMM from file ./out_dir/C3 (state 1, line 2, file .//out_dir//diarizationExample_24502_hmm.txt)
sviterbi error -- cannot read hmm file .//out_dir//diarizationExample_24502_hmm.txt
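The all-NaN matrix in this log is what NumPy produces when an all-zero array is divided by zero, which would be consistent with no segments having been counted after the earlier sgmestim failures. The sketch below reproduces that behaviour and shows one way to fail with a clearer message instead. It is only an illustration: `normalize_transitions` is a hypothetical helper, not a function from resegment.py, and the assumption that both `trans_mat` and `N` were zero is inferred from the log, not confirmed.

```python
import numpy as np


def normalize_transitions(trans_counts, n):
    """Divide transition counts by n, refusing to divide by zero.

    Hypothetical helper: it mirrors the `trans_mat / float(N)` line from the
    log, but raises a readable error when there is nothing to normalize.
    """
    trans_counts = np.asarray(trans_counts, dtype=float)
    if n <= 0:
        # Assumption: n == 0 means no segments were available upstream.
        raise ValueError("no transitions counted (n=%r); check earlier steps" % n)
    return trans_counts / float(n)


if __name__ == "__main__":
    # Reproduce the symptom: 0 / 0 on an array gives NaN plus a RuntimeWarning.
    with np.errstate(invalid="ignore"):
        broken = np.zeros((4, 4)) / float(0)
    print(np.isnan(broken).all())

    # With real counts the guarded version behaves like the original line.
    counts = np.array([[2, 1], [1, 4]])
    print(normalize_transitions(counts, counts.sum()))
```

Raising early here is a design choice for the sketch: it points at the upstream failure rather than letting NaNs propagate into the HMM file.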

idnavid commented 7 years ago

You need to train a UBM offline for this to work. See the main function in gmm.py.

mkingupta commented 7 years ago

I tried training my own UBM:

[Errno 2] No such file or directory: '/spkr_diarization/code/out_dir_2//test_c9_15461_viterbi.txt'

The Viterbi file isn't being created because the following command in gmm.adapt() fails with "sgmestim error -- no training data available":

/speaker_diarization/audioseg-1.2.2/src//sgmestim --map=0.5 --update=wmv --label=C2 --output=/speaker_diarization/spkr_diarization/code/out_dir_2/C2 --file-list=/speaker_diarization/spkr_diarization/code/out_dir_2/adapt.script /speaker_diarization/spkr_diarization/code/out_dir/UBM_0
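Because the missing Viterbi file is a downstream symptom of the failed sgmestim call, it can help to check each external command before the next step tries to read its output. The following is a minimal sketch under stated assumptions: `run_step` is a hypothetical helper, not part of this repository, and whether sgmestim exits with a non-zero status in this situation is not known, so the helper also checks that the expected output file exists.

```python
import os
import subprocess
import sys
import tempfile


def run_step(cmd, expected_output):
    """Run an external command and stop early if it fails.

    Hypothetical helper: raises RuntimeError when the command exits with a
    non-zero status or when the file it was supposed to write is missing.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            "command failed (exit %d): %s\n%s"
            % (result.returncode, " ".join(cmd), result.stderr.strip())
        )
    if not os.path.exists(expected_output):
        raise RuntimeError(
            "command succeeded but did not create %s\n%s"
            % (expected_output, result.stdout.strip())
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in command for the demo: a Python one-liner that writes a file.
    out_path = os.path.join(tempfile.mkdtemp(), "model.out")
    run_step([sys.executable, "-c", "open(r'%s', 'w').close()" % out_path], out_path)
    print("step ok:", os.path.exists(out_path))
```

In the pipeline above, the command list would be the sgmestim invocation and `expected_output` the path given to --output, so a failure is reported at the adaptation step instead of later as a missing file.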

idnavid commented 7 years ago

Sorry to hear this is causing you trouble. Like I said, this is for a very particular platform, so I'm not sure you can use it. Better to stick to the bash script in test_audioseg.

mkingupta commented 7 years ago

Oh OK, I will try using the bash script. By the way, is there any update on the i-vector based speaker diarization implementation you mentioned before?

idnavid commented 7 years ago

No, I wouldn't count on that in the near future. Publishing it isn't a priority for me right now. Try looking into the latest version of sidekit; I'm not sure whether they've added i-vectors. Ours won't be out, at least not this year.

mkingupta commented 7 years ago

Actually, I have already trained the i-vector and PLDA models in Kaldi. I was facing some problems in the segmentation and clustering part for shorter speech segments. Any chance you can help me in that area?

idnavid commented 7 years ago

Unfortunately I'm very busy at the moment. Good luck with your project.

mkingupta commented 7 years ago

OK, I understand. Thanks anyway, your implementations helped me a lot :+1: