idiap / IBDiarization

C++ Implementation of the Information Bottleneck System
GNU General Public License v3.0

help needed with scp file #1

Closed udaynag closed 7 years ago

udaynag commented 8 years ago

I was under the impression that IB diarization includes segmentation as one of its initial steps, but it looks like this toolkit requires the segments to be defined in the scp file. Is this a limitation of the toolkit, or am I missing something? How do we get the initial segment boundaries for recorded data? Thanks!

mrsrikanth commented 8 years ago

The scp file contains the speech/non-speech information. We assume that voice activity detection has already been applied before diarization. You could use a simple energy-based detector (such as one available in Kaldi) or Shout.
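
As a rough illustration of what a simple energy-based detector does, here is a minimal Python sketch. It is not the Kaldi or Shout implementation. The function names, the 400/160-sample frame and hop sizes (25 ms / 10 ms at an assumed 16 kHz sampling rate), the -40 dB threshold, and the minimum run length are all assumptions made for the example. It returns speech regions as frame-index pairs; converting those to this toolkit's scp layout is left out because the format is not shown in this thread.

```python
import math

def frame_energies_db(samples, frame_len, hop_len):
    """Log energy (dB) of each frame; samples are floats in [-1, 1]."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # floor avoids log(0)
    return energies

def energy_vad(samples, frame_len=400, hop_len=160,
               threshold_db=-40.0, min_frames=3):
    """Return inclusive (first_frame, last_frame) pairs for runs of frames
    whose energy exceeds threshold_db. All defaults are illustrative."""
    energies = frame_energies_db(samples, frame_len, hop_len)
    segments, run_start = [], None
    # A trailing -inf sentinel closes a run that reaches the end of the file.
    for i, energy in enumerate(energies + [float("-inf")]):
        if energy > threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:  # drop very short bursts
                segments.append((run_start, i - 1))
            run_start = None
    return segments
```

A fixed threshold like this only works on clean, level-normalized audio; real detectors usually adapt the threshold to the recording and smooth the decisions over time.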

Let me know if you need more information.

Thanks, Srikanth

udaynag commented 7 years ago

Thanks for the feedback and a prompt response.

Regards, Uday

udaynag commented 7 years ago

Hi Srikanth,

Using the toolkit, I am not able to get past the initial file read for any of the audio files I am using. My output file shows the following warning at the end. Any idea why this would happen? I am using beamformed output from the AMI corpus. Thanks in advance.

Reading and Processing the scp file
number of segments inside = 482
Number of vectors = 118855
Feat file data/mfcc/EN2001a_30m.fea of type 0
Reading feature file data/mfcc/EN2001a_30m.fea
frame dim: 6 num_vec = 118855
Attempting memory allocation.
Memory successully allocated.
Reading file ...
Warning frame_val is nan for idx = 1775 and d_index == 4 f_index = 1775

mrsrikanth commented 7 years ago

Hello,

It means the feature file itself contains a NaN value. Check the 1776th feature vector (zero-based index 1775), 5th dimension (index 4), in EN2001a_30m.fea.
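
One way to run that check over a whole file is sketched below in Python. It assumes the features have already been loaded into a list of per-frame vectors by whatever reader matches the .fea layout; that layout is not described in this thread, so no parser is shown. The name `find_nan_entries` is hypothetical and not part of the toolkit.

```python
import math

def find_nan_entries(features):
    """Return zero-based (vector_index, dim_index) pairs where the value is NaN.

    `features` is assumed to be a sequence of equal-length float vectors,
    one per frame, already read from the feature file.
    """
    return [(i, d)
            for i, vec in enumerate(features)
            for d, value in enumerate(vec)
            if math.isnan(value)]
```

A pair such as `(1775, 4)` in the result would correspond to the `idx = 1775`, `d_index == 4` position reported in the warning. If NaNs turn up, the usual suspect is the feature extraction step rather than the diarization code, so re-extracting the features for the affected recording is a reasonable first thing to try.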