idiap / IBDiarization

C++ Implementation of the Information Bottleneck System
GNU General Public License v3.0

Segments File Error #8

Closed Privolin closed 7 years ago

Privolin commented 7 years ago

Hi,

I am attempting to use the speaker diarization toolkit, but I do not fully understand the process to create the .scp segments file.

In the previous issues, I read that you use VAD to generate the segments. I am currently using PRAAT for VAD and HTK to generate MFCCs, so the segments I create using PRAAT are time-based (I can get the sample information easily).

I assume that I need to generate segments using the time information from PRAAT that correspond to the frames in the feature file?

mrsrikanth commented 7 years ago

Hello,

Yes, the output from PRAAT can be easily converted to a .scp file. Divide each time by the frame period (the frame shift set in your HTK configuration) and you should get the frame number.

The format for scp file is as follows: each line is a segment in the following format

segment_id=featurefile[startFrameNumber,endFrameNumber]

For example, if you have a file called sample.wav with two segments, one from 0 to 100 frames and another from 200 to 300 frames, the scp file can be formatted as:

sample_0_100=sample.fea[0,100]
sample_200_300=sample.fea[200,300]

Thanks, Srikanth
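
The conversion described above can be sketched in a few lines of Python. This is only an illustration, not part of the toolkit: the function names `seconds_to_frame` and `scp_lines` are hypothetical, the 10 ms frame shift is an assumed default that must match your HTK configuration, and rounding times down to the nearest frame is one possible choice rather than a documented requirement.

```python
import math
import os

def seconds_to_frame(t_sec, frame_shift_sec=0.010):
    """Map a time in seconds to a frame index (rounded down).

    The small epsilon keeps values such as 10.91 / 0.01 from
    landing one frame low because of floating-point error.
    """
    return int(math.floor(t_sec / frame_shift_sec + 1e-9))

def scp_lines(feature_file, segments_sec, frame_shift_sec=0.010):
    """Build 'segment_id=featurefile[start,end]' lines from (start, end) times."""
    base, _ext = os.path.splitext(feature_file)
    lines = []
    for start_sec, end_sec in segments_sec:
        start = seconds_to_frame(start_sec, frame_shift_sec)
        end = seconds_to_frame(end_sec, frame_shift_sec)
        if end < start:
            continue  # skip malformed segments instead of writing them
        lines.append(f"{base}_{start}_{end}={feature_file}[{start},{end}]")
    return lines

# Two VAD segments, 0.0-1.0 s and 2.0-3.0 s, at an assumed 10 ms frame shift.
print("\n".join(scp_lines("sample.fea", [(0.0, 1.0), (2.0, 3.0)])))
```

With a 10 ms frame shift, these two segments reproduce the two example lines given above.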

Privolin commented 7 years ago

Thanks, I have tried this out, but I still get errors such as:

...
...
Attempting memory allocation.
Memory successfully allocated.
Reading file ...
Warning frame_val is nan for idx = 1091 and d_index == 2 f_index = 1091

I have checked the audio file and feature file for nan and inf values, and I found none. Is there something else that I could be misunderstanding?

mrsrikanth commented 7 years ago

It could mean either that the feature file contains NaNs (or infs) or that the requested frame numbers don't exist, that is, the file has too few frames to include frame 1091.

Srikanth
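
Both possibilities can be checked directly on the feature file. The sketch below is an illustration, not code from the toolkit: `check_htk_features` is a hypothetical helper, and it assumes an uncompressed HTK parameter file with the standard 12-byte big-endian header followed by 4-byte floats. Compressed files (the `_C` qualifier) are rejected rather than decoded.

```python
import math
import struct

def check_htk_features(path, max_reports=5):
    """Return (num_frames, bad) where bad lists (frame, dim) of NaN/inf values.

    Assumes an uncompressed HTK parameter file: a 12-byte big-endian header
    (nSamples, sampPeriod, sampSize, parmKind) followed by 4-byte floats.
    """
    bad = []
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12:
            raise ValueError("file is too short to hold an HTK header")
        n_frames, _period, samp_size, parm_kind = struct.unpack(">iihh", header)
        if parm_kind & 0o2000:  # _C qualifier: values are stored as shorts
            raise ValueError("compressed HTK files are not handled here")
        dims = samp_size // 4
        for frame in range(n_frames):
            raw = f.read(samp_size)
            if len(raw) < samp_size:
                raise ValueError(
                    f"file ends at frame {frame}, header claims {n_frames}")
            for dim, value in enumerate(struct.unpack(f">{dims}f", raw)):
                if not math.isfinite(value):
                    bad.append((frame, dim))
                    if len(bad) >= max_reports:
                        return n_frames, bad
    return n_frames, bad

# Example call (the path is a placeholder):
# n_frames, bad = check_htk_features("featurefile.fea")
```

If the returned frame count is larger than every end frame in the .scp file and the list of bad positions is empty, the cause is likely elsewhere, for example in the feature configuration.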

Privolin commented 7 years ago

There are over 70k frames in the feature file that I am attempting to use. Do you think it's possible for you to take a quick look at the files I am using?

mrsrikanth commented 7 years ago

Yes, I could look at it. Perhaps we can start with just the first few lines of the scp file. Also, could you share the output of:

HList -r featurefile.fea | wc -l

where featurefile.fea is the HTK feature file.

Privolin commented 7 years ago

As Requested

root@4f1add5ba3c4:~/diarization/idiap_diarizer/IBDiarization# HList -r test_data/data/9_Mei_2011_09h00.mfc | wc -l
77889

Also, some of the header information for the files:

root@4f1add5ba3c4:~/diarization/idiap_diarizer/IBDiarization# HList -C test_data/data/config -o -h -t  -s 1015 -e 1020 test_data/data/9_Mei_2011_09h00.wav 
--------------------- Source: test_data/data/9_Mei_2011_09h00.wav ----------------------
  Sample Bytes:  2        Sample Kind:   WAVEFORM
  Num Comps:     1        Sample Period: 62.5 us
  Num Samples:   12462592  File Format:   WAV
---------------------------------------- Target ----------------------------------------
  Sample Bytes:  156      Sample Kind:   MFCC_D_A_Z_0
  Num Comps:     39       Sample Period: 10000.0 us
  Num Samples:   77889    File Format:   HTK
-------------------------------- Observation Structure ---------------------------------
x:      MFCC-1  MFCC-2  MFCC-3  MFCC-4  MFCC-5  MFCC-6  MFCC-7  MFCC-8  MFCC-9 MFCC-10
       MFCC-11 MFCC-12      C0   Del-1   Del-2   Del-3   Del-4   Del-5   Del-6   Del-7
         Del-8   Del-9  Del-10  Del-11  Del-12   DelC0   Acc-1   Acc-2   Acc-3   Acc-4
         Acc-5   Acc-6   Acc-7   Acc-8   Acc-9  Acc-10  Acc-11  Acc-12   AccC0
--------------------------------- Samples: 1015->1020 ----------------------------------

And the .scp file

../9_Mei_2011_09h00_1064_1077=../9_Mei_2011_09h00.fea[1064,1077]
../9_Mei_2011_09h00_1090_1101=../9_Mei_2011_09h00.fea[1090,1101]
../9_Mei_2011_09h00_1118_1159=../9_Mei_2011_09h00.fea[1118,1159]
../9_Mei_2011_09h00_1164_1184=../9_Mei_2011_09h00.fea[1164,1184]
../9_Mei_2011_09h00_1188_1199=../9_Mei_2011_09h00.fea[1188,1199]
../9_Mei_2011_09h00_1216_1234=../9_Mei_2011_09h00.fea[1216,1234]
../9_Mei_2011_09h00_1242_1252=../9_Mei_2011_09h00.fea[1242,1252]
../9_Mei_2011_09h00_1254_1293=../9_Mei_2011_09h00.fea[1254,1293]
../9_Mei_2011_09h00_1308_1318=../9_Mei_2011_09h00.fea[1308,1318]
../9_Mei_2011_09h00_1337_1382=../9_Mei_2011_09h00.fea[1337,1382]
../9_Mei_2011_09h00_1388_1400=../9_Mei_2011_09h00.fea[1388,1400]
../9_Mei_2011_09h00_1405_1415=../9_Mei_2011_09h00.fea[1405,1415]
../9_Mei_2011_09h00_1426_1465=../9_Mei_2011_09h00.fea[1426,1465]

I am using just a small segment of the original scp file I created.

I tried modifying the features extracted in the HTK config for this test file, and it does seem to run fine now, without any errors, and produces the rttm file. However, in the .out file I do get this:

last speech frame = 9993 total number of speech frames = 4998
training the rkl hmm 
Warning: Mean is inf!
Warning: Mean is inf!
Warning: Mean is inf!
Warning: Mean is inf!

The warning is thrown 128 times. Is this normal or could there be issues with the segments I am creating ?

Thanks, Priv

mrsrikanth commented 7 years ago

The warnings suggest that some of the states are not well modelled. This happens when some of the segments do not have enough features, which is expected. The segmentation ignores them and prints a warning.

This is a problem only when none of the segments are long enough.

Thanks, Srikanth
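
To see how long the segments in a .scp file actually are, a short script can parse each line and count frames. This is a rough sketch, not part of the toolkit: `segment_lengths` is a hypothetical helper, and it assumes both the start and end frame are inclusive, which the thread does not confirm.

```python
import re

# segment_id=featurefile[startFrameNumber,endFrameNumber]
SCP_LINE = re.compile(
    r"^(?P<seg>[^=]+)=(?P<feat>.+)\[(?P<start>\d+),(?P<end>\d+)\]$")

def segment_lengths(scp_text):
    """Return the length in frames of every segment in the .scp text."""
    lengths = []
    for line in scp_text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        match = SCP_LINE.match(line)
        if match is None:
            raise ValueError(f"not a valid scp line: {line!r}")
        start, end = int(match["start"]), int(match["end"])
        # Assumption: both bounds are inclusive, so add 1.
        lengths.append(end - start + 1)
    return lengths

example = """\
../9_Mei_2011_09h00_1064_1077=../9_Mei_2011_09h00.fea[1064,1077]
../9_Mei_2011_09h00_1090_1101=../9_Mei_2011_09h00.fea[1090,1101]
../9_Mei_2011_09h00_1118_1159=../9_Mei_2011_09h00.fea[1118,1159]
"""
lengths = segment_lengths(example)
print("shortest:", min(lengths), "longest:", max(lengths))
```

Many very short segments would be consistent with the warnings described above, though how short is too short depends on the toolkit's configuration.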

Privolin commented 7 years ago

Ahhhh, ok. What are the minimum and maximum segment lengths?

mrsrikanth commented 7 years ago

Minimum is 1 frame. Maximum is what you set in the config file. By default the maximum length is 250 frames.

Privolin commented 7 years ago

Awesome! Thanks for the help !