Privolin closed this issue 7 years ago.
Hello,
Yes, the output from PRAAT can easily be converted to a .scp file. Divide each time stamp by the frame period (frame shift) set in your HTK configuration, and you get the frame number.
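A minimal sketch of that conversion, assuming the frame shift comes from the TARGETRATE entry of the HTK config, which HTK expresses in units of 100 ns (so 100000.0 means 10 ms). The helper names `frame_shift_seconds` and `time_to_frame` are hypothetical, chosen for illustration only.

```python
# Sketch: convert PRAAT segment boundaries (seconds) into HTK frame indices.
# Assumption: the frame shift is the HTK config's TARGETRATE, given in
# units of 100 ns (100000.0 -> 0.01 s = 10 ms).


def frame_shift_seconds(targetrate_100ns: float) -> float:
    """Convert an HTK TARGETRATE value (100 ns units) to seconds."""
    return targetrate_100ns / 1e7


def time_to_frame(t_seconds: float, shift_seconds: float) -> int:
    """Map a time stamp to the nearest 0-based frame index."""
    # Rounding absorbs floating-point error in the division.
    return int(round(t_seconds / shift_seconds))


if __name__ == "__main__":
    shift = frame_shift_seconds(100000.0)  # assumed TARGETRATE = 100000.0
    # Hypothetical PRAAT interval: speech from 10.64 s to 10.77 s.
    print(time_to_frame(10.64, shift), time_to_frame(10.77, shift))
```

Rounding to the nearest frame is one reasonable choice; flooring start times and rounding end times up would instead keep every frame that overlaps the PRAAT interval.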
The scp file has one segment per line, in the following format:
segment_id=featurefile[startFrameNumber,endFrameNumber]
For example, if you have a file called sample.wav with two segments, one from frame 0 to 100 and another from frame 200 to 300, the scp file would be:
sample_0_100=sample.fea[0,100]
sample_200_300=sample.fea[200,300]
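Once the boundaries are frame numbers, writing the scp file is plain string formatting. This sketch follows the `name_start_end=file[start,end]` naming used in the example; `scp_lines` and `write_scp` are hypothetical helpers, and the segment ID scheme is only a convention taken from that example.

```python
from typing import Iterable, List, Tuple


def scp_lines(base: str, feature_file: str,
              frame_segments: Iterable[Tuple[int, int]]) -> List[str]:
    """Format (start, end) frame pairs as segment_id=featurefile[start,end]."""
    return [f"{base}_{s}_{e}={feature_file}[{s},{e}]" for s, e in frame_segments]


def write_scp(path: str, base: str, feature_file: str,
              frame_segments: Iterable[Tuple[int, int]]) -> None:
    """Write one segment per line to the scp file at `path`."""
    with open(path, "w") as f:
        for line in scp_lines(base, feature_file, frame_segments):
            f.write(line + "\n")


if __name__ == "__main__":
    for line in scp_lines("sample", "sample.fea", [(0, 100), (200, 300)]):
        print(line)
```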
Thanks, Srikanth
Thanks, I have tried this out, but I still get errors such as:
...
...
Attempting memory allocation.
Memory successfully allocated.
Reading file ...
Warning frame_val is nan for idx = 1091 and d_index == 2 f_index = 1091
I have checked the audio file and the feature file for NaN and inf values and did not find any. Is there something else that I could be misunderstanding?
It could mean either that the feature files contain NaNs (or infs) or that the frame numbers do not exist, i.e. the file has fewer than 1091 frames.
Srikanth
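One way to rule out the second cause is to check every scp range against the number of frames in the feature file. A minimal sketch, assuming the `segment_id=featurefile[start,end]` format described earlier in this thread and 0-based frame indices with an inclusive end (so the last valid index is `num_frames - 1`); `check_scp_lines` is a hypothetical helper, and `num_frames` would come from HList or the file header.

```python
import re
from typing import Iterable, List, Tuple

# name=file[start,end]; the name and file may contain paths such as "../".
SCP_RE = re.compile(r"^(?P<name>[^=]+)=(?P<file>[^\[]+)\[(?P<start>\d+),(?P<end>\d+)\]$")


def check_scp_lines(lines: Iterable[str], num_frames: int) -> List[Tuple[int, str]]:
    """Return (line number, problem) for every scp line that looks wrong."""
    problems = []
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        m = SCP_RE.match(line)
        if m is None:
            problems.append((lineno, "malformed line"))
            continue
        start, end = int(m["start"]), int(m["end"])
        if start > end:
            problems.append((lineno, f"start {start} > end {end}"))
        elif end >= num_frames:  # assumes the last valid index is num_frames - 1
            problems.append((lineno, f"end frame {end} >= {num_frames} frames in file"))
    return problems


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        for lineno, msg in check_scp_lines(f, int(sys.argv[2])):
            print(f"line {lineno}: {msg}")
```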
There are over 70k frames in the feature file that I am using. Do you think it's possible for you to take a quick look at the files?
Yes, I could look at it. Perhaps we can start with just the first few lines of the scp file. Also, could you share the output of:
HList -r featurefile.fea | wc -l
where featurefile.fea is the HTK feature file.
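The frame count is also stored in the 12-byte header at the start of every HTK parameter file (nSamples, sampPeriod in 100 ns units, sampSize in bytes, parmKind), so it can be read without HList. A sketch, assuming HTK's default big-endian byte order; a file written with NATURALWRITEORDER = T on a little-endian machine would need `<` instead of `>` in the format string. `HTKHeader`, `parse_htk_header` and `read_htk_header` are hypothetical names.

```python
import struct
from typing import NamedTuple


class HTKHeader(NamedTuple):
    n_samples: int    # number of frames in the file
    samp_period: int  # frame period in 100 ns units (100000 -> 10 ms)
    samp_size: int    # bytes per frame (4 bytes per float component)
    parm_kind: int    # parameter kind code (base kind plus qualifier bits)


def parse_htk_header(header: bytes) -> HTKHeader:
    """Decode the 12-byte HTK header (assumed big-endian)."""
    if len(header) < 12:
        raise ValueError("HTK header needs 12 bytes")
    return HTKHeader(*struct.unpack(">iihh", header[:12]))


def read_htk_header(path: str) -> HTKHeader:
    with open(path, "rb") as f:
        return parse_htk_header(f.read(12))


if __name__ == "__main__":
    import sys
    hdr = read_htk_header(sys.argv[1])
    print(hdr.n_samples, "frames,", hdr.samp_period / 1e4, "ms per frame")
```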
As requested:
root@4f1add5ba3c4:~/diarization/idiap_diarizer/IBDiarization# HList -r test_data/data/9_Mei_2011_09h00.mfc | wc -l
77889
Also, some of the header information for the files:
root@4f1add5ba3c4:~/diarization/idiap_diarizer/IBDiarization# HList -C test_data/data/config -o -h -t -s 1015 -e 1020 test_data/data/9_Mei_2011_09h00.wav
--------------------- Source: test_data/data/9_Mei_2011_09h00.wav ----------------------
Sample Bytes: 2 Sample Kind: WAVEFORM
Num Comps: 1 Sample Period: 62.5 us
Num Samples: 12462592 File Format: WAV
---------------------------------------- Target ----------------------------------------
Sample Bytes: 156 Sample Kind: MFCC_D_A_Z_0
Num Comps: 39 Sample Period: 10000.0 us
Num Samples: 77889 File Format: HTK
-------------------------------- Observation Structure ---------------------------------
x: MFCC-1 MFCC-2 MFCC-3 MFCC-4 MFCC-5 MFCC-6 MFCC-7 MFCC-8 MFCC-9 MFCC-10
MFCC-11 MFCC-12 C0 Del-1 Del-2 Del-3 Del-4 Del-5 Del-6 Del-7
Del-8 Del-9 Del-10 Del-11 Del-12 DelC0 Acc-1 Acc-2 Acc-3 Acc-4
Acc-5 Acc-6 Acc-7 Acc-8 Acc-9 Acc-10 Acc-11 Acc-12 AccC0
--------------------------------- Samples: 1015->1020 ----------------------------------
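The two header blocks are consistent with each other. The waveform has 12462592 samples at 62.5 us (16 kHz), and the target has 10 ms frames (160 samples). Assuming a 25 ms analysis window (WINDOWSIZE = 250000.0, which the thread does not show) and the usual rule that a frame is produced only when a full window fits, the count comes out at exactly 77889. `num_frames` is a hypothetical helper.

```python
def num_frames(num_samples: int, window: int, shift: int) -> int:
    """Frames whose window fits entirely inside the signal: floor((N - W) / S) + 1."""
    if num_samples < window:
        return 0
    return (num_samples - window) // shift + 1


sample_rate = round(1e6 / 62.5)      # 62.5 us sample period -> 16000 Hz
shift = round(0.010 * sample_rate)   # 10 ms frame shift -> 160 samples
window = round(0.025 * sample_rate)  # assumed 25 ms window -> 400 samples

print(num_frames(12462592, window, shift))  # 77889
```

With 77889 frames in the file, the ranges in the scp excerpt that follows (frames 1064 to 1465) are well within it.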
And the .scp file:
../9_Mei_2011_09h00_1064_1077=../9_Mei_2011_09h00.fea[1064,1077]
../9_Mei_2011_09h00_1090_1101=../9_Mei_2011_09h00.fea[1090,1101]
../9_Mei_2011_09h00_1118_1159=../9_Mei_2011_09h00.fea[1118,1159]
../9_Mei_2011_09h00_1164_1184=../9_Mei_2011_09h00.fea[1164,1184]
../9_Mei_2011_09h00_1188_1199=../9_Mei_2011_09h00.fea[1188,1199]
../9_Mei_2011_09h00_1216_1234=../9_Mei_2011_09h00.fea[1216,1234]
../9_Mei_2011_09h00_1242_1252=../9_Mei_2011_09h00.fea[1242,1252]
../9_Mei_2011_09h00_1254_1293=../9_Mei_2011_09h00.fea[1254,1293]
../9_Mei_2011_09h00_1308_1318=../9_Mei_2011_09h00.fea[1308,1318]
../9_Mei_2011_09h00_1337_1382=../9_Mei_2011_09h00.fea[1337,1382]
../9_Mei_2011_09h00_1388_1400=../9_Mei_2011_09h00.fea[1388,1400]
../9_Mei_2011_09h00_1405_1415=../9_Mei_2011_09h00.fea[1405,1415]
../9_Mei_2011_09h00_1426_1465=../9_Mei_2011_09h00.fea[1426,1465]
I am using just a small segment of the original scp file I created.
I tried changing the features extracted in the HTK config for this test file, and it now runs without errors and produces the rttm file. However, in the .out file I get this:
last speech frame = 9993 total number of speech frames = 4998
training the rkl hmm
Warning: Mean is inf!
Warning: Mean is inf!
Warning: Mean is inf!
Warning: Mean is inf!
The warning is printed 128 times. Is this normal, or could there be an issue with the segments I am creating?
Thanks, Priv
The warnings suggest that some of the states are not well modelled. This happens when some of the segments do not have enough features, which is expected. The segmentation ignores those states and prints a warning for each.
This is a problem only when none of the segments are long enough.
Thanks, Srikanth
Ah, OK. What are the minimum and maximum segment lengths?
Minimum is 1 frame. Maximum is what you set in the config file. By default the maximum length is 250 frames.
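If you want every scp segment to stay within that maximum before it reaches the toolkit, long VAD segments can be cut into consecutive pieces. This is a hypothetical preprocessing sketch, not something the thread says the toolkit requires; it assumes inclusive `[start, end]` frame ranges and the default maximum of 250 frames, and `split_segment` is an illustrative name.

```python
from typing import List, Tuple


def split_segment(start: int, end: int, max_len: int = 250) -> List[Tuple[int, int]]:
    """Split the inclusive frame range [start, end] into pieces of at most max_len frames."""
    pieces = []
    s = start
    while s <= end:
        e = min(s + max_len - 1, end)  # last frame of this piece
        pieces.append((s, e))
        s = e + 1
    return pieces


if __name__ == "__main__":
    print(split_segment(0, 599))  # three pieces: 250 + 250 + 100 frames
```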
Awesome! Thanks for the help!
Hi,
I am attempting to use the speaker diarization toolkit; however, I do not fully understand how to create the .scp segments file.
In previous issues, I read that you guys use VAD to generate the segments. I am currently using PRAAT for VAD and HTK to generate MFCCs, so the segments I create with PRAAT are time-based (I can easily get the sample information).
I assume that I need to use the time information from PRAAT to generate segments that correspond to frames in the feature file?