This README briefly explains how to compile and run the IB (Information Bottleneck) diarization toolkit.
The toolkit has been tested on Linux. The package has three prerequisites.
To compile the toolkit, run:
$> cd src/diarization/cmake/
$> cmake .
$> # make clean if cmake has already been run once
$> make
$> # cd to $IB_DIARIZATION_HOME
$> bash scripts/run.diarizeme.sh ipfile scpfile opfolder fileid [betaval]
Example command:
$> bash scripts/run.diarizeme.sh data/mfcc/AMI_20050204-1206.fea data/scp/AMI_20050204-1206.scp result.dir/ AMI_20050204-1206
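To diarize several recordings in one go, the script can be wrapped in a loop. The sketch below is hypothetical: it assumes every recording follows the same data/mfcc/<id>.fea and data/scp/<id>.scp layout as the example above, and the function name diarize_all and the DRY_RUN switch are illustrative, not part of the toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical batch wrapper around scripts/run.diarizeme.sh.
# Assumes each recording uses the layout from the README example:
#   data/mfcc/<id>.fea and data/scp/<id>.scp
# Set DRY_RUN=1 to print the commands instead of running them.
set -euo pipefail

# Usage: diarize_all OUTDIR RECID [RECID ...]
diarize_all() {
  local outdir=$1; shift
  local id
  local -a cmd
  for id in "$@"; do
    cmd=(bash scripts/run.diarizeme.sh
         "data/mfcc/${id}.fea" "data/scp/${id}.scp" "${outdir}/" "$id")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf '%s\n' "${cmd[*]}"   # show the command only
    else
      "${cmd[@]}"                 # run diarization for this recording
    fi
  done
}
```

For example, after sourcing the sketch from $IB_DIARIZATION_HOME:
$> DRY_RUN=1 diarize_all result.dir AMI_20050204-1206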
The files required to run the script are as follows:
To evaluate the result, use the md-eval Perl script (md-eval-v21.pl) available on the NIST website:
$> perl md-eval-v21.pl -m -afc -c 0.25 -r data/rttm/AMI_20050204-1206.rttm -s result.dir/AMI_20050204-1206.rttm
The expected diarization error rate (DER) is 8.79%.
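To turn this check into a quick regression test, the DER can be pulled out of the md-eval report and compared with the expected value. This is a sketch under assumptions: it assumes md-eval prints a summary line of the form "OVERALL SPEAKER DIARIZATION ERROR = 8.79 PERCENT of scored speaker time" and takes the first such line, which is adequate when a single recording is scored. The function names extract_der and der_matches and the default tolerance of 0.01 are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking md-eval output against an expected DER.
set -euo pipefail

# Reads md-eval output on stdin and prints the first overall DER value (percent).
# Assumes a line like: OVERALL SPEAKER DIARIZATION ERROR = 8.79 PERCENT ...
# awk reads all input (no early exit) so pipefail is not tripped by SIGPIPE.
extract_der() {
  awk '!found && /OVERALL SPEAKER DIARIZATION ERROR/ {
    for (i = 1; i < NF; i++)
      if ($i == "=") { print $(i + 1); found = 1; break }
  }'
}

# Usage: der_matches EXPECTED ACTUAL [TOL]
# Succeeds if |EXPECTED - ACTUAL| <= TOL (TOL defaults to 0.01, an arbitrary choice).
der_matches() {
  awk -v e="$1" -v a="$2" -v t="${3:-0.01}" 'BEGIN {
    d = e - a
    if (d < 0) d = -d
    if (d <= t + 0) exit 0
    exit 1
  }'
}
```

For example:
$> der=$(perl md-eval-v21.pl -m -afc -c 0.25 -r data/rttm/AMI_20050204-1206.rttm -s result.dir/AMI_20050204-1206.rttm | extract_der)
$> der_matches 8.79 "$der" && echo "DER $der% matches the expected value"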
To add TDOA features (or other complementary features), use the following script:
$> bash scripts/run.diarizeme.tdoa.sh data/mfcc/AMI_20050204-1206.fea 0.8 data/tdoa/AMI_20050204-1206.fea 0.2 data/scp/AMI_20050204-1206.scp result.dir.tdoa AMI_20050204-1206
The expected DER is 7.12%.
The diarization engine can also be invoked directly as a Linux command. The relevant binary is src/diarization/cmake/diarizeme. To run the binary instead of the script, use:
$> src/diarization/cmake/diarizeme \
--mfcc data/mfcc/AMI_20050204-1206.fea 1.0 \
--recid AMI_20050204-1206 \
--outdir result.dir \
--tmpdir result.dir \
-s data/scp/AMI_20050204-1206.scp \
--beta 10 \
--nthread 1
In the above command, the MFCC features are given a weight of 1.0. The --tmpdir option also points to result.dir here, but it can be set to a directory other than the one given to --outdir.
The --beta value, the Lagrange multiplier used during IB clustering, has been tuned on the AMI corpus and may need to be re-tuned for other datasets.
The --nthread option sets the number of threads used for clustering.
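Re-tuning --beta for a new dataset amounts to running diarizeme over a few candidate values and keeping the one with the lowest DER. The sketch below is hypothetical: the candidate values, the sweep.beta<value> output directories, and the function names are illustrative, and it assumes diarizeme writes <outdir>/<recid>.rttm, md-eval-v21.pl sits in the current directory, and the data/ layout matches the AMI example. Ideally the sweep is run on held-out development recordings rather than the ones used for final evaluation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: sweep --beta on one development recording.
set -euo pipefail

# Reads "beta der" lines on stdin and prints the beta with the lowest DER.
best_beta() {
  awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; best = $1 } END { print best }'
}

# Usage: sweep_beta RECID BETA1 [BETA2 ...]
sweep_beta() {
  local id=$1; shift
  local beta dir der
  for beta in "$@"; do
    dir="sweep.beta${beta}"
    mkdir -p "$dir"
    src/diarization/cmake/diarizeme \
      --mfcc "data/mfcc/${id}.fea" 1.0 \
      --recid "$id" \
      --outdir "$dir" \
      --tmpdir "$dir" \
      -s "data/scp/${id}.scp" \
      --beta "$beta" \
      --nthread 1
    # Assumed md-eval summary line: OVERALL SPEAKER DIARIZATION ERROR = X PERCENT ...
    der=$(perl md-eval-v21.pl -m -afc -c 0.25 \
            -r "data/rttm/${id}.rttm" -s "${dir}/${id}.rttm" |
          awk '!found && /OVERALL SPEAKER DIARIZATION ERROR/ {
                 for (i = 1; i < NF; i++)
                   if ($i == "=") { print $(i + 1); found = 1; break } }')
    if [[ -z "$der" ]]; then
      echo "no DER found for beta=$beta" >&2
      continue
    fi
    echo "$beta $der"
  done | best_beta
}
```

For example (candidate values are arbitrary):
$> sweep_beta AMI_20050204-1206 5 10 20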
To add other features along with MFCCs, use the --other option, supplying a weight together with the file name. The weights of all features should sum to 1.0. For example, to use TDOA features (via the --tdoa option) along with MFCC features, run:
$> src/diarization/cmake/diarizeme \
--mfcc data/mfcc/AMI_20050204-1206.fea 0.8 \
--tdoa data/tdoa/AMI_20050204-1206.fea 0.2 \
--recid AMI_20050204-1206 \
--outdir result.dir \
--tmpdir result.dir \
-s data/scp/AMI_20050204-1206.scp \
--beta 10 \
--nthread 1
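Because the feature weights should sum to 1.0, it can help to check them before launching a long run. A minimal sketch follows; the function name weights_sum_to_one and the 1e-6 tolerance (used to absorb floating-point rounding) are hypothetical choices, not part of the toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: feature weights must sum to 1.0.
set -euo pipefail

# Usage: weights_sum_to_one W1 [W2 ...]
# Succeeds if |W1 + W2 + ... - 1| < 1e-6; fails otherwise (including no args).
weights_sum_to_one() {
  awk -v tol=1e-6 'BEGIN {
    s = 0
    for (i = 1; i < ARGC; i++) s += ARGV[i]   # sum the weights passed as arguments
    d = s - 1
    if (d < 0) d = -d
    if (d < tol + 0) exit 0
    exit 1
  }' "$@"
}
```

For example, with the MFCC and TDOA weights used above:
$> weights_sum_to_one 0.8 0.2 || echo "feature weights must sum to 1.0" >&2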
The pydiarization subproject provides a high-level API around the diarization toolkit.