opensource-spraakherkenning-nl / Kaldi_NL

Code related to the Dutch instance and user groups of the KALDI speech recognition toolkit
http://www.opensource-spraakherkenning.nl
Apache License 2.0

lium fails to find speech segments #4

Closed wmelder closed 7 years ago

wmelder commented 7 years ago

The LIUM speaker diarization step produces 0 speech segments, probably because of a Java exception.

Could it be that the Kaldi_NL configuration depends on a different Java version than the one I have installed, which is: java version "1.8.0_111"?

This is from the liumlog segmentation:


02:34.648                CONFIG| cmdLine: --fInputDesc=sphinx,1:3:2:0:0:0,13,0:0:0:0 --fInputMask=../OUT//intermediate/data/ALL/liumlog/%s.mfcc --sInputMask=../OUT//intermediate/data/ALL/liumlog/%s.i.seg --sOutputMask=../OUT//intermediate/data/ALL/liumlog/TONY_VAN_VERR-AEN560690VB.pms.seg --dPenality=10,10,50 --tInputMask=lib/models_es/sms.gmms TONY_VAN_VERR-AEN560690VB
02:34.678 MDecode        INFO  | fast decoding, Number of GMM=3 {make() / 1}
02:34.683 MDecode        FINE  |     decoder.get result {make() / 1}
Exception in thread "main" java.lang.NullPointerException
    at fr.lium.spkDiarization.libDecoder.FastDecoderWithDuration.getClusterSet(FastDecoderWithDuration.java:872)
    at fr.lium.spkDiarization.programs.MDecode.make(MDecode.java:93)
    at fr.lium.spkDiarization.programs.MDecode.main(MDecode.java:121)
wmelder commented 7 years ago

Don't bother installing a lower Java version: I still get the same NullPointerException :-( I now have this Java version:

java version "1.7.0_91"
OpenJDK Runtime Environment (rhel-2.6.2.3.el7-x86_64 u91-b00)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

wmelder commented 7 years ago

How come the local/diarization.sh script runs properly? It must be some misconfiguration in flist2scp.sh, I guess?

wmelder commented 7 years ago

flist2scp.sh contains code that calls the local/diarization.sh script with the parameter $uemopt. The value of that variable seems to come from a file:

cat $1/*.uem 2>&1 | sort >$1/ALL/test.uem
[ -s $1/ALL/test.uem ] && uemopt="--uem $1/ALL/test.uem"

In liumlog files the parameter looks like this: --uem /home/asr/OUT//intermediate/data/ALL/test.uem

Now if I do this:

cat OUT/intermediate/data/ALL/test.uem

it gives me this:

cat: /home/asr/OUT//intermediate/data/*.uem: No such file or directory

Can it be that the script crashes because these files are missing?
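
A minimal sketch of what may be happening with the two lines quoted above, assuming a POSIX sh or bash with default globbing. The scratch directory `dir` is hypothetical and stands in for `$1`. Whether this is what actually triggers the NullPointerException in LIUM is an assumption; the sketch only shows how the error text can end up inside test.uem.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for $1 in flist2scp.sh.
dir=$(mktemp -d)
mkdir -p "$dir/ALL"

# No *.uem files exist, so the glob stays unexpanded and cat fails.
# Because of 2>&1, cat's error message goes down the pipe into test.uem.
cat $dir/*.uem 2>&1 | sort >"$dir/ALL/test.uem"

# test.uem is now non-empty, so the -s test passes and uemopt gets set.
uemopt=""
[ -s "$dir/ALL/test.uem" ] && uemopt="--uem $dir/ALL/test.uem"

echo "uemopt: $uemopt"
echo "contents of test.uem:"
cat "$dir/ALL/test.uem"
```

If this is right, diarization.sh receives a --uem file that holds an error message instead of segment definitions.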

wmelder commented 7 years ago

I just noticed a new commit to local/flist2scp.sh. After a git pull and rerunning decode.sh, the ASR seems to be running! This is the change:

+[ -e $1/*.glm ] && cat $1/*.glm 2>&1 >$1/ALL/all.glm
+[ -e $1/*.uem ] && cat $1/*.uem 2>&1 | sort >$1/ALL/test.uem

I'm not sure what it means, but at least the exception is gone for now. I'll keep you posted on the results.
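
As a rough illustration of what the added `[ -e ... ]` guard does, here is a small sketch, assuming a POSIX sh or bash with default globbing. The directory `dir`, the file `a.uem`, and its dummy content are hypothetical; only the guarded command mirrors the change above.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for $1 in flist2scp.sh.
dir=$(mktemp -d)
mkdir -p "$dir/ALL"

# Case 1: no .uem files. The glob stays literal, -e fails, and the
# cat | sort pipeline (including its output redirection) never runs.
[ -e $dir/*.uem ] && cat $dir/*.uem 2>&1 | sort >"$dir/ALL/test.uem"
if [ -s "$dir/ALL/test.uem" ]; then
  empty_case="uem option set"
else
  empty_case="uem option not set"
fi
echo "no .uem files: $empty_case"

# Case 2: exactly one .uem file (dummy content, not a real UEM entry).
echo "dummy line" >"$dir/a.uem"
[ -e $dir/*.uem ] && cat $dir/*.uem 2>&1 | sort >"$dir/ALL/test.uem"
if [ -s "$dir/ALL/test.uem" ]; then
  one_case="uem option set"
else
  one_case="uem option not set"
fi
echo "one .uem file: $one_case"
```

One caveat: if the glob matches more than one file, `[ -e ... ]` receives several arguments and reports an error, so the guarded command is skipped in that case too.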

wmelder commented 7 years ago

The process ran for longer than real time, so I stopped it. The problem now seems to be that the local/decode_prepdata.sh script prepares a test file list in which the paths to the files are not correct. Here's some output from my server:

[ data]$ cat test.flist
../OUT2/intermediate/data/TONY_VAN_VERR-AEN560690VD.wav
../OUT2/intermediate/data/TONY_VAN_VERR-AEN560690VE.wav
../OUT2/intermediate/data/TONY_VAN_VERR-AEN560690VB.wav
../OUT2/intermediate/data/TONY_VAN_VERR-AEN560690VC.wav
[ data]$ ls -al ../OUT2/intermediate/data/TONY_VAN_VERR-AEN560690VD.wav
ls: cannot access ../OUT2/intermediate/data/TONY_VAN_VERR-AEN560690VD.wav: No such file or directory

If these paths are not correct, I cannot see how further processing could work.

wmelder commented 7 years ago

Closing this one. The last issue was resolved by using filenames with absolute paths.
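
For anyone hitting the same thing, a hedged sketch of why a relative entry in a file list can stop resolving and how absolute paths avoid it. Everything here (the `work` directory, `example.wav`, `test.abs.flist`) is a hypothetical stand-in; the thread does not show exactly how the absolute paths were supplied.

```shell
#!/bin/sh
# Hypothetical layout: one wav file and a file list with a relative entry.
work=$(mktemp -d)
mkdir -p "$work/data" "$work/elsewhere"
: >"$work/data/example.wav"
echo "data/example.wav" >"$work/test.flist"   # relative to $work

# From another working directory the relative entry does not resolve.
cd "$work/elsewhere"
[ -e "$(cat "$work/test.flist")" ] || echo "relative path not found from $(pwd)"

# Rewrite each entry as an absolute path, resolved against the directory
# the list was originally written for (assumed here to be $work).
base="$work"
while IFS= read -r f; do
  case "$f" in
    /*) echo "$f" ;;          # already absolute, keep as-is
    *)  echo "$base/$f" ;;    # prefix the base directory
  esac
done <"$work/test.flist" >"$work/test.abs.flist"

abs=$(cat "$work/test.abs.flist")
[ -e "$abs" ] && echo "absolute path found: $abs"
```

With absolute entries the list resolves the same way no matter which directory a later script runs from.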