Open proycon opened 5 years ago
The CTM file is actually entirely empty, which is obviously wrong. The ctm2xml converter stumbles over it, but that's secondary.
This has nothing to do with the ctm2xml conversion. The ASR engine fails at some point. I need to see the log files.
Here's the CLAM full log of my test run: error.log.
And here's the scratch dir: http://lst.science.ru.nl/~proycon/scratchtest.tar.gz
Ok, the exact same error now turns up for oral_history as well, despite that not having changed, so the problem isn't even eng_ASR as such. Something may have gone wrong in the underlying kaldi installation. I'll force another update with recompilation of kaldi.
I recompiled kaldi but the issue persists.
I'm following a lead now, from two of the logs:
compute-mfcc-feats --verbose=2 --config=/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/mfcc_hires.conf ark:- ark:- | copy-feats --compress=true ark:- ark,scp:/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/mfcc/raw_mfcc_ALL.1.ark,/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/mfcc/raw_mfcc_ALL.1.scp # Started at Thu May 2 14:12:15 CEST 2019
#
bash: line 1: extract-segments: command not found
bash: line 1: compute-mfcc-feats: command not found
bash: line 1: copy-feats: command not found
# online2-wav-nnet3-latgen-faster --do-endpointing=false --frames-per-chunk=20 --extra-left-context-initial=0 --online=true --frame-subsampling-factor=3 --config=/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/tmp/conf/online.conf --min-active=200 --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=models/AM/online/graph/words.txt /var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/tmp/final.mdl models/AM/online/graph/HCLG.fst ark:/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/data/ALL/split1/1/spk2utt "ark,s,cs:extract-segments scp,p:/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/data/ALL/split1/1/wav.scp /var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/intermediate/data/ALL/split1/1/segments ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >/var/www/webservices-lst/live/writable/eng_ASR/scratch//test_Zx1fXQFrNMFKon5/tmp/tmp.FNFnIGfsME/lat.1.gz" # Started at Thu May 2 14:12:15 CEST 2019
#
bash: line 1: online2-wav-nnet3-latgen-faster: command not found
# Accounting: time=0 threads=1
# Ended (code 127) at Thu May 2 14:12:15 CEST 2019, elapsed time 0 seconds
Next question is what provides these programs and why can't they be found?
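One quick way to narrow this down is to ask the shell which of the four programs from the logs it can resolve in the environment the webservice runs in. This is only a rough sketch; the helper name `check_binaries` is made up for illustration and is not part of kaldi or LaMachine.

```shell
# Report which of the given programs resolve on the current PATH.
# The exit status is the number of programs that could not be found.
# (check_binaries is a hypothetical helper, not an existing tool.)
check_binaries() {
    missing=0
    for bin in "$@"; do
        if resolved=$(command -v "$bin" 2>/dev/null); then
            echo "found:   $bin -> $resolved"
        else
            echo "missing: $bin"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# The four programs the logs above report as "command not found".
check_binaries extract-segments compute-mfcc-feats copy-feats \
    online2-wav-nnet3-latgen-faster || true
```

Running it once in an interactive LaMachine shell and once from the webservice's own environment should show whether the two disagree about $PATH.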
I just sent an email.
Ah great, I see, we found the problem at the same time then :) I'll investigate why they're not in $PATH
Awesome! :)
Ok, it seems they were never in $PATH in LaMachine, so I now wonder why it used to work before. Also, kaldi is a bit chaotic: it has a whole bunch of bin/ dirs and no proper installation script:
kaldi/src (weblamachine) $ ls -d *bin/
bin/ chainbin/ featbin/ fgmmbin/ fstbin/ gmmbin/ ivectorbin/ kwsbin/ latbin/ lmbin/ nnet2bin/ nnet3bin/ nnetbin/ online2bin/ onlinebin/ rnnlmbin/ sgmm2bin/ tfrnnlmbin/
Now I could of course simply add all of these to $PATH (I assume there are no name conflicts between them). Shall we do that or do you have a better suggestion? (I know there are some local env.sh and path.sh scripts in the resources that perhaps assume this role?)
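For reference, adding all the `*bin/` dirs and checking the no-conflict assumption could look roughly like the sketch below. Both function names are hypothetical, and it assumes `KALDI_ROOT` points at a kaldi checkout with the `src/*bin` layout listed above.

```shell
# Append every src/*bin directory under a kaldi root to PATH, skipping
# directories that are already on it. (Hypothetical helper.)
add_kaldi_bins_to_path() {
    root=$1
    for dir in "$root"/src/*bin; do
        [ -d "$dir" ] || continue
        case ":$PATH:" in
            *":$dir:"*) ;;              # already present, nothing to do
            *) PATH="$PATH:$dir" ;;
        esac
    done
    export PATH
}

# Print executable names that occur in more than one src/*bin directory.
# Empty output means adding all of them to PATH is unambiguous.
list_name_conflicts() {
    root=$1
    for dir in "$root"/src/*bin; do
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            if [ -f "$f" ] && [ -x "$f" ]; then
                basename "$f"
            fi
        done
    done | sort | uniq -d
}

# Only act when the environment (e.g. LaMachine) has set KALDI_ROOT.
if [ -n "${KALDI_ROOT:-}" ]; then
    list_name_conflicts "$KALDI_ROOT"
    add_kaldi_bins_to_path "$KALDI_ROOT"
fi
```

The directories are appended rather than prepended, so nothing already on $PATH gets shadowed by a kaldi binary of the same name.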
Ah, I found the problem in your path.sh (I assume this gets executed?). It sets the environment using an absolute path to your own home directory, which won't work on other systems. You'll have to let LaMachine set KALDI_ROOT and not overwrite it:
export KALDI_ROOT=/home/eyilmaz/main/kaldi
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/src/bin:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/sctk/bin:$KALDI_ROOT/src/fstbin/:$KALDI_ROOT/src/gmmbin/:$KALDI_ROOT/src/featbin/:$KALDI_ROOT/src/lm/:$KALDI_ROOT/src/sgmmbin/:$KALDI_ROOT/src/sgmm2bin/:$KALDI_ROOT/src/fgmmbin/:$KALDI_ROOT/src/latbin/:$KALDI_ROOT/src/nnetbin:$KALDI_ROOT/src/nnet2bin/:$KALDI_ROOT/src/kwsbin:$KALDI_ROOT/src/online2bin/:$KALDI_ROOT/src/ivectorbin/:$KALDI_ROOT/src/lmbin/:$KALDI_ROOT/src/nnet3bin/:$PWD:$PATH
export LC_ALL=C
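A possible fix, sketched here as an assumption rather than as the change that was actually committed, is to keep whatever KALDI_ROOT the environment already provides and only fall back to the old value when none is set. The function wrapper `configure_kaldi_path` is hypothetical and only there to keep the sketch easy to rerun; the directory list is copied from the path.sh above.

```shell
# Sketch of a path.sh that respects a KALDI_ROOT set by the environment
# (e.g. by LaMachine) instead of overwriting it.
configure_kaldi_path() {
    # Assign the old hard-coded value only if KALDI_ROOT is unset or empty.
    : "${KALDI_ROOT:=/home/eyilmaz/main/kaldi}"
    export KALDI_ROOT

    if [ -f "$KALDI_ROOT/tools/env.sh" ]; then
        . "$KALDI_ROOT/tools/env.sh"
    fi

    # Same directories as the original PATH line, built in a loop.
    kaldi_dirs="$KALDI_ROOT/src/bin:$KALDI_ROOT/tools/openfst/bin:$KALDI_ROOT/tools/sctk/bin"
    for d in fstbin gmmbin featbin lm sgmmbin sgmm2bin fgmmbin latbin \
             nnetbin nnet2bin kwsbin online2bin ivectorbin lmbin nnet3bin; do
        kaldi_dirs="$kaldi_dirs:$KALDI_ROOT/src/$d"
    done

    export PATH="$PWD/utils:$kaldi_dirs:$PWD:$PATH"
    export LC_ALL=C
}

configure_kaldi_path
```

Dropping the fallback and failing loudly when KALDI_ROOT is unset would be stricter; which of the two is preferable depends on whether the script is still meant to run outside LaMachine.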
The oral history webservice doesn't have that $PATH problem but actually fails on something else:
ERROR (online2-wav-nnet3-latgen-faster[5.5.221~1-19721]:ReadConfigFile():parse-options.cc:469) Cannot open config file: /vol/customopt/kaldi/egs/Kaldi_NL/Models/NL/UTwente/HMI/AM/CGN_all/nnet3_online/tdnn/v1.0/conf/mfcc.conf
The hard-coded path to /vol/customopt/kaldi/ is the problem there, and that also explains why it fails: I moved that dir away because I was under the impression we all use the LaMachine kaldi now. I'll move it back, which hopefully patches oral_history for the time being.
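To find any other config files that still point at the old location, a recursive fixed-string grep should be enough. A minimal sketch follows; the helper name `find_hardcoded_paths` is made up, and it assumes a grep that supports `-r` (GNU and BSD grep both do).

```shell
# List every line under the given files or directories that contains a
# fixed path prefix. Output format is "file:line:text"; the exit status
# is non-zero when nothing matches (grep's own convention).
# (find_hardcoded_paths is a hypothetical helper.)
find_hardcoded_paths() {
    prefix=$1
    shift
    grep -rnF -- "$prefix" "$@"
}
```

Calling it as `find_hardcoded_paths /vol/customopt/kaldi <models dir>` on the directory that holds the oral_history models would list each conf file that needs updating before the old dir can be removed for good.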
The new LaMachine with the webservice is deployed on ponyland now. I recorded and submitted a wave file to test it, and the service runs but produces an error: