srvk / eesen-transcriber

EESEN based offline transcriber VM using models trained on TEDLIUM and Cantab Research
Apache License 2.0

Questions regarding eesen vs regular stuff #9

Closed vince62s closed 8 years ago

vince62s commented 8 years ago

Hi guys, I have been working on Tanel's offline transcriber with the regular decoding from Kaldi's model, but I adapted it to English, made some changes to avoid audio splitting, and so on (adding new words, LM merging, ...). Can you tell me what EESEN brings? Is it faster? Does it allow different features? Thanks for your feedback. V.

riebling commented 8 years ago

EESEN uses nnet-only decoding: a model trained on TEDLIUM data, with filterbank and CMVN features. It supports selection of different segmentations produced by the LIUM segmenter, and there is a way to train your own language model, but it sounds like you are already doing that.
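For readers unfamiliar with the feature pipeline mentioned above, here is a minimal sketch of per-utterance CMVN (cepstral mean and variance normalization) applied to a filterbank feature matrix. The array shapes and the toy data are illustrative assumptions, not EESEN's actual code:

```python
import numpy as np

def apply_cmvn(feats: np.ndarray) -> np.ndarray:
    """Normalize each feature dimension to zero mean and unit variance
    over the utterance (per-utterance CMVN)."""
    mean = feats.mean(axis=0)
    std = feats.std(axis=0)
    # Guard against zero variance in a dimension
    return (feats - mean) / np.maximum(std, 1e-10)

# Toy example: 100 frames of 40-dimensional filterbank features
feats = np.random.randn(100, 40) * 3.0 + 5.0
normed = apply_cmvn(feats)
print(np.allclose(normed.mean(axis=0), 0.0))  # True
```

Normalizing out channel and speaker offsets this way is what lets a model trained on TEDLIUM generalize somewhat to other recording conditions.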

One reason for using EESEN is that it is significantly simpler to train than the Kaldi TEDLIUM recipe; however, it does still require GPU hardware.

It's not much faster; a lot of time is taken up by the segmentation steps (though I think we can possibly shave off a couple of those), but again, maybe you're handling that already ("avoid audio split").


vince62s commented 8 years ago

Does EESEN require a GPU for both training AND decoding, or just training? If both, I'll stick to the regular way :) I have a GTX 980 Ti GPU, but I reserve it for training tasks...

On the segmentation task, it's a pain, because it is not multi-threaded... there should be another way to do this.

On another note, does EESEN do confidence scoring ?

riebling commented 8 years ago

Just training. Running the nnet decoding in the forward direction works fine on CPU :)

Per-word confidence scores are produced in the .ctm files, so yes to that question.
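The .ctm files mentioned above follow the standard NIST CTM layout: one word per line with `file channel start duration word [confidence]`, the trailing confidence field being optional. A minimal parser might look like this (the sample line and its values are made up for illustration):

```python
def parse_ctm_line(line: str) -> dict:
    """Parse one CTM line: file channel start duration word [confidence].
    The trailing confidence field is optional."""
    parts = line.split()
    return {
        "file": parts[0],
        "channel": parts[1],
        "start": float(parts[2]),
        "duration": float(parts[3]),
        "word": parts[4],
        # None when the decoder emitted no confidence field
        "confidence": float(parts[5]) if len(parts) > 5 else None,
    }

line = "talk1 1 7.35 0.28 hello 0.97"
entry = parse_ctm_line(line)
print(entry["word"], entry["confidence"])  # hello 0.97
```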


vince62s commented 8 years ago

Got a question for you... You seem to have chosen LMSCALE=8 "by experiment". Looking at Tanel's initial SETUP it is 17, and looking at the best results for the TEDLIUM recipe it seems to be 10 or 11. Is there any way to get the best estimate across the different outputs according to the various LMSCALE values?

riebling commented 8 years ago

It varies based on the testing and training data (which determine the model). For English trained on TEDLIUM data, the word error rate is measured for each of various LMSCALE values, and the one that produces the lowest WER gets chosen. For English 8 kHz trained on Switchboard data, the value is different, because the data and models are different (but only off by 1).

If you have test data including ground truth (a "perfect" human transcription) in the right format (STM file format, for example), you can run the system with the _runscored.sh script; it will produce standard sclite scoring output across a range of LMSCALE values, and you may find that a different value gets the best results for your data. So it really depends on how you go about measuring, which is the long-winded explanation of what was meant by "by experiment".
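The selection step described above amounts to sweeping LMSCALE and keeping the value with the lowest measured WER. A sketch, assuming the WER numbers have already been collected from the sclite scoring output (the values below are invented for illustration):

```python
# Hypothetical WER results (percent), one per LMSCALE value scored
# against the ground-truth STM with sclite.
wer_by_lmscale = {7: 23.1, 8: 22.4, 9: 22.7, 10: 22.9, 11: 23.3}

# Pick the LMSCALE whose decode scored the lowest word error rate
best_lmscale = min(wer_by_lmscale, key=wer_by_lmscale.get)
print(best_lmscale)  # 8 for these made-up numbers
```

This also explains why the "best" value differs between setups: the sweep is re-run per model and per test set, so TEDLIUM and Switchboard systems can land on different minima.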

Hope this helps

On 04/27/2016 10:21 AM, vince62s wrote:

got a question for you .... You seem to have schosen LMSCALE=8 "by experiment" When looking at the initial SETUP from Tanel it is 17 and looking at best results for Tedlium recipe it seems to be 10 or 11. is there any way to get the best estimate accross the different output according to the various LMSCALE values ?

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/srvk/eesen-transcriber/issues/9#issuecomment-215099117

Eric Riebling Interactive Systems Lab er1k@cs.cmu.edu 407 South Craig St.

vince62s commented 8 years ago

Yes, that's pretty much what I had in mind, but I don't see the Makefile in this repo anymore. I thought I had seen LMSCALE=8 with the TEDLIUM model; maybe that was for your Switchboard model.

The only thing is that, based on the confidence score, it should be possible to choose the best output for new audio (for which, by definition, you don't have an exact STM).

riebling commented 8 years ago

Right, STM is just a formalized way of getting numeric measurements; you can't do that for "unseen" audio. The scale factor may have been moved from the Makefile into /vagrant/Makefile.options (a shared folder on the host) to expose it and make it easier to edit from outside the VM.
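The confidence-based selection vince62s suggests could be sketched as follows: decode the same unseen audio at several LMSCALE values, average the per-word confidences in each resulting CTM, and keep the highest-scoring output. This is a heuristic, not equivalent to WER scoring, and the CTM lines below are fabricated for illustration:

```python
def mean_confidence(ctm_lines: list[str]) -> float:
    """Average the per-word confidence (6th CTM field) over a transcript,
    skipping lines that carry no confidence field."""
    scores = [float(l.split()[5]) for l in ctm_lines if len(l.split()) > 5]
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical CTM outputs for the same audio at two LMSCALE values
candidates = {
    8: ["a 1 0.0 0.3 the 0.91", "a 1 0.3 0.4 cat 0.88"],
    11: ["a 1 0.0 0.3 the 0.85", "a 1 0.3 0.4 cap 0.60"],
}
best = max(candidates, key=lambda k: mean_confidence(candidates[k]))
print(best)  # 8
```

One caveat worth noting: confidence scores can be systematically biased by the scale factor itself, so a sanity check against a small scored dev set is still advisable before trusting this for model selection.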
