srvk / eesen-transcriber

EESEN based offline transcriber VM using models trained on TEDLIUM and Cantab Research
Apache License 2.0

same wav and same model decode result different #12

Closed zhangjiulong closed 8 years ago

zhangjiulong commented 8 years ago

Hi, I have trained an 8k 8-bit model, but when I test the model on a WAV file I recorded, named 001.wav, I get several different recognition results for the same file. How can this happen, and what is the reason? Thanks very much.

zhangjiulong commented 8 years ago

I ran decode_ctc_lat.sh several times, and the logs in build/trans/zhangjl_003/eesen/decode show the same result each time. But when I run speech2text.sh, the log in that directory is different every time.

riebling commented 8 years ago

Could you have a look at the diarization (segmentation) file and compare whether it is always the same, or different? For your example, it would be:

build/diarization/zhangjl_003/show.s.seg

or whatever file is specified by the SEGMENTS variable in /vagrant/Makefile.options

Inconsistent segmentation would produce different results. I have not tested to see whether the LIUM segmentation code produces exactly the same segmentation for the same audio every time. You are right to note that any inconsistency in results seems unusual, for the same input.
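One way to run the comparison suggested above is a small Python check. This is only a sketch: `segment_lines` and `same_segmentation` are hypothetical helper names, and it assumes that lines starting with `;;` in the .seg file are comments that can be ignored.

```python
from pathlib import Path


def segment_lines(seg_path):
    """Return the non-empty, non-comment lines of a segmentation file.

    Assumption: lines starting with ';;' are comments, not segments.
    """
    lines = []
    for raw in Path(seg_path).read_text().splitlines():
        line = raw.strip()
        if line and not line.startswith(";;"):
            lines.append(line)
    return lines


def same_segmentation(seg_a, seg_b):
    """True when both files list the same segments, regardless of line order."""
    return sorted(segment_lines(seg_a)) == sorted(segment_lines(seg_b))
```

Calling `same_segmentation("run_a/show.s.seg", "run_b/show.s.seg")` on copies saved from two runs (the `run_a` and `run_b` directories are placeholders) would show whether the segmentation stage is where the runs diverge.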

I ran steps/decode_ctc_lat.sh several times and the result is the same, but if I run speech2text.sh the result is different.

This experiment (running decode_ctc_lat.sh) seems to verify that the decoding stage of processing is consistent.

It would help us diagnose things if you could post examples or snippets of 2 (or more) differing log files (decode.1.log) that were produced for the same input.

Eric Riebling Interactive Systems Lab er1k@cs.cmu.edu 407 South Craig St.

zhangjiulong commented 8 years ago

Hi, my setting is SEGMENTS=show.i.seg. The build results from running speech2text.sh twice on the same WAV are attached: two_builds_result.tar.gz (https://github.com/srvk/eesen-transcriber/files/345450/two_builds_result.tar.gz)

zhangjiulong commented 8 years ago

I found that with 8k 16-bit the result is the same every time.

riebling commented 8 years ago

As best I can tell, even though the segmentations are identical, the segmented WAV files are NOT identical. The fbank features computed from them therefore differ, which leads to different results.

er1k@islpc22:~/twobuilds$ md5sum */audio/segmented/*/*.wav
62b51025624a9ac3177d94595cf423db  build_01/audio/segmented/zhangjl_003/zhangjl_003_0000.000-0004.540_1.wav
03967f33f656f6af62054678644af7bb  build_02/audio/segmented/zhangjl_003/zhangjl_003_0000.000-0004.540_1.wav
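The same checksum comparison can be scripted when there are many segments. The sketch below is a rough equivalent in Python, not part of the transcriber: `md5_of` and `differing_wavs` are hypothetical helper names, and it assumes the two build trees use the same relative layout.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_wavs(build_a, build_b):
    """Return relative paths of .wav files present in both trees whose bytes differ."""
    root_a, root_b = Path(build_a), Path(build_b)
    differing = []
    for wav_a in sorted(root_a.rglob("*.wav")):
        rel = wav_a.relative_to(root_a)
        wav_b = root_b / rel
        if wav_b.is_file() and md5_of(wav_a) != md5_of(wav_b):
            differing.append(rel.as_posix())
    return differing
```

For the two builds in the attachment, `differing_wavs("build_01", "build_02")` would list every segment whose audio changed between runs. Files that exist in only one tree are skipped, not reported.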

It would seem the outputs of the sox command (for the same input) are slightly different. Perhaps because of the algorithm it uses for normalization(?)

sox build/audio/base/$*.wav --norm $@/$*_$${timeformatted}_$${sp_id}.wav trim $$start $$len

(See the sox bug here: https://sourceforge.net/p/sox/bugs/258/)
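One thing that may be worth trying, though it is not confirmed in this thread, is sox's repeatable mode. The `-R` option seeds sox's pseudo-random number generators (used for dither, for example) with a fixed value, so repeated runs on the same input should produce the same output. Whether dither is what actually differs here is an assumption. The sketch below only builds the command line; `sox_trim_command` and `run_sox_trim` are hypothetical helpers, not part of the transcriber.

```python
import subprocess


def sox_trim_command(src, dst, start, length, repeatable=True):
    """Build the argv for a trim-and-normalize sox call.

    Assumption: '-R' (repeatable mode) removes the run-to-run differences.
    """
    argv = ["sox"]
    if repeatable:
        argv.append("-R")  # global option, placed before the input file
    argv += [str(src), "--norm", str(dst), "trim", str(start), str(length)]
    return argv


def run_sox_trim(src, dst, start, length, repeatable=True):
    """Run sox; raises CalledProcessError if sox exits with an error."""
    subprocess.run(sox_trim_command(src, dst, start, length, repeatable), check=True)
```

If the checksums of the segmented WAV files match across runs with `-R` and differ without it, that would point to sox's random number generation and not to normalization as such.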

Very interesting discovery!
