zhangjiulong closed 8 years ago
I ran decode_ctc_lat.sh several times, and the logs in the build/trans/zhangjl_003/eesen/decode directory show the same results each time. But when I run speech2text.sh, the log in that directory is different every time.
Could you have a look at the diarization (segmentation) file and compare whether it is always the same, or different? For your example, it would be:
build/diarization/zhangjl_003/show.s.seg
or whatever file is specified by the SEGMENTS variable in /vagrant/Makefile.options
Inconsistent segmentation would produce different results. I have not tested to see whether the LIUM segmentation code produces exactly the same segmentation for the same audio every time. You are right to note that any inconsistency in results seems unusual, for the same input.
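One way to make that comparison is sketched below in Python. It assumes you have saved a copy of the segmentation file after each run; the function name `diff_text_files` and the paths in the usage note are hypothetical and not part of eesen-transcriber.

```python
import difflib
import sys
from pathlib import Path


def diff_text_files(path_a, path_b):
    """Return unified-diff lines between two text files.

    An empty list means the two files have identical lines.
    """
    lines_a = Path(path_a).read_text().splitlines()
    lines_b = Path(path_b).read_text().splitlines()
    return list(
        difflib.unified_diff(
            lines_a, lines_b,
            fromfile=str(path_a), tofile=str(path_b),
            lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_seg.py <first copy> <second copy>
    differences = diff_text_files(sys.argv[1], sys.argv[2])
    print("\n".join(differences) if differences else "segmentations are identical")
```

For example, after copying the segmentation file aside following each run (say to run1/show.s.seg and run2/show.s.seg, both hypothetical locations), pass the two copies as arguments. The same function should work for comparing two decode.1.log files.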
I tested the command steps/decode_ctc_lat.sh several times and the result is the same each time, but if I run speech2text.sh the result is different.
This experiment (running decode_ctc_lat.sh) seems to verify that the decoding stage of processing is consistent.
It would help us diagnose things if you could post examples or snippets of 2 (or more) differing log files (decode.1.log) that were produced for the same input.
Eric Riebling Interactive Systems Lab er1k@cs.cmu.edu 407 South Craig St.
Hi, my setting is SEGMENTS=show.i.seg. The build results from running speech2text.sh twice on the same WAV file are attached: two_builds_result.tar.gz https://github.com/srvk/eesen-transcriber/files/345450/two_builds_result.tar.gz
I also found that with 8k 16-bit audio the result is the same every time.
The best I can tell, even though the segmentations are identical, the segmented WAV files are NOT identical, therefore the fbank features are not identical, leading to different results.
er1k@islpc22:~/twobuilds$ md5sum /audio/segmented/_/*.wav
62b51025624a9ac3177d94595cf423db build_01/audio/segmented/zhangjl_003/zhangjl_003_0000.000-0004.540_1.wav
03967f33f656f6af62054678644af7bb build_02/audio/segmented/zhangjl_003/zhangjl_003_0000.000-0004.540_1.wav
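The same checksum comparison can be run over every segmented file at once. This is a minimal Python sketch, assuming two build directories laid out like build_01 and build_02 above; the helper names `md5_of` and `differing_files` are made up for illustration.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(root_a, root_b, pattern="*.wav"):
    """List relative paths whose contents differ between two directory trees.

    A file that exists in only one of the trees is also reported.
    """
    root_a, root_b = Path(root_a), Path(root_b)
    hashes_a = {p.relative_to(root_a): md5_of(p) for p in root_a.rglob(pattern)}
    hashes_b = {p.relative_to(root_b): md5_of(p) for p in root_b.rglob(pattern)}
    all_paths = hashes_a.keys() | hashes_b.keys()
    return sorted(
        str(rel) for rel in all_paths if hashes_a.get(rel) != hashes_b.get(rel)
    )
```

Calling `differing_files("build_01/audio/segmented", "build_02/audio/segmented")` would return the relative paths of the WAV files that are not byte-identical between the two builds.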
It would seem the outputs of the sox command (for the same input) are slightly different. Perhaps because of the algorithm it uses for normalization(?)
sox build/audio/base/$_.wav --norm $@/$_$${timeformatted}$${sp_id}.wav trim $$start $$len
(See the sox bug here: https://sourceforge.net/p/sox/bugs/258/)
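To see how large the difference between two segmented files actually is, the samples can be compared directly. The sketch below uses only the Python standard library and assumes both files are uncompressed 16-bit PCM WAV with the same length; the function name `max_sample_difference` is hypothetical.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Read all samples of a 16-bit PCM WAV file, plus its basic format."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return fmt, samples


def max_sample_difference(path_a, path_b):
    """Return the largest absolute per-sample difference between two files."""
    fmt_a, samples_a = read_pcm16(path_a)
    fmt_b, samples_b = read_pcm16(path_b)
    if fmt_a != fmt_b:
        raise ValueError("channel count, sample width, or sample rate differ")
    if len(samples_a) != len(samples_b):
        raise ValueError("files contain a different number of samples")
    return max((abs(x - y) for x, y in zip(samples_a, samples_b)), default=0)
```

A return value of 0 means the audio data is identical even if the file checksums are not (for example, if only header bytes differ). A small nonzero value would be consistent with the "slightly different" sox output described above, though it does not by itself say where the difference comes from.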
Very interesting discovery!
Hi, I have trained an 8k 8-bit model, but when I test the model with a WAV file I recorded, named 001.wav, I get several different recognition results for the same WAV file. I would like to know how this happens and what the reason is. Thanks very much.