wanghaisheng closed this issue 8 years ago
This time I used your test data, testdata3877/mincho, and tried to reproduce your result: 3700 chars (a little bigger than the tesseract jp-dataset) with nhidden = 800. At 60000 iterations it had still learned nothing. Could you share your iteration vs. error rate curve?
I didn't keep a record of it. But as far as I remember, 60000 iterations are not enough. At around 100000, it will start to read Chinese. Keep on!!!!!
That's good news. I am training it using clstm and umaru. By the way, can you upload your trained model again? It seems to be broken. And have you tested this trained model with other pictures from real life?
A little worried (updated 2016-06-21):
ERROR 145200 0.932214 1499 1608
saving best performing network so far japanese-3877.clstm error rate: 0.932214
......
224800
TRU ゎ⓲''廃シ雰正寒某彬難茄巻佛縄メ係岳酔搬萄並巣靖プ悼涌⑭随戒
ALN ゎ⓲⓲⓲''廃シ雰正寒某彬難茄巻佛縄メ係岳酔搬萄並巣靖プ悼涌⑭随戒
OUT ❼❼還<雰正某難帰巻柳組慌香査極並収購フ料剖b異
steptime 3.27526
ERROR 224800 1.35902 1499 1103
....
291398
TRU 値樫禍券沸怪姫褐癬潜ぁげ除テ心贖PQ慢⑤億膨院権婿ナ彗掛樓辞
ALN 値樫禍券沸怪姫褐癬癬潜ぁげ除テ心贖PQ慢⑤億膨院権婿ナ彗掛樓辞
OUT 値操禍券沸怪姫褐潜ぁげ除テ心噴PQ慢⑤億膨院権婿チ彗排穫辞
steptime 3.63483
291399
TRU 碧浪戯受養g潰ぁ饗元私替拳草邦遥歓惧皇他浅危双志皓行填偏頃ジ
ALN 碧浪戯受養g潰ぁ饗元私替拳草邦遥歓惧皇他浅危双志皓行填偏頃ジ
OUT 皆浪戯受養g演zぁ饗元私替拳草邦遥歓惧皇他浅危双志皓行填偏頃ジ
steptime 3.89198
291400
TRU 暫郷滞辞荏髭珀壌蕎逝統尽ゲぢ灌勾里貿須客求姑鬼朔『益千后恩匡
ALN 暫郷滞辞荏髭珀壌蕎逝統尽ゲぢ灌勾里貿須客求姑鬼朔『益千后恩匡
OUT 暫郷滞辞在髭珀壌薄逝統尽ゲぢ濯勾里貿須客求貼鬼週『益千后恩医
steptime 3.77421
ERROR 291400 4.23446 1499 354
343798
TRU 終中豪辣無ヲ字っ矮自得練免思築敬院麩憲気自鉱住膨恒犯憂仁裂奏
ALN 終中豪辣無ヲ字っ矮自得練免思築敬院麩憲気自鉱住膨恒犯憂仁裂奏
OUT 終中豪辣無ヲ字っ矮肖得練免思築敬院麩憲気四鉱佳膨恒犯憂仁裂奏
steptime 5.43364
343799
TRU 戸ゲ惜鐘0男ヌ倍叱生特諦永励紅洗申波わ〜Oど羞簗研G工蓋蘂保
ALN 戸ゲ惜鐘0男ヌ倍叱生特諦永励紅洗申波わ〜Oど羞簗研G工蓋蘂保
OUT 芦ゲ惜鐘d男ヌ倍叱生特諦永励紅洗申波わ〜Oと羞簗研G工蓋蘂保
steptime 6.50675
343800
TRU 奥歩蝋鉢隕匙脈迫ベサ箸壺迅穴薪昂恐何ザ貨庶劫涛涼率夏璧電瑞援
ALN 奥歩蝋鉢隕匙脈迫ベサ箸壺迅穴薪昂恐何ザ貨庶劫涛涼率夏璧電瑞援
OUT 奥典歩蝋鉢隕匙脈迫ベサ箸蓋壷迅穴薪昂恐何ザ賃庶幼湾涼率夏璧電瑞援
steptime 5.93425
ERROR 343800 5.97211 1499 251
343801
TRU 和賃嶋錫児憂脳刺●任砧筒ゆぐ欣布崗泌薇群窈ゑづ奉暦^満q午郊
ALN 和賃嶋錫児憂脳刺●任砧筒ゆぐ欣布崗泌薇群窈ゑづ奉暦^満q午郊
OUT 和賃嶋錫児憂脳刺●佳砧筒ゆぐ欣布崗泌漫群窃窈さづ奉暦^濱q午郊
steptime 47.3135
343802
TRU ゅ轟計禮済呈肝配圓路曖最⑧痛閥民協扱バ酪痢麹問築持円赴寛侶綴
ALN ゅ轟計禮済呈肝配圓路曖最⑧痛閥民協扱バ酪痢麹問築持円赴寛侶綴
OUT ゆ轟訃禮済呈肝配圓路曖最⑧痛閥民協扱パ酷痢麹問築持団赴寛侶綴
steptime 5.32979
343803
TRU 卒㌧嗅慶雁瑕▼窯②曇侶誘鯵慨怒沃振下涎暴位流市皇学滅喉纂灌税
ALN 卒㌧嗅慶雁瑕▼窯②曇侶誘鯵慨怒沃振下涎暴位流市皇学滅喉纂灌税
OUT 卒㌧嗅慶雁瑕▼窯②曇侶誘鯵慨怒沃振下減暴位流市皇学滅喉纂濯税
steptime 3.96418
343804
TRU 准巨八康孔珍込漢''喫覚略ケ術1没聖穴遜ヂ埋粧畔0要参嘲振撃番
ALN 准巨八康孔珍込漢''喫覚略ケ術1没聖穴遜ヂ埋粧畔0要参嘲振撃番
OUT 准巨八康孔疹込漢''喫寛略ケ術1没聖穴遜ヂ埋粧畔0炭要参嘲振撃番
steptime 4.02296
343805
TRU 塙川暑使ズ疇珊躇菲召鳳彼逃象鼻質グ釜⑩柩内ヾ曜崔$区慎力膝穂
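The `ERROR` lines in the log above are enough to rebuild an iteration vs. error curve. In the values posted here, the third field equals the fourth field divided by the fifth (for example 1499 / 1103 = 1.35902), so the logged number rises as the last field shrinks. A minimal sketch follows, assuming the fourth field is the number of reference characters in the test set and the fifth is the number of character errors. The log does not label these fields, so that reading is an assumption. The sketch prints the reverse ratio next to the logged value. `error_curve` and `train.log` are names made up for this example.

```shell
#!/bin/bash
# Read a clstmocrtrain log on stdin and keep only the lines that start with
# "ERROR <iteration> <logged value> <field4> <field5>".
# Output per line: iteration, logged value, field5 / field4.
# ASSUMPTION: field4 = reference characters, field5 = character errors.
error_curve() {
  awk '$1 == "ERROR" && NF >= 5 && $4 > 0 {
         printf "%s %s %.4f\n", $2, $3, $5 / $4
       }'
}

# Usage (train.log is a hypothetical file name):
#   error_curve < train.log > curve.txt
```

Under that assumption, a falling third column in `curve.txt` would mean fewer character errors on the test list, which is also what the `TRU` and `OUT` lines suggest as the iteration count grows.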
My training scripts:
#!/bin/bash
## 2016-06-03: reproduce the convergence at 150000 iterations that the Japanese author mentioned
set -e
debug=${debug:-0}
options=${options:-}
export PS4='
>>>>>>> '
trap "echo TEST FAILED" EXIT
set -x
export seed=0.222
scons -s -c; rm -f *.o *.a
scons -j 4 gpu=0 debug=$debug options="$options" clstmocrtrain clstmfiltertrain clstmfilter clstmocr test-lstm
time ./test-lstm
time ./test-japanese.sh
#time ./test-filter.sh
#time ./test-ocr.sh
scons -s -c; rm -f *.o *.a
scons -j 4 gpu=0 double=1 debug=$debug options="$options" test-cderiv test-deriv test-ctc
#./test-cderiv
#./test-deriv
./test-japanese.sh
rm -f *.pb.h *.pb.cc
scons -c all
#scons -s -c pyswig
#scons pyswig
#python test-lstm.py
#set +x
#scons -s -c all pyswig
trap "echo ALL TESTS PASS" EXIT
./test-japanese.sh
#!/bin/bash
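# -e: stop on the first error; -a: export every variable assigned below
set -ea
# list all line images, in reverse sorted order
find ../clstm-Japanese-model/testdata3877/mincho/ -name '*.bin.png' | sort -r > japanese-3877-char
# training list: 1,0d drops at most the first line, so almost every image is used
sed 1,0d japanese-3877-char > japanese-train-3877
# test list: everything after the first 4950 lines
sed 1,4950d japanese-3877-char > japanese-test-3877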
report_every=1
save_every=1000
ntrain=400000
dewarp=center
display_every=100
test_every=100
testset=japanese-test.h5
hidden=800
lrate=1e-4
save_name=japanese-3877
report_time=1
:
#load=xps-352000.clstm
# gdb --ex run --args \
./clstmocrtrain japanese-train-3877 japanese-test-3877
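In the script above, `sed 1,0d` removes at most the first line of the list, while `sed 1,4950d` keeps only the lines after the first 4950. The test list therefore appears to be contained in the training list. If a held-out test set is wanted, a disjoint split is one option. A minimal sketch, assuming the same list file as input; `split_list` is a hypothetical helper name, not part of clstm:

```shell
#!/bin/bash
# split_list LIST NTRAIN TRAIN_OUT TEST_OUT
# Write the first NTRAIN lines of LIST to TRAIN_OUT and the remaining
# lines to TEST_OUT, so that no line appears in both files.
split_list() {
  head -n "$2" "$1" > "$3"
  tail -n "+$(( $2 + 1 ))" "$1" > "$4"
}

# Usage with the file names from the script above:
#   split_list japanese-3877-char 4950 japanese-train-3877 japanese-test-3877
```

With this split the reported test error measures unseen lines only, so it may come out higher than a figure computed on lines that were also used for training.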
Error rate every 100 iterations: 1.txt
I am trying 7000 words (Chinese and some special characters) with nhidden = 800, as you suggested.
Right now, after 352000 iterations, the error rate is still around 1 and seems unlikely to go down. I would be grateful for your advice.