isaomatsunami / clstm-Japanese

Japanese trained data of clstm

error rate does not go down #5

wanghaisheng closed this issue 8 years ago

wanghaisheng commented 8 years ago

I am training on 7000 characters (Chinese plus some special characters) with nhidden = 800, as you suggested.

After 352,000 iterations the error rate is still around 1 and does not look like it will go down. I would really appreciate your advice.

>>>>>>> ./test-xps.sh
#: ntrain = 400000
#: save_name = xps
#: report_time = 10
#: charsep = 
got 7571 files, 72 tests
#: load = xps-352000.clstm
.Stacked: 0.0001 0.9 in 0 48 out 0 5702
.Stacked.Parallel: 0.0001 0.9 in 0 48 out 0 200
.Stacked.Parallel.NPLSTM: 0.0001 0.9 in 0 48 out 0 100
.Stacked.Parallel.Reversed: 0.0001 0.9 in 0 48 out 0 100
.Stacked.Parallel.Reversed.NPLSTM: 0.0001 0.9 in 0 48 out 0 100
.Stacked.SoftmaxLayer: 0.0001 0.9 in 0 200 out 0 5702
#: start = -1
start 352001
#: test_every = 1000
#: save_every = 1000
#: report_every = 100
#: display_every = 100
352001
TRU 篱3273
ALN 篱3273
OUT 3273
steptime 1.13001e-06
ERROR 352001 1.0961     3228 2945
saving best performing network so far xps.clstm error rate:  1.0961
saving xps-352001.clstm
352100
TRU 鹊4021
ALN 鹊4021
OUT '4021
steptime 0.809273
352200
TRU 瘿8108
ALN 瘿8108
OUT '8108
steptime 0.387195
352300
TRU MIU
ALN MIU
OUT 3I1
steptime 0.515013
352400
TRU 屐6976
ALN 屐6976
OUT 6976
steptime 0.740205
352500
TRU 篆5513
ALN 篆5513
OUT 5513
steptime 0.432444
352600
TRU 沧1855
ALN 沧1855
OUT 1855
steptime 0.490426
352700
TRU 恒2667
ALN 恒恒2667
OUT N2667
steptime 0.535477
352800
TRU 缬7151
ALN 缬7151
OUT 7175
steptime 0.595361
352900
TRU 申4174
ALN 申4174
OUT A4174
steptime 0.480022
353000
TRU 乐3254
ALN 乐乐3254
OUT 3254
steptime 1.12941
ERROR 353000 1.09054     3228 2960
saving best performing network so far xps.clstm error rate:  1.09054
saving xps-353000.clstm
353100
TRU ♝ 9821 265D
ALN  9821 265DD
OUT 9821 265
steptime 0.915128
353200
TRU 笫8342
ALN 笫笫8342
OUT S8342
steptime 0.4906
353300
TRU 徨6569
ALN 徨徨6569
OUT 6569
steptime 0.762358
353400
TRU 桨2916
ALN 桨2916
OUT 52916
steptime 1.15563
353500
TRU 敫7524
ALN 敫7524
OUT 7524
steptime 1.31787
353600
TRU 吸4692
ALN 吸4692
OUT 4692
steptime 0.560728
353700
TRU LENG
ALN LENG
OUT IING
steptime 0.972345
353800
TRU 萋6134
ALN 萋6134
OUT 6134
steptime 0.929171
353900
TRU 箩3465
ALN 箩346
OUT 3
steptime 0.543161
354000
TRU 黩8782
ALN 黩822
OUT   
steptime 21.2413
ERROR 354000 1.09871     3228 2938
saving xps-354000.clstm
354100
TRU 符2391
ALN 符符2391
OUT '2391
steptime 0.678183
354200
TRU 咽4942
ALN 咽4942
OUT A4942
steptime 0.539517
354300
TRU 魅8740
ALN 魅8740
OUT 8740
steptime 1.1505
354400
TRU SHAN
ALN HSSHN
OUT N
steptime 0.447151
354500
TRU 录3428
ALN 录录3428
OUT 3428
steptime 0.478029
354600
TRU 蹀8562
ALN 蹀8562
OUT 8562
steptime 0.401212
354700
TRU ☤ 9764 2624
ALN  9764 2624
OUT 49764 2624
steptime 0.803597
354800
TRU HA
ALN HHAA
OUT 3AN
steptime 0.508759
354900
TRU 灿1851
ALN 灿1851
OUT 31851
steptime 0.454391
355000
TRU 苎6049
ALN 苎苎苎苎苎60494
OUT 947
steptime 2.53145
ERROR 355000 1.09202     3228 2956
saving xps-355000.clstm
355100
TRU 享4777
ALN 享44777
OUT 44777
steptime 0.840914
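The `.Stacked...` lines printed at start-up describe the network that is actually being trained, so they are worth reading before tuning anything else. The sketch below tabulates them. It assumes each line has the form `name: lrate momentum in 0 <inputs> out 0 <outputs>`, which matches this log. The function name `layer_sizes` is hypothetical and not part of clstm. Under that reading, this run has LSTM layers 100 wide (the `Parallel` layer outputs 200, one 100-wide LSTM per direction) and 5702 softmax outputs. That does not look like nhidden = 800 and roughly 7000 characters, as mentioned at the start of the issue. Because this run resumes from xps-352000.clstm, the sizes come from that checkpoint rather than from the current settings.

```shell
#!/bin/bash
# Sketch: summarize the clstm layer lines printed at start-up.
# Assumption: lines look like
#   .Stacked.Parallel.NPLSTM: 0.0001 0.9 in 0 48 out 0 100
# where the value two fields after "in" is the input size and the value
# two fields after "out" is the output size.
layer_sizes() {
  awk '/^\.[A-Za-z.]+:/ {
    name = $1
    sub(/:$/, "", name)          # drop the trailing colon from the layer name
    nin = "?"; nout = "?"
    for (i = 2; i < NF; i++) {
      if ($i == "in" && i + 2 <= NF) nin = $(i + 2)
      if ($i == "out" && i + 2 <= NF) nout = $(i + 2)
    }
    print name, nin, nout        # layer, input size, output size
  }'
}

# Usage (log file name is illustrative):
#   layer_sizes < train.log
```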
wanghaisheng commented 8 years ago

This time I used your test data (testdata3877/mincho) to try to reproduce your result: 3700 characters (a little more than the Tesseract Japanese dataset) with nhidden = 800. At 60,000 iterations it still recognizes nothing. Could you share your iteration vs. error-rate curve?

isaomatsunami commented 8 years ago

I didn't keep a record of it. But as far as I remember, 60,000 iterations are not enough. At around 100,000 it starts to read Chinese characters. Keep going!
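No curve was kept here. On a later run, one low-effort way to keep one is to save the trainer's output with `tee` and filter the ERROR lines afterwards. In the logs in this thread, those lines have the form `ERROR <iteration> <printed rate>` followed by two counts. This is a minimal sketch. The function name `error_curve` and the file names `train.log` and `curve.tsv` are illustrative and not part of clstm.

```shell
#!/bin/bash
# Sketch: keep an iteration vs. error-rate record while training.
# error_curve reads trainer output on stdin and prints
# "<iteration><TAB><printed error rate>" for every ERROR line.
error_curve() {
  awk '$1 == "ERROR" { print $2 "\t" $3 }'
}

# Usage (file names are illustrative):
#   ./clstmocrtrain japanese-train-3877 japanese-test-3877 2>&1 | tee train.log
#   error_curve < train.log > curve.tsv
```

Any plotting tool can then read the two-column file.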

wanghaisheng commented 8 years ago

That's good news. I am training it with both clstm and umaru. By the way, could you upload your trained model again? It seems to be broken. And have you tested this trained model on other, real-life images?

wanghaisheng commented 8 years ago

A little worried. Update, 2016-06-21:

ERROR 145200 0.932214     1499 1608
saving best performing network so far japanese-3877.clstm error rate:  0.932214
......

224800
TRU ゎ⓲''廃シ雰正寒某彬難茄巻佛縄メ係岳酔搬萄並巣靖プ悼涌⑭随戒
ALN ゎ⓲⓲⓲''廃シ雰正寒某彬難茄巻佛縄メ係岳酔搬萄並巣靖プ悼涌⑭随戒
OUT ❼❼還<雰正某難帰巻柳組慌香査極並収購フ料剖b異
steptime 3.27526
ERROR 224800 1.35902     1499 1103
....

291398
TRU 値樫禍券沸怪姫褐癬潜ぁげ除テ心贖PQ慢⑤億膨院権婿ナ彗掛樓辞
ALN 値樫禍券沸怪姫褐癬癬潜ぁげ除テ心贖PQ慢⑤億膨院権婿ナ彗掛樓辞
OUT 値操禍券沸怪姫褐潜ぁげ除テ心噴PQ慢⑤億膨院権婿チ彗排穫辞
steptime 3.63483
291399
TRU 碧浪戯受養g潰ぁ饗元私替拳草邦遥歓惧皇他浅危双志皓行填偏頃ジ
ALN 碧浪戯受養g潰ぁ饗元私替拳草邦遥歓惧皇他浅危双志皓行填偏頃ジ
OUT 皆浪戯受養g演zぁ饗元私替拳草邦遥歓惧皇他浅危双志皓行填偏頃ジ
steptime 3.89198
291400
TRU 暫郷滞辞荏髭珀壌蕎逝統尽ゲぢ灌勾里貿須客求姑鬼朔『益千后恩匡
ALN 暫郷滞辞荏髭珀壌蕎逝統尽ゲぢ灌勾里貿須客求姑鬼朔『益千后恩匡
OUT 暫郷滞辞在髭珀壌薄逝統尽ゲぢ濯勾里貿須客求貼鬼週『益千后恩医
steptime 3.77421
ERROR 291400 4.23446     1499 354

343798
TRU 終中豪辣無ヲ字っ矮自得練免思築敬院麩憲気自鉱住膨恒犯憂仁裂奏
ALN 終中豪辣無ヲ字っ矮自得練免思築敬院麩憲気自鉱住膨恒犯憂仁裂奏
OUT 終中豪辣無ヲ字っ矮肖得練免思築敬院麩憲気四鉱佳膨恒犯憂仁裂奏
steptime 5.43364
343799
TRU 戸ゲ惜鐘0男ヌ倍叱生特諦永励紅洗申波わ〜Oど羞簗研G工蓋蘂保
ALN 戸ゲ惜鐘0男ヌ倍叱生特諦永励紅洗申波わ〜Oど羞簗研G工蓋蘂保
OUT 芦ゲ惜鐘d男ヌ倍叱生特諦永励紅洗申波わ〜Oと羞簗研G工蓋蘂保
steptime 6.50675
343800
TRU 奥歩蝋鉢隕匙脈迫ベサ箸壺迅穴薪昂恐何ザ貨庶劫涛涼率夏璧電瑞援
ALN 奥歩蝋鉢隕匙脈迫ベサ箸壺迅穴薪昂恐何ザ貨庶劫涛涼率夏璧電瑞援
OUT 奥典歩蝋鉢隕匙脈迫ベサ箸蓋壷迅穴薪昂恐何ザ賃庶幼湾涼率夏璧電瑞援
steptime 5.93425
ERROR 343800 5.97211     1499 251
343801
TRU 和賃嶋錫児憂脳刺●任砧筒ゆぐ欣布崗泌薇群窈ゑづ奉暦^満q午郊
ALN 和賃嶋錫児憂脳刺●任砧筒ゆぐ欣布崗泌薇群窈ゑづ奉暦^満q午郊
OUT 和賃嶋錫児憂脳刺●佳砧筒ゆぐ欣布崗泌漫群窃窈さづ奉暦^濱q午郊
steptime 47.3135
343802
TRU ゅ轟計禮済呈肝配圓路曖最⑧痛閥民協扱バ酪痢麹問築持円赴寛侶綴
ALN ゅ轟計禮済呈肝配圓路曖最⑧痛閥民協扱バ酪痢麹問築持円赴寛侶綴
OUT ゆ轟訃禮済呈肝配圓路曖最⑧痛閥民協扱パ酷痢麹問築持団赴寛侶綴
steptime 5.32979
343803
TRU 卒㌧嗅慶雁瑕▼窯②曇侶誘鯵慨怒沃振下涎暴位流市皇学滅喉纂灌税
ALN 卒㌧嗅慶雁瑕▼窯②曇侶誘鯵慨怒沃振下涎暴位流市皇学滅喉纂灌税
OUT 卒㌧嗅慶雁瑕▼窯②曇侶誘鯵慨怒沃振下減暴位流市皇学滅喉纂濯税
steptime 3.96418
343804
TRU 准巨八康孔珍込漢''喫覚略ケ術1没聖穴遜ヂ埋粧畔0要参嘲振撃番
ALN 准巨八康孔珍込漢''喫覚略ケ術1没聖穴遜ヂ埋粧畔0要参嘲振撃番
OUT 准巨八康孔疹込漢''喫寛略ケ術1没聖穴遜ヂ埋粧畔0炭要参嘲振撃番
steptime 4.02296
343805
TRU 塙川暑使ズ疇珊躇菲召鳳彼逃象鼻質グ釜⑩柩内ヾ曜崔$区慎力膝穂
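The ERROR lines in both logs follow a pattern worth checking. Within a run, the fourth column never changes (3228 in the first log, 1499 in the second) while the fifth varies. The printed rate also equals the fourth column divided by the fifth (3228/2945 ≈ 1.0961, 1499/251 ≈ 5.972). Suppose the constant column is the number of ground-truth characters in the test set and the varying one is the edit-distance error count. Then the printed figure is the inverse of a conventional character error rate. The line at 343800 would mean about 251/1499 ≈ 16.7% errors, which fits the nearly correct OUT lines around that iteration. A rising printed value would then mean improvement, and the "best performing network" would be chosen by the wrong criterion. This is an inference from the numbers only and has not been checked against the clstm source. The sketch below prints both ratios for each ERROR line. The function name `error_ratios` is hypothetical.

```shell
#!/bin/bash
# Sketch: show each ERROR line's last two columns divided both ways.
# Assumption (inferred from the logs, not checked against the clstm
# source): column 4 is the ground-truth character count of the test set
# and column 5 is the edit-distance error count.
error_ratios() {
  awk '$1 == "ERROR" && $4 > 0 && $5 > 0 {
    printf "%s printed=%s col4/col5=%.4f col5/col4=%.4f\n", $2, $3, $4 / $5, $5 / $4
  }'
}

# Usage (log file name is illustrative):
#   error_ratios < train.log
```

For the line `ERROR 343800 5.97211     1499 251`, it prints `343800 printed=5.97211 col4/col5=5.9721 col5/col4=0.1674`.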

My training scripts:

#!/bin/bash
## 2016-06-03: reproduce the convergence after 150,000 iterations mentioned by the Japanese author
set -e
debug=${debug:-0}
options=${options:-}
export PS4='
>>>>>>> '
trap "echo TEST FAILED" EXIT
set -x
export seed=0.222
scons -s -c; rm -f *.o *.a
scons -j 4 gpu=0 debug=$debug options="$options" clstmocrtrain clstmfiltertrain clstmfilter clstmocr test-lstm
time ./test-lstm
time ./test-japanese.sh
#time ./test-filter.sh
#time ./test-ocr.sh
scons -s -c; rm -f *.o *.a
scons -j 4 gpu=0 double=1 debug=$debug options="$options" test-cderiv test-deriv test-ctc
#./test-cderiv
#./test-deriv
./test-japanese.sh
rm -f *.pb.h *.pb.cc
scons -c all
#scons -s -c pyswig
#scons pyswig
#python test-lstm.py
#set +x
#scons -s -c all pyswig
trap "echo ALL TESTS PASS" EXIT

./test-japanese.sh:

#!/bin/bash
set -ea
find ../clstm-Japanese-model/testdata3877/mincho/ -name '*.bin.png' | sort -r > japanese-3877-char
sed 1,0d japanese-3877-char > japanese-train-3877
sed 1,4950d japanese-3877-char > japanese-test-3877
report_every=1
save_every=1000
ntrain=400000
dewarp=center
display_every=100
test_every=100
testset=japanese-test.h5
hidden=800
lrate=1e-4
save_name=japanese-3877
report_time=1
:
#load=xps-352000.clstm
# gdb --ex run --args \
./clstmocrtrain japanese-train-3877  japanese-test-3877
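In test-japanese.sh, `sed 1,0d` deletes only line 1, because when the second address is a line number not greater than the first, sed selects just the first line. Meanwhile, `sed 1,4950d` keeps lines 4951 onward. As a result, every file in japanese-test-3877 also appears in japanese-train-3877, so the test error is measured on images the network trains on. If a held-out test set was intended, the sketch below checks the overlap and builds a disjoint split. The function names and the split point are illustrative.

```shell
#!/bin/bash
# Sketch: check two file lists for overlap and build a disjoint split.

# count_overlap A B: number of lines of B that also appear verbatim in A.
count_overlap() {
  grep -Fx -f "$1" "$2" | wc -l | tr -d ' '
}

# split_disjoint LIST N: first N lines -> LIST-train, the rest -> LIST-test.
split_disjoint() {
  head -n "$2" "$1" > "$1-train"
  tail -n +"$(( $2 + 1 ))" "$1" > "$1-test"
}

# Usage (N = 4950 is only an example):
#   count_overlap japanese-train-3877 japanese-test-3877
#   split_disjoint japanese-3877-char 4950
```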

Error rate every 100 iterations: 1.txt