ilovin / lstm_ctc_ocr

Use CTC + tensorflow to OCR
https://ilovin.github.io/2017-04-06/tensorflow-lstm-ctc-ocr/

./test.sh error #53

Open boris-lb opened 6 years ago

boris-lb commented 6 years ago

2018-09-01 09:05:39.535363: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: sequence_length(0) <= 29
Traceback (most recent call last):
  File "./lstm/test_net.py", line 73, in <module>
    restore=bool(int(args.restore)))
  File "./lstm/../lib/lstm/test.py", line 102, in test_net
    sw.test_model(sess, testDir=testDir, restore=restore)
  File "./lstm/../lib/lstm/test.py", line 80, in test_model
    res = sess.run(fetches=dense_decoded[0], feed_dict=feed_dict)
  File "/home/liubo/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/liu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/liu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/liu/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 29
  [[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=true, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](logits/transpose/_93, _arg_time_step_len_0_2)]]
  [[Node: SparseToDense/_95 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_377_SparseToDense", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

Training accuracy reaches 99, but the test run throws this error. It's odd and I don't know what to adjust; if anyone has run into this, please help.
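For context, `CTCBeamSearchDecoder` enforces a hard precondition: every entry of the `sequence_length` it is fed must be at most the time dimension of the logits (29 in this trace), so the error means the fed `time_step_len` is larger than the number of time steps the test graph actually produces. A minimal sketch of the constraint, assuming TF 1.x with placeholders (matching the traceback's environment), not this repo's exact graph:

```python
# Minimal sketch (assumes TF 1.x): ctc_beam_search_decoder fails with
# "sequence_length(0) <= max_time" when the fed length exceeds the logits'
# time dimension. Shapes here are illustrative, not the repo's real config.
import numpy as np
import tensorflow as tf

max_time, batch_size, num_classes = 29, 1, 12   # 11 characters + CTC blank
logits = tf.placeholder(tf.float32, [max_time, batch_size, num_classes])
seq_len = tf.placeholder(tf.int32, [batch_size])

decoded, _ = tf.nn.ctc_beam_search_decoder(
    logits, seq_len, beam_width=100, top_paths=1, merge_repeated=True)
dense = tf.sparse_tensor_to_dense(decoded[0], default_value=-1)

with tf.Session() as sess:
    feed = {logits: np.random.randn(max_time, batch_size, num_classes).astype(np.float32),
            seq_len: [29]}
    sess.run(dense, feed_dict=feed)   # fine: 29 <= 29
    feed[seq_len] = [40]
    sess.run(dense, feed_dict=feed)   # FailedPreconditionError: sequence_length(0) <= 29
```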

boris-lb commented 6 years ago

@ilovin Training-set accuracy went up, but test-set accuracy is 0. Can you help explain why? Also, the validation samples printed during training are always the same digits, and they are not data from my validation set.

ilovin commented 6 years ago

What does the output look like at test time?

boris-lb commented 6 years ago

4_31,759.jpg 15,2
6_3.jpg 78
7_996.jpg 7899
9_732.jpg 7,9
12_40,986.jpg 180,8
16_38,854.jpg 1,8606
26_0.jpg 740
36_0.jpg 780
37_0.jpg 780
43_367.jpg 7,6
45_7.jpg 74
47_10.jpg 780
49_0.jpg 70
55_14,950.jpg 149,8
56_0.jpg 7,0
60_0.jpg 7,0
65_0.jpg 747
67_0.jpg 746

This is part of the test output; the part of the filename after the underscore is the label. I'm training on the ten digits plus a comma, 11 characters in total. With image size (160, 60) it never converged even after 100k+ iterations; after changing to (120, 45) it converged quickly. The training log is:

iter: 100 / 1000000, total loss: 13.0400743, lr: 0.0001000 speed: 0.102s / iter
iter: 200 / 1000000, total loss: 10.5619059, lr: 0.0001000 speed: 0.117s / iter
iter: 300 / 1000000, total loss: 7.7125740, lr: 0.0001000 speed: 0.119s / iter
iter: 400 / 1000000, total loss: 2.0175159, lr: 0.0001000 speed: 0.111s / iter
iter: 500 / 1000000, total loss: 0.5158803, lr: 0.0001000 speed: 0.118s / iter
iter: 600 / 1000000, total loss: 0.3201181, lr: 0.0001000 speed: 0.115s / iter
iter: 700 / 1000000, total loss: 0.1470777, lr: 0.0001000 speed: 0.113s / iter
iter: 800 / 1000000, total loss: 0.4072588, lr: 0.0001000 speed: 0.115s / iter
iter: 900 / 1000000, total loss: 0.2619937, lr: 0.0001000 speed: 0.117s / iter
seq 0: origin: [8, 1, 6, 11] decoded:[8, 1, 6, 11]
seq 1: origin: [8, 7, 1, 10, 3, 4, 10, 5] decoded:[8, 7, 1, 10, 3, 4, 10, 5]
seq 2: origin: [2, 2, 4, 2, 3] decoded:[2, 2, 4, 2, 3]
seq 3: origin: [4, 11] decoded:[4, 11]
seq 4: origin: [3] decoded:[3]
accuracy: 0.98438
iter: 1000 / 1000000, total loss: 0.0657427, lr: 0.0001000 speed: 0.127s / iter
iter: 1100 / 1000000, total loss: 0.0259138, lr: 0.0001000 speed: 0.119s / iter
iter: 1200 / 1000000, total loss: 0.0610742, lr: 0.0001000 speed: 0.113s / iter
iter: 1300 / 1000000, total loss: 0.0725027, lr: 0.0001000 speed: 0.109s / iter
('loss: ', 0.014755567)
Wrote snapshot to: /home/liu/lstm_ctc_ocr-beta/output/lstm_ctc/lstm_ctc_iter_2.ckpt
seq 0: origin: [8, 1, 6, 11] decoded:[8, 1, 6, 11]
seq 1: origin: [8, 7, 1, 10, 3, 4, 10, 5] decoded:[8, 7, 1, 10, 3, 4, 10, 5]
seq 2: origin: [2, 2, 4, 2, 3] decoded:[2, 2, 4, 2, 3]
seq 3: origin: [4, 11] decoded:[4, 11]
seq 4: origin: [3] decoded:[3]
accuracy: 1.00000
iter: 1400 / 1000000, total loss: 0.1878222, lr: 0.0001000 speed: 0.113s / iter
iter: 1500 / 1000000, total loss: 0.0301566, lr: 0.0001000 speed: 0.117s / iter
('loss: ', 0.0140728075)
Wrote snapshot to: /home/liu/lstm_ctc_ocr-beta/output/lstm_ctc/lstm_ctc_iter_2.ckpt
seq 0: origin: [8, 1, 6, 11] decoded:[8, 1, 6, 11]
seq 1: origin: [8, 7, 1, 10, 3, 4, 10, 5] decoded:[8, 7, 1, 10, 3, 4, 10, 5]
seq 2: origin: [2, 2, 4, 2, 3] decoded:[2, 2, 4, 2, 3]
seq 3: origin: [4, 11] decoded:[4, 11]
seq 4: origin: [3] decoded:[3]
accuracy: 1.00000
iter: 1600 / 1000000, total loss: 0.2451135, lr: 0.0001000 speed: 0.113s / iter
('loss: ', 0.01386846)

But the test results are as shown above, so could you tell me what's going on? Many thanks.

PS: the "seq 0" through "seq 4" samples printed after each snapshot are not labels from my validation set, and I don't know where they are being read from.
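For reference, the `accuracy` printed after each snapshot appears to be a per-batch sequence accuracy over those `seq N` samples, i.e. the fraction of sequences whose decoded labels exactly match the origin labels. A minimal sketch of that metric with a hypothetical helper (an assumption about how it is computed, not code taken from this repo):

```python
# Hypothetical helper illustrating an exact-match sequence accuracy like the
# one logged above; not taken from this repo's code.
def sequence_accuracy(decoded_batch, origin_batch):
    """Both arguments are lists of integer label sequences."""
    hits = sum(1 for d, o in zip(decoded_batch, origin_batch) if list(d) == list(o))
    return float(hits) / len(origin_batch)

# The five "seq" samples above all match, so that subset scores 1.0:
origins = [[8, 1, 6, 11], [8, 7, 1, 10, 3, 4, 10, 5], [2, 2, 4, 2, 3], [4, 11], [3]]
print(sequence_accuracy(origins, origins))  # 1.0
```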

ilovin commented 6 years ago

Are your test data and training data from the same distribution? Were they both generated with reptcha?

boris-lb commented 6 years ago

The data isn't generated; it's my own. From one batch of data I randomly picked 2000 images as the test set, and the remaining 30000+ are the training and validation sets.

longmao-yiran commented 6 years ago

I think it may be that your test config was never updated, so the images at test time don't have the same distribution as in training.
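One concrete way a stale test config produces the error above: the number of CTC time steps is tied to the image width and the network's total horizontal downsampling, so a test-time width or `time_step_len` still set for (160, 60) overruns the ~29-30 steps a graph built for (120, 45) actually emits. A rough sketch of the arithmetic, where the downsampling factor is an assumption rather than this repo's verified value:

```python
# Rough sketch: relate image width to CTC time steps and clamp the fed length.
# POOL_SCALE = 4 is an assumed total horizontal downsampling; adjust it to
# match the network actually in use.
POOL_SCALE = 4

def ctc_time_steps(img_width):
    return img_width // POOL_SCALE

print(ctc_time_steps(120))   # 30 -> close to the 29-step budget in the error
print(ctc_time_steps(160))   # 40 -> feeding this length into the 120-px graph fails

# If the test images (or the configured width) stay at 160 while the model was
# trained at 120, either resize the test images to the training width or clamp
# the fed sequence length to what the graph can produce:
time_step_len = min(ctc_time_steps(160), ctc_time_steps(120))
```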

boris-lb commented 6 years ago

The test set has the same distribution as the training set, and these datasets run fine with other code. I still think it's an issue between the image size and the network: at (160, 60) it doesn't converge even after 100k+ iterations, while at (120, 45) it converges within a few rounds, which may not be genuine convergence. I'll look into it more carefully and share the cause once I find it. If anyone has hit something similar, please share your solution.