watsonyanghx / CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR implemented using tensorflow.
MIT License
362 stars, 210 forks

Some problems about this code #8

Open 980044579 opened 6 years ago

980044579 commented 6 years ago

In fact, the "max_stepsize" in this code shouldn't be 64. It should be 12, because the CNN shrinks the original "image_width" (180) by half four times: 180/2/2/2/2 = 11.25, which rounds up to 12. Remember, the core idea of CRNN+CTC is that we split the image vertically into many slices, predict the class of each slice, and finally use CTC to decode the predicted sequence into the corresponding result. For example, "aaa_bbc" and "a__b_ccc" both correspond to the same label "abc". You can also read the paper for more details.

But when I ran the wrong code on the author's dataset, I got 98% accuracy, while I got a bad result on the VGGWord dataset. After changing the code, I finally got a good result.

So why does this code work in your situation? I am very curious about this. Thank you.
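To make the two ideas in this comment concrete, here is a minimal Python sketch. It assumes "_" stands for the CTC blank and that each of the four downsampling steps halves the width and rounds up; the function names `ctc_collapse` and `downsampled_width` are made up for illustration and are not part of the repository.

```python
import math

BLANK = "_"  # assumed blank symbol, matching the example strings in the comment


def ctc_collapse(path, blank=BLANK):
    """Collapse a per-slice prediction: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for ch in path:
        # Emit a symbol only when it differs from the previous slice and is not blank.
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)


def downsampled_width(width, num_halvings):
    """Width after repeated halving, rounding up at each step (an assumption)."""
    for _ in range(num_halvings):
        width = math.ceil(width / 2)
    return width


print(ctc_collapse("aaa_bbc"))    # abc
print(ctc_collapse("a__b_ccc"))   # abc
print(downsampled_width(180, 4))  # 12
```

If the network rounded down at each step instead, the same image would give 11 time steps, so the exact value depends on the padding used by the pooling layers.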

LevinJ commented 6 years ago

@980044579 , thanks for sharing your observations and experience.

  1. With the great source code in this project and the data provided, I was able to reproduce the author's result, getting 0.997 at the 50th epoch.
  2. I agree with you on max_stepsize. It should run along the "image_width" direction, which gives 12 in this project. I also plan to correct this and see how it affects the final result. If it's okay, could you share your code changes in this area?

980044579 commented 6 years ago

Just change the code between the CNN and the RNN in cnn_lstm_otc_ocr.py so that the shape of the RNN input is [batch_size, max_stepsize, num_features].
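As a rough illustration of that reshaping, the sketch below uses NumPy rather than the project's TensorFlow graph. It assumes the CNN output is laid out as [batch, height, width, channels]; the helper name `cnn_to_rnn_input` and the sizes (height 4, width 12, 64 channels) are hypothetical and only chosen to match a 180x60 image halved four times.

```python
import numpy as np


def cnn_to_rnn_input(feature_map):
    """Reshape [batch, height, width, channels] into [batch, width, height * channels]."""
    batch, height, width, channels = feature_map.shape
    # Move width to the time axis so each time step is one vertical slice of the image.
    x = np.transpose(feature_map, (0, 2, 1, 3))  # [batch, width, height, channels]
    return x.reshape(batch, width, height * channels)


# Hypothetical CNN output for a batch of 2 images.
features = np.arange(2 * 4 * 12 * 64, dtype=np.float32).reshape(2, 4, 12, 64)
rnn_input = cnn_to_rnn_input(features)
print(rnn_input.shape)  # (2, 12, 256)
```

Here max_stepsize is the 12 on the second axis and num_features is 4 * 64 = 256. The transpose is what keeps the time axis aligned with image width; reshaping without it would mix height or channels into the time dimension.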

LevinJ commented 6 years ago

Hi @980044579, thanks a lot for your kind reply. I made the code changes yesterday too and found that the model reaches 0.999 accuracy at the 12th epoch, so the model converges faster and performs better after fixing this bug.

For those who are interested, here are my code changes.

980044579 commented 6 years ago

Good job~

anubhavrohatgi commented 6 years ago

I am getting an error: Failed precondition: sequence_length(0) <= 12

For inference, I had already trained the model up to

model_checkpoint_path: "ocr-model-21001"
all_model_checkpoint_paths: "ocr-model-21001"

on the set of 80000 train and 20 val images as provided in the dataset. I took a few images from the val set and created a folder infer (40 imgs named 1.png .. 40.png). I then tried to run inference using the command given in the readme.

INFO:tensorflow:Restoring parameters from ./checkpoint/ocr-model-20001
restore from ckpt./checkpoint/ocr-model-20001
2018-01-23 11:16:17.305360: W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: sequence_length(0) <= 12
Traceback (most recent call last):
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 12
  [[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./main.py", line 184, in <module>
    tf.app.run()
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./main.py", line 179, in main
    infer(FLAGS.infer_dir, FLAGS.mode)
  File "./main.py", line 155, in infer
    dense_decoded_code = sess.run(model.dense_decoded, feed)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: sequence_length(0) <= 12
  [[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]

Caused by op 'CTCBeamSearchDecoder', defined at:
  File "./main.py", line 184, in <module>
    tf.app.run()
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./main.py", line 179, in main
    infer(FLAGS.infer_dir, FLAGS.mode)
  File "./main.py", line 115, in infer
    model.build_graph()
  File "/home/anubhav/Downloads/Manish Sir/CNN_LSTM_CTC_Tensorflow-master (2)/cnn_lstm_otc_ocr.py", line 24, in build_graph
    self._build_train_op()
  File "/home/anubhav/Downloads/Manish Sir/CNN_LSTM_CTC_Tensorflow-master (2)/cnn_lstm_otc_ocr.py", line 158, in _build_train_op
    merge_repeated=False)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/ops/ctc_ops.py", line 269, in ctc_beam_search_decoder
    merge_repeated=merge_repeated))
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_ctc_ops.py", line 76, in _ctc_beam_search_decoder
    top_paths=top_paths, merge_repeated=merge_repeated, name=name)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/anubhav/.virtualenvs/cv/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

FailedPreconditionError (see above for traceback): sequence_length(0) <= 12
  [[Node: CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](lstm/transpose_2, _arg_lstm/Fill_0_1)]]

980044579 commented 6 years ago

@anubhavrohatgi make sure the maximum label length in your dataset is <= max_stepsize.

anubhavrohatgi commented 6 years ago

@980044579 Could you explain a bit more? I am quite new to this stuff in Python. What is the max length of a label?

Currently I am using the dataset provided in the link given in the repo. max_stepsize = 64, I guess, as stated in utils.py.

All images are 180x60.

The error occurs somewhere here: dense_decoded_code = sess.run(model.dense_decoded, feed)

Below are the contents of my infer folder (screenshot "screen2").

anubhavrohatgi commented 6 years ago

Are you talking about labels.txt?

Correct me if I am wrong here: by "infer" we mean testing on our own real-world data, right? If not, please help me: how can I use the model to predict the values for a given input image?

fanw52 commented 6 years ago

@anubhavrohatgi @980044579 Hello, I ran into the same problem, but I inspected the labels and found that the max label length is not greater than maxT in [maxT, batch_size, num_char]. Have you solved it? I don't know how to fix it.

980044579 commented 6 years ago

@anubhavrohatgi @kstys make sure you understand how the "CNN + RNN + CTC" framework works. There are some bugs in this code: you should change not only "maxsteps" in utils.py but also the code between the CNN and the RNN in cnn_lstm_otc_ocr.py.
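One possible reading of the "sequence_length(0) <= 12" failure, offered as an assumption rather than a confirmed diagnosis: the sequence lengths fed to the decoder are still filled with the old value 64, while the network now produces only 12 time steps per image. The sketch below mimics that precondition in NumPy; `check_seq_len` is a hypothetical helper, not a TensorFlow or repository function.

```python
import numpy as np


def check_seq_len(seq_len, max_time):
    """Return the batch indices whose declared length exceeds the available time steps."""
    seq_len = np.asarray(seq_len)
    return np.nonzero(seq_len > max_time)[0].tolist()


batch_size = 4
max_time = 12  # time steps produced for a 180-pixel-wide image (assumed)

bad = check_seq_len(np.full(batch_size, 64), max_time)   # old max_stepsize
good = check_seq_len(np.full(batch_size, 12), max_time)  # matches the network output
print(bad)   # [0, 1, 2, 3]
print(good)  # []
```

Under this reading, the value in utils.py and the reshaping between the CNN and the RNN have to agree, which is why changing only one of them still fails.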

lovebobo commented 6 years ago

I have a question. In the file cnn_lstm_otc_ocr.py, after the CNN, is x.set_shape([FLAGS.batch_size, filters[3], 24]) right? The time sequence fed to the LSTM should run along the width, but in the code it runs along the channels.

lovebobo commented 6 years ago

I changed the code as @LevinJ did, but I got an error: "tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found."

I set max_step to 128 and my input image is 32*192.
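A common cause of "No valid path found" is that a label cannot fit into the available time steps: CTC needs at least one step per character, plus one blank between each pair of identical neighbouring characters. The sketch below checks that bound. It assumes the width is halved four times as discussed earlier in this thread, so a 192-pixel-wide image would give 12 steps rather than 128; `min_ctc_steps` and `has_valid_path` are hypothetical helper names.

```python
def min_ctc_steps(label):
    """Smallest number of time steps that can emit `label` under CTC."""
    # Each adjacent pair of equal characters needs a blank between them.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def has_valid_path(label, time_steps):
    """True if `label` can be aligned to `time_steps` frames."""
    return min_ctc_steps(label) <= time_steps


time_steps = 12  # assumed: 192 halved four times
print(min_ctc_steps("hello"))                    # 6
print(has_valid_path("hello", time_steps))       # True
print(has_valid_path("bookkeeper", time_steps))  # False
```

Running a check like this over every label in the training set, with the number of time steps the network really outputs, would show whether any sample is too long for the chosen image width.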