tuandoan998 / Handwritten-Text-Recognition

IAM dataset

What about the dataset?? #2

3timesv closed 5 years ago

3timesv commented 5 years ago

Could you give the details of the dataset used? (Did you use the entire IAM dataset?)

tuandoan998 commented 5 years ago

I ignore words and lines whose status is 'err'. Details are in the function get_paths_and_texts(): https://github.com/tuandoan998/Handwriting-OCR/blob/master/Utils.py#L30
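For readers who do not want to open the linked file, the filtering step can be sketched roughly as follows. This is not the repository's code: the helper name filter_iam_words, the root folder "words", and the second demo line are made up for illustration, and the sketch assumes the usual layout of the IAM words.txt index (word id, segmentation status, gray level, four bounding-box values, grammatical tag, transcription).

```python
from typing import Iterable, List, Tuple


def filter_iam_words(lines: Iterable[str], root: str = "words") -> List[Tuple[str, str]]:
    """Return (image_path, text) pairs, skipping entries whose status is 'err'.

    Hypothetical helper; assumes the usual IAM words.txt field order.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header comments and blank lines carry no sample
        fields = line.split(" ")
        if len(fields) < 9:
            continue  # malformed entry
        word_id, status = fields[0], fields[1]
        if status == "err":
            continue  # segmentation flagged as wrong: ignore this word
        text = " ".join(fields[8:])  # transcription is everything after the tag
        parts = word_id.split("-")
        # e.g. a01-000u-00-00 -> words/a01/a01-000u/a01-000u-00-00.png
        path = f"{root}/{parts[0]}/{parts[0]}-{parts[1]}/{word_id}.png"
        samples.append((path, text))
    return samples


# Illustrative index lines; the 'err' entry is invented for the demo.
demo = [
    "# comment line",
    "a01-000u-00-00 ok 154 408 768 27 51 AT A",
    "a01-000u-00-01 err 154 507 766 213 48 NN MOVE",
]
samples = filter_iam_words(demo)
print(samples)
```

Only the entry marked ok survives, paired with the path its image would have under the assumed folder layout.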

naveen-marthala commented 4 years ago

Nice work. I still can't figure out how you are feeding the data, and the function get_paths_and_texts() has no inline comments either. I would like to know how you prepared the data (since the images have variable dimensions) and how you feed it to the model. Is the data fed dynamically during training?

tuandoan998 commented 4 years ago

The get_paths_and_texts() function simply collects the path and the text label (ground truth) of each image in the IAM dataset. All images are resized to the same size (w x h = 128x64 or 800x64) before being fed to the CRNN model.
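As a rough illustration of the resizing step, here is a minimal NumPy sketch that brings a variable-sized grayscale image to 128x64. The helper resize_nearest and the nearest-neighbour method are assumptions made for this example; the repository may well use an image library and a different interpolation or axis order.

```python
import numpy as np


def resize_nearest(img: np.ndarray, width: int, height: int) -> np.ndarray:
    """Resize a 2-D grayscale image to (height, width) by nearest-neighbour sampling."""
    src_h, src_w = img.shape
    # For every target row/column, pick the source row/column it maps onto.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return img[rows[:, None], cols]


# A fake 27x51 (w x h) word crop standing in for an IAM image.
word_img = np.random.randint(0, 256, size=(51, 27), dtype=np.uint8)
fixed = resize_nearest(word_img, width=128, height=64)

# Scale to [0, 1] and add batch and channel axes, as a CNN input usually expects.
batch = fixed.astype(np.float32)[None, :, :, None] / 255.0
print(fixed.shape, batch.shape)
```

Because every image ends up with the same shape, samples can be stacked into one array per batch regardless of their original dimensions.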

naveen-marthala commented 4 years ago

In CRNN_Model.py, at this line: https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/CRNN_Model.py#L91 why is inputs not just input_data, but all of those tensors? What is the reason? I ask because all I have seen so far is images passed as the inputs, with the target labels as the outputs.

naveen-marthala commented 4 years ago

Or can I do:

import tensorflow as tf
inputs = https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/CRNN_Model.py#L26
# ... all the remaining layers here ...
outputs = https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/CRNN_Model.py#L80
model = tf.keras.Model(inputs=inputs, outputs=outputs)

and then call model.compile(....) and model.fit(), as suggested in the documentation? Is this at least close to what you have done?

tuandoan998 commented 4 years ago

@naveen-kumar-123 There are two types of model here: training (model) and inference (model_predict). Each model needs different inputs and outputs:

naveen-marthala commented 4 years ago

Ok. So, since the CTC loss needs all of them to compute the loss, they are all passed in. And the output of the CTC loss is then used for back-propagation, right?
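To make concrete why the loss needs more than the image, here is a small NumPy sketch of the CTC forward algorithm for a single sample. It is an illustration only, not the Keras implementation: the function name ctc_loss is made up, it works in plain probability space (real implementations use log space for numerical stability), and the number of time steps and the label length are read directly from the arguments, which is the role input_length and label_length play for padded batches.

```python
import numpy as np


def ctc_loss(probs: np.ndarray, label: list, blank: int) -> float:
    """Negative log-likelihood of `label` under per-time-step class probabilities.

    probs: array of shape (T, num_classes); each row sums to one.
    label: list of class indices (no blanks).
    """
    T = probs.shape[0]
    # Interleave blanks: "ab" -> [blank, a, blank, b, blank]
    ext = [blank]
    for c in label:
        ext += [c, blank]
    S = len(ext)

    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]  # start with a blank ...
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]  # ... or with the first character

    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1, s]  # stay on the same symbol
            if s >= 1:
                total += alpha[t - 1, s - 1]  # advance by one symbol
            # Skipping a blank is allowed only between two different characters.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    # Valid alignments end on the last character or on the trailing blank.
    p = alpha[T - 1, S - 1]
    if S > 1:
        p += alpha[T - 1, S - 2]
    return float(-np.log(p))


# Two time steps, two classes (0 = 'a', 1 = blank), uniform predictions.
probs = np.full((2, 2), 0.5)
loss = ctc_loss(probs, label=[0], blank=1)
print(loss)
```

In this toy case three alignments ("aa", "a-", "-a") collapse to "a", each with probability 0.25, so the loss is -log(0.75). The scalar returned by such a loss is what gets differentiated during back-propagation.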

naveen-marthala commented 4 years ago

some questions:

  1. You might be aware that softmax outputs probabilities (that sum to one) for each output class (each letter of a word, in our case). So I think softmax should fail for words like "too", since the same class (the letter "o") appears twice in one target label. I understand that softmax worked for you in this case, but every article I have read says that softmax works only for multi-class classification problems, not multi-label ones. So what's the theory behind this?
  2. Here: https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L57-L58 you are just sending arrays of ones and zeros to CTC. Why just ones and zeros? And here: https://github.com/tuandoan998/HTR-for-IAM/blob/ffa2696a744e7c2256282a8eb7712290ad9f4f5e/ImageGenerator.py#L77 your outputs are an array of zeros. Why send zeros?

tuandoan998 commented 4 years ago

  1. You should read about Keras and this model again :)
  2. That is just the initial value.
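On point 1, it may help to note that in a CTC-trained CRNN the softmax is applied separately at every time step, so each step is an ordinary multi-class decision, and repeated letters are separated by a blank symbol when the sequence is decoded. The sketch below illustrates that idea with NumPy; the two-letter alphabet, the hand-made logits, and the helper names are assumptions for the example and are not taken from the repository.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def greedy_ctc_decode(probs: np.ndarray, alphabet: str, blank: int) -> str:
    """Pick the best class per time step, merge repeats, then drop blanks."""
    best = probs.argmax(axis=1).tolist()
    chars = []
    prev = None
    for k in best:
        if k != prev and k != blank:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


alphabet = "ot"  # class 0 = 'o', class 1 = 't'
blank = 2        # extra class reserved for the CTC blank

# Hand-made scores for six time steps: t t o - o -
path = [1, 1, 0, 2, 0, 2]
logits = np.full((6, 3), -5.0)
logits[np.arange(6), path] = 5.0

probs = softmax(logits)  # one distribution per time step
decoded = greedy_ctc_decode(probs, alphabet, blank)
print(decoded)
```

The blank between the two 'o' steps is what keeps them from being merged; without it, consecutive identical predictions collapse into a single letter.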

naveen-marthala commented 4 years ago

@tuandoan998, thank you so much for taking the time to reply to me. I will read about it.