weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

Feature Extraction using CNN and Window width #2

Closed sanakhamekhem closed 6 years ago

sanakhamekhem commented 7 years ago

Hi, I would like to use this code to extract features using a CNN. I'm asking whether I can use a sliding window wider than 1 pixel. My goal is to extract a set of CNN-based features and train a BLSTM-CTC recognizer.

weinman commented 7 years ago

Thanks for your interest and query. Unfortunately, I’m not sure precisely what you are asking for when you say “sliding window width more than 1 pixel”. This model extracts CNN features, reducing the resolution on the way via max pooling, and those outputs are used as inputs to the LSTM. Which level of the process are you trying to change?

Perhaps you are saying you want CNN features from multiple time steps going into one LSTM cell?
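
To be concrete about what I mean, here is a rough, hypothetical sketch (not the actual layer stack in model.py) of how the conv/pool output feeds the LSTM: the pooling collapses the image height, and each remaining feature-map column becomes one LSTM time step.

```python
import tensorflow as tf

# Hypothetical illustration only -- not the layers in model.py.
images = tf.placeholder(tf.float32, [None, 32, None, 1])         # NHWC; 32-px-high line images assumed
conv = tf.layers.conv2d(images, filters=64, kernel_size=3,
                        padding='same', activation=tf.nn.relu)   # [batch, 32, W, 64]
pool = tf.layers.max_pooling2d(conv, pool_size=[32, 1],
                               strides=[32, 1])                  # [batch, 1, W, 64]: height collapsed
sequence = tf.squeeze(pool, axis=1)                              # [batch, W, 64]: one feature vector per column,
                                                                 # i.e. one LSTM time step per column
```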

sanakhamekhem commented 7 years ago

Hi Mr Weinman, thank you for your response. Actually, I have built a handwritten text recognition system based on handcrafted features, using the BLSTM-CTC classifier. I used the EESEN toolkit for this purpose. For feature extraction, I used a sliding window over the line image (width = 3 pixels, shift = 1). I would now like to use a CNN for feature extraction and feed its outputs to the BLSTM for training with the same toolkit (EESEN), for comparison purposes. I'm trying to understand how I can do this.

sanakhamekhem commented 7 years ago

Hi Mr Weinman,

Please, I would like to extract only the features, as a matrix or vector for each image. In this code, does the logits tensor correspond to the features? If so, how can I convert it to a vector? For information, the logits variable contains: Tensor("rnn/logits/Relu:0", shape=(?, 256, 63), dtype=float32, device=/device:GPU:0). Thank you in advance.

weinman commented 7 years ago

As its name indicates, the first return value of the function model.convnet_layers, called features, gives you the raw convolutional features. The return value called logits from model.rnn_layers gives you the pre-softmax character (or, more generally, output-class) scores for each time step.

If you want the deepest, sequence-based (LSTM) features before the final fully-connected classification layer (logits above), you'd have to modify model.rnn_layers to return its rnn2 variable.

Once you have determined which tensor's value you desire, both tensorflow.org and stackoverflow have a plethora of information about how to extract it. You may redirect your queries there if necessary. I will answer questions specifically about the code here.
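
If it helps, here is a minimal sketch of pulling those values out as numpy arrays with a TensorFlow 1.x session. The placeholder shapes and the argument lists passed to convnet_layers / rnn_layers below are guesses for illustration; check model.py for the real signatures (they may take additional arguments, such as a mode flag or the number of classes).

```python
import numpy as np
import tensorflow as tf
import model

# Sketch only: the placeholders and the argument lists below are assumptions;
# consult model.py for the actual signatures.
image = tf.placeholder(tf.float32, [1, 32, None, 1])    # batch, height, width, channels
width = tf.placeholder(tf.int32, [1])

features, sequence_length = model.convnet_layers(image, width)   # raw convolutional features
logits = model.rnn_layers(features, sequence_length)             # pre-softmax class scores per step

img = np.zeros((1, 32, 200, 1), dtype=np.float32)       # stand-in for a real line image

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())         # or restore a trained checkpoint
    feats, scores = sess.run([features, logits],
                             feed_dict={image: img, width: [200]})
    print(feats.shape, scores.shape)                     # plain numpy arrays you can reshape or save
```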

sanakhamekhem commented 7 years ago

Thank you, Mr Weinman, for your response. I need the features before the fully connected layer (rnn2). Please, I have a question not really related to this code: I will extract these features and take the matrix values in order to feed them to my BLSTM-based system. The window width will be 1 pixel. My question is: what is the dimension of the extracted features for an image 200 pixels wide and 64 pixels high? I expect it will be 200? Thanks again for your help.

weinman commented 7 years ago

The tensor sequence_length returned by model.convnet_layers would tell you the horizontal width of the features. As to the vertical height, the likelihood of me making one or more off-by-one errors in the pipeline is great, so I won't hazard a guess. (The calculation would be similar to what you see in the README's Structure section: https://github.com/weinman/cnn_lstm_ctc_ocr/blob/master/README.md#structure.) The best way to verify it is to run such an image through a session and check the numpy matrix size. (You might also use the graph viewer in TensorBoard.)
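
For example, something along these lines would print the actual sizes for a 200x64 image. This is a sketch only: it assumes you have built the graph as in this repository, modified rnn_layers to also return its rnn2 tensor, and hold the input placeholders (called image and width here; the names are illustrative).

```python
import numpy as np
import tensorflow as tf

# Sketch: `rnn2`, `sequence_length`, `image`, and `width` stand for the tensors
# and placeholders discussed above; the names here are illustrative, not exact.
img = np.zeros((1, 64, 200, 1), dtype=np.float32)   # batch, height, width, channels

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())      # or restore a trained checkpoint
    feats, seq_len = sess.run([rnn2, sequence_length],
                              feed_dict={image: img, width: [200]})
    print(feats.shape)   # the feature matrix dimensions for this image
    print(seq_len)       # number of horizontal steps the LSTM sees
```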

sanakhamekhem commented 7 years ago

OK, thank you very much, Mr Weinman.
