mathDR / reading-text-in-the-wild

A Keras/Theano implementation of "Reading Text in the Wild with Convolutional Neural Networks" by M. Jaderberg et al.
GNU General Public License v3.0

make_keras_charnet_model error #11

Closed sam-heilbron closed 7 years ago

sam-heilbron commented 7 years ago

After downloading the charnet MATLAB data, I am running the second step to build the JSON files. I have applied the Keras tweaks to my installs, but I'm still running into the following error:

AttributeError: module 'keras.backend' has no attribute 'custom_spatial_2d_padding'

This is surprising, since my backend/theano_backend.py does contain the custom_spatial_2d_padding method. Any ideas why this is happening? I noticed that when I run the script, it prints "Using TensorFlow backend." Is this expected?

Thanks!

mathDR commented 7 years ago

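You need to be using the Theano backend. The "Using TensorFlow backend." message means Keras is loading its TensorFlow backend, so the patched theano_backend.py is never used. You also need to install the correct versions of both TensorFlow and Keras.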
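For anyone hitting the same error: multi-backend Keras picks its backend from the KERAS_BACKEND environment variable or from the "backend" field in ~/.keras/keras.json. Below is a minimal sketch of switching to Theano both ways. The helper name `set_keras_backend` is hypothetical (it is not part of Keras or this repo), and the sketch assumes an older multi-backend Keras release.

```python
import json
import os
from pathlib import Path


def set_keras_backend(backend="theano", config_path=None):
    """Write `backend` into a Keras JSON config file and return the config.

    Hypothetical helper, not part of Keras. By default it targets
    ~/.keras/keras.json, which multi-backend Keras reads at import time.
    """
    if config_path is None:
        config_path = Path.home() / ".keras" / "keras.json"
    config_path = Path(config_path)

    # Keep any existing settings (image_data_format, floatx, ...) intact.
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())

    config["backend"] = backend
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=4))
    return config


# Alternative: override the backend for this process only.
# This must run BEFORE the first `import keras`.
os.environ["KERAS_BACKEND"] = "theano"
```

After either change, running the script should print "Using Theano backend." instead of the TensorFlow message.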

sam-heilbron commented 7 years ago

Yeah that makes sense. Thanks!

In the paper, they describe the CNN as being useful both for text detection and word recognition. Does this implement that or just the word recognition part? Would this work with an image with multiple words placed in the image or just images of a single word like the example images?

mathDR commented 7 years ago

This is ONLY the word recognition part. The code expects a cropped text image as input. If you give it multiple words, the dictnet won't work. If the total length of the text is less than (I think) 22 characters, then charnet may give you good results.

sam-heilbron commented 7 years ago

Sorry yeah just saw that in a previous issue. Just to clarify, the only difference between charnet and dictnet is how the models are trained but the purpose of both (individual word recognition) is the same. Do you know of any other competing text-in-the-wild solutions to the one proposed in the Jaderberg paper?

mathDR commented 7 years ago

In paper format? No. The Google Translate app for iOS is clearly doing this in its image functionality, so you might want to search in that space. If you just need a solution, they offer an API for it.

sam-heilbron commented 7 years ago

Thanks! One last thing...

I ran into similar issues with testing charnet on the example images. I saw you had a conversation in a previous issue thread about how this wasn't broken before and that if training was done exclusively in Python, this would not be an issue:

  1. Have you had any other progress since then related to what in the _preprocess method was contributing to this?
  2. Is there additional info needed to train the models beyond what is provided in the TRAIN folder?

mathDR commented 7 years ago

The training was never completed (or successfully run) in Python. I had to move on to other things, so I never finished it.

As far as I know, the culprit is that the _preprocess method does not match the MATLAB preprocessing. If you get anything to work, please let me know! Or better yet, submit a PR!
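One way to hunt for this kind of mismatch is to run the same pixels through both pipelines and compare the outputs numerically. The sketch below is an assumption-laden illustration, not the repo's _preprocess: it assumes the preprocessing includes a zero-mean, unit-variance normalization, and it shows one well-known difference between the two ecosystems, namely that MATLAB's `std` divides by N-1 by default while NumPy's `np.std` divides by N. Whether this is the actual cause here is not confirmed. The function names are hypothetical.

```python
import math


def standardize(pixels, ddof=0):
    """Zero-center a flat list of grayscale values and scale by their std.

    ddof=0 mimics NumPy's default (divide by N);
    ddof=1 mimics MATLAB's default (divide by N-1).
    """
    n = len(pixels)
    if n <= ddof:
        raise ValueError("need more than `ddof` pixels")
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / (n - ddof)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on flat images
    return [(p - mean) / std for p in pixels]


def max_abs_diff(a, b):
    """Largest elementwise gap between two equally long sequences."""
    if len(a) != len(b):
        raise ValueError("length mismatch")
    return max(abs(x - y) for x, y in zip(a, b))


# Toy example: the same pixels normalized under each convention.
pixels = [0.0, 2.0, 4.0]
numpy_style = standardize(pixels, ddof=0)
matlab_style = standardize(pixels, ddof=1)
print(max_abs_diff(numpy_style, matlab_style))
```

A nonzero gap on real data, compared against values exported from MATLAB, would point at the normalization step; a zero gap would suggest looking at resizing or channel ordering instead.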