weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

Dynamic training data shape error. #47

Closed pczzy closed 5 years ago

pczzy commented 5 years ago

ValueError: generator yielded an element of shape (37, 109, 1) where an element of shape (32, ?, 1) was expected.

The pipeline.py call that preprocesses the data, dataset = dataset.map( dpipe.preprocess_fn, num_parallel_calls=num_threads ), seems OK, and maptextsynth.py uses the new normalize_image method:

def _preprocess_image( image ):
    """Rescale image"""
    image = pipeline.normalize_image( image )
    return image

sahilbandar commented 5 years ago

@pczzy As the error shows, you passed an input image of shape (37, 109, 1), but an element of shape (32, ?, 1) was expected. In that shape, '?' means any size: the image height must be 32 and the channel count 1, while the width can be anything. Please check the shape of the input image you are passing in.
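For reference, here is a minimal sketch of how one might check and fix the shape on the input side, assuming OpenCV and a single-channel numpy image; resize_to_height is a hypothetical helper, not part of this repo:

import cv2
import numpy as np

def resize_to_height( image, target_height=32 ):
    """Resize a grayscale (H, W, 1) image to a fixed height,
    preserving aspect ratio."""
    h, w = image.shape[:2]
    if h == target_height:
        return image
    new_w = max( 1, int( round( w * target_height / float( h ) ) ) )
    # cv2.resize takes dsize as (width, height)
    resized = cv2.resize( image, ( new_w, target_height ),
                          interpolation=cv2.INTER_AREA )
    # cv2.resize drops the trailing axis of single-channel images
    return resized[..., np.newaxis]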

pczzy commented 5 years ago

In maptextsynth.py, the _generator_wrapper() inside get_dataset() yields a (37, 109, 1) image, while the returned tf.data.Dataset declares a (32, None, 1) tensor shape, so an exception is thrown before preprocess_fn can normalize_image at all.
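For what it's worth, the failure is easy to reproduce in isolation. A minimal sketch (my own repro, using the TF 1.x API this repo targets):

import numpy as np
import tensorflow as tf

def gen():
    # Yields a 37-pixel-high image although the dataset declares height 32
    yield np.zeros( (37, 109, 1), dtype=np.uint8 )

dataset = tf.data.Dataset.from_generator(
    gen, tf.uint8, tf.TensorShape( (32, None, 1) ) )

element = dataset.make_one_shot_iterator().get_next()
with tf.Session() as sess:
    sess.run( element )  # raises the shape error quoted above

The shape check fires when the generator element is consumed, before any dataset.map stage sees it.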

maptextsynth.py code:

def get_dataset( args=None ):
    """
    Get a dataset from generator
    Format: [text|image|labels] -- types and shapes can be seen below
    """

    def _generator_wrapper():
        """
        Wraps data_generator to precompute labels in python before everything
        becomes tensors.
        NOTE: Local to get_dataset for sensible passing of args to generator
        function.
        Returns:
        caption : ground truth string
        image   : raw mat object image [32, ?, 1]
        label   : list of indices corresponding to out_charset plus a temporary
                  increment; length=len( caption )
        """

        # Extract args
        [ config_path, num_producers ] = args[0:2]

        # TODO/NOTE currently using 0 to get true single threaded synthesis
        gen = data_generator( config_path, num_producers )

        while True:
            caption, image = next( gen )

            # Transform string text to sequence of indices using charset dict
            label = charset.string_to_label( caption )

            # Temporarily increment all labels so that zero can be the EOS token
            # during post-batch dense-to-sparse conversion
            label = [ index+1 for index in label ]
            #image = pipeline.normalize_image(image)
            print( caption, image.shape )
            cv2.imwrite( "./%s.jpg" % caption, image )
            yield caption, image, label

    return tf.data.Dataset.from_generator(
        _generator_wrapper,
        ( tf.string, tf.uint8, tf.int32 ),  # Output types
        ( tf.TensorShape( [] ),             # Text shape
          tf.TensorShape( (32, None, 1) ),  # Image shape (modified from (32,None,1) to (None,None,1))
          tf.TensorShape( [None] ) ) )      # Labels shape

def preprocess_fn( caption, image, labels ):
    """
    Reformat raw data for model trainer.
    Intended to get data as formatted from get_dataset function.
    Parameters:
      caption : tf.string corresponding to text
      image   : tf.uint8 tensor of shape [32, ?, 1]
      labels  : tf.int32 tensor of shape [?]
    Returns:
      image  : preprocessed image
               tf.float32 tensor of shape [32, ?, 1] (? = width)
      width  : width (in pixels) of image
               tf.int32 tensor of shape []
      labels : list of indices (+1) of characters mapping text->out_charset
               tf.int32 tensor of shape [?] (? = length)
      length : length of labels
               tf.int64 tensor of shape []
      text   : ground truth string
               tf.string tensor of shape []
    """
    image = _preprocess_image( image )

    # Width is the number of elements in the image's 2nd row,
    # i.e., the number of pixel columns
    width = tf.size( image[1] )

    # Length of labels/caption
    length = tf.size( labels )
    text = caption
    return image, width, labels, length, text
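As an aside, width = tf.size( image[1] ) gives the width because image[1] slices row 1 out of the [32, W, 1] image, a [W, 1] tensor with exactly W elements. A quick sanity check (my own example, TF 1.x):

import tensorflow as tf

image = tf.zeros( (32, 109, 1), dtype=tf.float32 )
width = tf.size( image[1] )  # image[1] has shape [109, 1] -> 109 elements

with tf.Session() as sess:
    print( sess.run( width ) )  # 109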

def postbatch_fn( image, width, label, length, text ):
    """
    Prepare dataset for ingestion by Estimator.
    Sparsifies and decrements labels, and 'packs' the rest of the
    components into the feature map
    """

    # Labels must be sparse for ctc functions (loss, decoder, etc)
    # Convert dense to sparse with EOS token of 0
    label = tf.contrib.layers.dense_to_sparse( label, eos_token=0 )

    # Reconstruct sparse tensor, un-incrementing label values after conversion
    label = tf.SparseTensor( indices=label.indices,
                             values=tf.subtract( label.values, 1 ), # decrement
                             dense_shape=label.dense_shape )

    # Format relevant features for estimator ingestion
    features = {
        "image" : image,
        "width" : width,
        "length": length,
        "text"  : text
    }

    return features, label
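To see what this pair of steps does, take a padded batch of incremented labels [[3, 5, 0], [2, 0, 0]]: dense_to_sparse drops the zero EOS/padding entries, and the decrement restores the original charset indices. A small sketch (my own example values, TF 1.x where tf.contrib is available):

import tensorflow as tf

label = tf.constant( [[3, 5, 0],
                      [2, 0, 0]], dtype=tf.int32 )

sparse = tf.contrib.layers.dense_to_sparse( label, eos_token=0 )
sparse = tf.SparseTensor( indices=sparse.indices,
                          values=tf.subtract( sparse.values, 1 ),
                          dense_shape=sparse.dense_shape )

with tf.Session() as sess:
    print( sess.run( sparse.values ) )  # [2 4 1]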

def element_length_fn( image, width, label, length, text ):
    """
    Determine element length
    Note: mjsynth version of this function has an extra parameter (filename)
    """
    return width

def _preprocess_image( image ):
    image = pipeline.normalize_image( image )
    return image

weinman commented 5 years ago

@sahilbandar thanks for taking a stab at that one! The model code does require the input image to be 32 pixels high, but in this case I think the problem is likely that the MapTextSynthesizer is being allowed to generate larger images, whereas the specified generator output is 32 pixels.

I see two possibilities.

  1. Change the maptextsynth.py call to from_generator to use a TensorShape([None, None, 1]).
  2. Change the height_max in MapTextSynthesizer's config.txt to be 32.

2 is preferable because the map text synthesizer is already doing the work of an image resize when it rasterizes the vector, so it seems foolish to generate a big image only to ask the input pipeline to resize it. The normalize_image routine was added for test/eval time operations, and not intended for training.
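Concretely, option 2 amounts to a one-line config change. The key name comes from the comment above, but the key = value line format is only my assumption about MapTextSynthesizer's config.txt syntax, so check it against the sample config:

height_max = 32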

pczzy commented 5 years ago

@weinman Many thanks, it works now.