I have an idea: resize the images to the same height and pad them to the same width, then change each `time_step` entry in `seq_len` to the real width of that image (the default is 160 for every entry, with length `batch_size`).
Please leave a comment if you try it, whether it works or not.
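In case it helps, here is a minimal sketch of that idea (untested, and the helper names are mine, not from this repo): resize each image to a fixed height keeping the aspect ratio, pad each batch to its widest image, and keep the real width per image as its `seq_len` entry.

```python
import cv2
import numpy as np

def resize_keep_ratio(im, image_height):
    # Scale to a fixed height; the width follows the aspect ratio.
    new_width = max(1, int(im.shape[1] * image_height / im.shape[0]))
    return cv2.resize(im, (new_width, image_height))

def pad_batch(images):
    # images: list of (image_height, width) grayscale arrays.
    # Returns (batch, seq_len): batch has shape (batch_size, max_width, image_height),
    # i.e. (batch, time, features), matching a [None, None, num_features] placeholder;
    # seq_len holds each image's real (unpadded) width.
    max_width = max(im.shape[1] for im in images)
    seq_len = np.array([im.shape[1] for im in images], dtype=np.int32)
    batch = np.zeros((len(images), max_width, images[0].shape[0]), dtype=np.float32)
    for i, im in enumerate(images):
        batch[i, :im.shape[1], :] = im.T  # zero padding on the right (time axis)
    return batch, seq_len
```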
I processed my dataset as you suggested, but I have some questions about your code. I scale my images to the same height:

```python
im = cv2.resize(im, (im.shape[1], image_height))
...
batch_inputs, batch_seq_len = pad_input_sequences(np.array(image_batch))
```

However, `self.inputs = tf.placeholder(tf.float32, [None, None, num_features])` in `lstm_ocr.py` uses `num_features = utils.num_features`, while in `utils.py` it looks like this:

```python
channel = 1
image_width = 100  # not useful in training
image_height = 300
num_features = image_height * channel
```
Q1. In `infer.py`:

```python
im = cv2.resize(im, (utils.image_width, utils.image_height))
```

This uses `utils.image_width`, but the widths vary quite a lot across images. How do you deal with that?

Q2. It seems that you commented out the `shuffle` code and the `data_argument` functions. I only have a few thousand images; can I do something beneficial for training with this code?
Q1. Two solutions:

S1: use `batch_size = 1`.

S2.1: resize each image to the same height (keeping the aspect ratio); the widths will then differ, so pad them to the same width with zeros. Note that "the same width" can just be the max width within each batch.

S2.2: if you want to do it better, make the LSTM ignore the padded zeros; you may read this

Q2. In the master branch I already use `np.random.permutation` to shuffle the batches.
Of course, you can also try data augmentation, but do not push it too "hard" (it may make the network harder to converge).
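For Q2, the shuffle can be as simple as the sketch below (the argument names are placeholders, not from the repo):

```python
import numpy as np

def shuffle_pair(inputs, labels):
    # Apply one random permutation to both lists so image/label pairs stay aligned.
    perm = np.random.permutation(len(inputs))
    return [inputs[i] for i in perm], [labels[i] for i in perm]
```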
I used another naive approach to solve this; however, I am thinking about optimizing it and using bucketing to handle varying sizes.

Current Approach:
I found the max-width image in my dataset and right-padded all images with whitespace to that width, making every image the same width and height, then changed the parameters in `config.py` to that width and height. This is a naive approach and may not be applicable in every case; it is also computationally expensive.

Analysis
I like the idea of `pad_input_sequences`, but that would only help in the CNN part; after that, the LSTM will still see padded feature sequences. I have come across the concept of bucketing, mainly `bucket_by_sequence_length`, which could optimize this even further.

Question
Any suggestions about this approach? I am not sure how `time_step` will work for variable-size buckets in this case. Should it still be half of the image width?
Currently you are using `tf.train.shuffle_batch`; how can we use `tf.contrib.training.bucket_by_sequence_length` instead?
```python
def get_data(self, path, batch_size, num_epochs):
    filename_queue = tf.train.string_input_producer([path], num_epochs=num_epochs)
    image, label, label_len, time_step = read_tfrecord_and_decode_into_image_annotation_pair_tensors(filename_queue)
    image_batch, label_batch, label_len_batch, time_step_batch = tf.train.shuffle_batch(
        [image, label, label_len, time_step],
        batch_size=batch_size,
        capacity=9600,
        num_threads=4,
        min_after_dequeue=6400)
```
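I haven't run this, but going by the TF 1.x contrib docs, a bucketed version of `get_data` might look roughly like the sketch below. The bucket boundaries are illustrative, and I'm assuming `time_step` is a scalar integer tensor per example:

```python
def get_data_bucketed(self, path, batch_size, num_epochs):
    filename_queue = tf.train.string_input_producer([path], num_epochs=num_epochs)
    image, label, label_len, time_step = read_tfrecord_and_decode_into_image_annotation_pair_tensors(filename_queue)

    # Group examples whose sequence length falls in the same bucket, so each
    # batch is only padded up to its bucket's upper bound.
    _, (image_batch, label_batch, label_len_batch, time_step_batch) = \
        tf.contrib.training.bucket_by_sequence_length(
            input_length=tf.to_int32(time_step),
            tensors=[image, label, label_len, time_step],
            batch_size=batch_size,
            bucket_boundaries=[50, 100, 150, 200],  # illustrative width buckets
            num_threads=4,
            capacity=64,
            dynamic_pad=True,                       # pad within each bucket's batch
            allow_smaller_final_batch=True)
    return image_batch, label_batch, label_len_batch, time_step_batch
```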
I wrote a version that generates the data on the fly; it pads the images to the same width within each batch. The code is in the `beta` branch.
As for your question: since I use a CNN, the receptive field is enlarged, so `time_step` should be a little less than `img_width / 2` if you really want the network to ignore the padded area; alternatively, just let the network learn to ignore that area, in which case `img_width // 2` should be fine.
I haven't used `tf.contrib.training.bucket_by_sequence_length` before; if you have tried it, please leave a comment about the result.
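Reading that rule concretely (just my restatement, not code from the repo):

```python
def compute_seq_len(real_width, width_downsample=2):
    # With a CNN that halves the width once, roughly real_width // 2 time steps
    # reach the LSTM/CTC; use slightly fewer if you want to fully exclude padding.
    return real_width // width_downsample

compute_seq_len(200)  # -> 100 time steps for a 200-pixel-wide image
```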
I have lots of segmentation images with different sizes. Must I rescale the images to the same size? It would be much better if training could be done without scaling.