mvoelk / ssd_detectors

SSD-based object and text detection with Keras, SSD, DSOD, TextBoxes, SegLink, TextBoxes++, CRNN
MIT License

Testing on my own data. #5

Closed. krish240574 closed this issue 5 years ago

krish240574 commented 5 years ago

Hi, how can I use TBPP (DenseNet) with my own data? Specifically, I see in tbpp_evaluate.ipynb that I need to call sample_random_batch() on gt_util_val, which is created by split() on the pickle file gt_util_synthtext_seglink.pkl. I understand that I need to create a pickle file with my test data in order to fit into this pipeline. However, in your code I see that the pkl is generated from a .mat file (which I understand is the MATLAB format).

So here is where I am stuck: do I need to create my files in .mat format so I can sneak them into the pipeline for evaluation/testing, or can I create a pkl file directly, bypassing the .mat creation? I believe it would be quite useful if there were a small writeup on the format the input data needs to be in. I have looked hard at Ankush Gupta's SynthText repo (https://github.com/ankush-me/SynthText) and am still wrapping my head around the format it uses; I am also reading Ankush's paper in the meantime.

Please clarify, Thanks, Krishna

mvoelk commented 5 years ago

The code in the data_*.py files is dataset specific and derives a GTUtility class for each dataset. Objects of the GTUtility class are only pickled to avoid the long preprocessing time of the datasets. The gt.mat file is specific to the SynthText dataset.

The attributes image_names, data and text of the GTUtility class are lists with as many elements as there are samples in the dataset:
image_names contains the image file names as strings.
data contains NumPy arrays in which each row corresponds to a text instance and holds the vertices (x1, y1, x2, y2, x3, y3, x4, y4) of the oriented bounding box, normalized by the image size, followed by a one-hot encoding of the class, which in the text case is always (0, 1).
text contains lists with the text strings associated with the text instances and is used as ground truth for the recognition stage.
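
For a custom dataset, such an object could be built roughly along the following lines and pickled once. This is only a minimal sketch based on the description above; the class name MyGTUtility, the image_path attribute and the annotation tuples are assumptions, not code from the repository.

import pickle
import numpy as np

class MyGTUtility(object):
    # mimics the attributes described above for a custom dataset
    def __init__(self, image_path, annotations):
        # annotations: list of (file_name, quads, strings) tuples, where quads
        # is an (N, 8) array-like with normalized vertices (x1, y1, ..., x4, y4)
        # per text instance and strings is the list of N associated text strings
        self.image_path = image_path
        self.image_names = []
        self.data = []
        self.text = []
        for file_name, quads, strings in annotations:
            quads = np.asarray(quads, dtype=np.float32).reshape(-1, 8)
            one_hot = np.tile([0.0, 1.0], (len(quads), 1))  # class is always 'text'
            self.image_names.append(file_name)
            self.data.append(np.hstack([quads, one_hot]))
            self.text.append(list(strings))

# gt_util = MyGTUtility('my_images/', my_annotations)
# with open('gt_util_mydata.pkl', 'wb') as f:
#     pickle.dump(gt_util, f)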

If you only want to do prediction, you can proceed as with the real-world images in SL_predict.ipynb.

krish240574 commented 5 years ago

Thank you for the detailed explanation. Let me try the predictions and get back to you if there are any issues. Cheers, Krishna

krish240574 commented 5 years ago

Another question: how do I test the CRNN part with my own data? I see that the pre-trained CRNN models use the .pkl code again. Is there any code for testing with my own data, similar to what you have for bounding-box prediction? (I understand that the CRNN takes the bounding boxes cropped out after detection by the first TBPP network.) I am looking through all the files in the repo, but cannot find any such code yet.

Specifically, do I need to dump all the results of the bounding-box detection into a .pkl for the CRNN to predict from? Thanks, Krishna

mvoelk commented 5 years ago

CRNN_train.ipynb, SL_end2end_predict.ipynb and sl_videotest.py may be relevant for you.

In general, the input of the CRNN model is a batch of 32x256 grayscale images... The rest is up to you ;)
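
For illustration, such a batch could be prepared along these lines. This is only a rough sketch, not code from the repository; the file names and the helper function are made up, and the (width, height, 1) ordering of the (256, 32, 1) input shape is assumed from the training notebook.

import cv2
import numpy as np

def make_crnn_batch(image_paths, width=256, height=32):
    # load cropped word images and stack them into an (N, width, height, 1) batch
    batch = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # cv2 returns shape (h, w)
        img = cv2.resize(img, (width, height))        # result has shape (height, width)
        batch.append(img.T[:, :, None])               # transpose to (width, height, 1)
    return np.asarray(batch, dtype=np.float32)

# crops = make_crnn_batch(['word_0.png', 'word_1.png'])
# res = model_pred.predict(crops)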

sniper0110 commented 5 years ago

Hello,

I am trying to use your pretrained CRNN model (with LSTM or GRU) to recognize text in my images. I am using images from the ICDAR2015 scene text dataset. For this I am using a small script:

import numpy as np
import matplotlib.pyplot as plt
import os
import editdistance
import pickle
import time

from keras.optimizers import SGD, Adam
from keras.callbacks import ModelCheckpoint

from crnn_model import CRNN
from crnn_data import InputGenerator
from crnn_utils import decode
from ssd_training import Logger, ModelSnapshot
import cv2
from crnn_utils import alphabet87 as alphabet

##Model
input_width = 256
input_height = 32
batch_size = 128
input_shape = (input_width, input_height, 1)

model, model_pred = CRNN(input_shape, len(alphabet), gru=False)
experiment = 'crnn_lstm_synthtext'
path_to_weights = './checkpoints/201806162129_crnn_lstm_synthtext/weights.300000.h5'
#path_to_weights = './checkpoints/201806190711_crnn_gru_synthtext/weights.300000.h5'
model_pred.load_weights(path_to_weights)

path_to_cropped_text = "" # path to my cropped text

my_img = cv2.imread(path_to_cropped_text)
resized_img = cv2.resize(my_img, (256,32))
gray_img = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)
np_gray_img = np.reshape(gray_img, (1,256,32,1))

prediction = model_pred.predict(np_gray_img)

##Decode predictions
chars = [alphabet[c] for c in np.argmax(prediction[0], axis=1)]
res_str = decode(chars)

Unfortunately, I am almost always getting the result "N", as if my text were just the letter N. I don't know why this is happening; maybe I am making a mistake in how I use your code.

My original image had shape (69, 256, 3); I resized it to be compatible with the input shape and of course converted it to grayscale. I checked the image after this transformation and the text is still pretty obvious (no distortions), so I was wondering what I am doing wrong.

Any help is greatly appreciated!

mvoelk commented 5 years ago

Okay, I was curious and spent some time figuring it out...

The reshape operation

np_gray_img = np.reshape(gray_img, (1,256,32,1))

does not do what you want in your case, since reshape only reinterprets the memory layout instead of swapping the axes. Try something like the following instead

np_gray_img = gray_img.T[None,:,:,None]

sniper0110 commented 5 years ago

Thanks a lot mate, that was indeed the problem. I am very curious why your operation (gray_img.T[None,:,:,None]) is different from mine (np.reshape(gray_img, (1,256,32,1))). In the end, both give arrays of the same shape, (1, 256, 32, 1). Can you elaborate on how they differ, please? I am very curious!

mvoelk commented 5 years ago

OpenCV is not always as intuitive as it could be. The output of the cv2 functions has shape (32, 256), i.e. (height, width), so you need a transpose rather than a plain reshape.

For more details, please see the NumPy help.
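
A quick NumPy demonstration of the difference (not from the thread, just for illustration): reshape keeps the elements in their original order and only changes the shape, while the transpose actually swaps rows and columns.

import numpy as np

a = np.arange(6).reshape(2, 3)  # [[0 1 2]
                                #  [3 4 5]]

print(np.reshape(a, (3, 2)))    # [[0 1]
                                #  [2 3]
                                #  [4 5]]  same element order, new shape

print(a.T)                      # [[0 3]
                                #  [1 4]
                                #  [2 5]]  rows and columns swapped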