I guess your image data is grayscale with shape (512,512) or (512,512,1). I always used RGB images (e.g. shape (512,512,3)) and hard-coded the channel means for compatibility with the Caffe models.
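For grayscale input, replicating the channel before `preprocess` should work (a minimal sketch, not from the repo; `preprocess` subtracts per-channel means, so it needs three channels):

```python
import cv2
import numpy as np

# cv2.imread with the default flag already returns 3 BGR channels,
# even for grayscale files:
img = cv2.imread('some_image.jpg')           # shape (H, W, 3)

# if an image is already in memory with shape (H, W) or (H, W, 1),
# replicate the single channel before calling preprocess():
gray = np.zeros((512, 512), dtype=np.uint8)  # stand-in for a loaded grayscale image
img3 = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)  # shape (512, 512, 3)
```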
Markus
Thank you again for your kind assistance.
I was not trying to load my own images to predict. I want to use the ones in ssd_detectors-master\data\images. They are boys.jpg, cafr_cat.jpg, fish-bike.jpg.
Or those in ssd_detectors-master\images.
I didn't know how to load them in the "Predict" section, which uses the following code: `_, inputs, images, data = gt_util.sample_random_batch(batch_size=32, input_size=model.image_size)`
How can I point it to that images directory?
Tune Kamae
Okay, what you are looking for is probably in `SL_predict.ipynb` under 'Real world images', but with the SSD model and `PriorUtil`.
For training with your own dataset, you should write a custom parser (`GTUtility`), like it is done in `data_voc.py`.
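A stripped-down sketch of what such a parser could look like, following the pattern in `data_voc.py` (the base-class name and the annotation format below are assumptions; `my_load_annotations` is a placeholder for your own loader):

```python
import os
import numpy as np

from ssd_data import BaseGTUtility  # base class assumed to live in ssd_data.py


class GTUtility(BaseGTUtility):
    """Parser for a custom detection dataset (sketch)."""

    def __init__(self, data_path):
        self.data_path = data_path
        self.image_path = os.path.join(data_path, 'images')
        self.classes = ['Background', 'MyClass']  # index 0 is background
        self.image_names = []
        self.data = []
        # my_load_annotations is a placeholder for your own annotation reader;
        # it should yield (image_name, list of (xmin, ymin, xmax, ymax, class_idx))
        # with coordinates already normalized to [0, 1]
        for image_name, annotations in my_load_annotations(data_path):
            boxes = []
            for xmin, ymin, xmax, ymax, class_idx in annotations:
                one_hot = [0] * len(self.classes)
                one_hot[class_idx] = 1
                boxes.append([xmin, ymin, xmax, ymax] + one_hot)
            self.image_names.append(image_name)
            self.data.append(np.asarray(boxes))
```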
Markus
Thank you for the email.
I am running SSD_predict.ipynb to test its prediction performance.
I don't know yet how to load images into the "Predict" section of the notebook.
Tune Kamae
```python
import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import cv2

from ssd_model import SSD300, SSD512
from ssd_utils import PriorUtil
from ssd_data import preprocess
from utils.model import load_weights

%matplotlib inline

# MS COCO
from data_coco import GTUtility
gt_util = GTUtility('./data/COCO/', validation=True)

# SSD512
model = SSD512(num_classes=gt_util.num_classes)
weights_path = './models/ssd512_coco_weights_fixed.hdf5'
confidence_threshold = 0.7
load_weights(model, weights_path)
prior_util = PriorUtil(model)

# predict
inputs = []
images = []
img_paths = glob.glob('./data/images/*.jpg')
for img_path in img_paths:
    img = cv2.imread(img_path)  # BGR, as preprocess() expects
    inputs.append(preprocess(img, model.image_size))
    h, w = model.image_size
    # interpolation must be passed as a keyword; the third positional
    # argument of cv2.resize is dst, not the interpolation flag
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR).astype('float32')
    img = img[:, :, (2, 1, 0)]  # BGR to RGB for plotting
    img /= 255
    images.append(img)
inputs = np.asarray(inputs)

preds = model.predict(inputs, batch_size=1, verbose=1)

for i in range(len(images)):
    print(img_paths[i])
    plt.figure(figsize=[8]*2, frameon=True)
    plt.imshow(images[i])
    res = prior_util.decode(preds[i], confidence_threshold=confidence_threshold)
    prior_util.plot_results(res, classes=gt_util.classes)
    plt.axis('off')
    plt.show()
```
The converted Caffe models may require fine-tuning, and the threshold was chosen more or less ad hoc.
Markus
Thank you for the advice.
I copied "Real World Images" and managed to run it almost to the end. What I could not figure out is the `data[i]` in `prior_util.plot_results(res, classes=gt_util.classes, show_labels=True, gt_data=data[i])`.
`data[i]` probably has the bounding boxes etc.
Tune Kamae
Just drop the `gt_data` argument:

```python
prior_util.plot_results(res, classes=gt_util.classes, show_labels=True)
```
Markus
Millions of thanks! Yes, the program ran through and recognized objects!
I will try to follow other notebooks too.
Your End-to-End samples are so valuable!
Regards,
Tune Kamae
Markus and collaborators,
Has someone written an SSD-MobileNetV2 structure in Keras?
I am trying to train SSD-MobileNetV2 on the COCO 2017 dataset.
I appreciate any advice or instructions,
Tune Kamae
I tried MobileNet V1, but I'm not sure if it is working...
```python
from keras.models import Model
from keras.applications import MobileNet
from keras.layers import Activation, BatchNormalization, Conv2D, Input, SeparableConv2D

from ssd_model import multibox_head  # prediction heads from this repo


def ssd300_mobilenet_body(x):
    source_layers = []

    mobilenet = MobileNet(input_shape=(224,224,3), include_top=False, weights='imagenet')
    # the pretrained weights are fully convolutional, so the backbone can be
    # applied to the 300x300 input despite being built for 224x224
    x = Model(inputs=mobilenet.input, outputs=mobilenet.get_layer('conv_dw_11_relu').output)(x)

    x = Conv2D(512, (1, 1), padding='same', name='conv11')(x)
    x = BatchNormalization(name='bn11')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = SeparableConv2D(512, (3, 3), strides=(2, 2), padding='same', name='conv12dw')(x)
    x = BatchNormalization(name='bn12dw')(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (1, 1), padding='same', name='conv12')(x)
    x = BatchNormalization(name='bn12')(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(1024, (3, 3), padding='same', name='conv13dw')(x)
    x = BatchNormalization(name='bn13dw')(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (1, 1), padding='same', name='conv13')(x)
    x = BatchNormalization(name='bn13')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(256, (1, 1), padding='same', name='conv14_1')(x)
    x = BatchNormalization(name='bn14_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(2, 2), padding='same', name='conv14_2')(x)
    x = BatchNormalization(name='bn14_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(128, (1, 1), padding='same', name='conv15_1')(x)
    x = BatchNormalization(name='bn15_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(256, (3, 3), strides=(2, 2), padding='same', name='conv15_2')(x)
    x = BatchNormalization(name='bn15_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(128, (1, 1), padding='same', name='conv16_1')(x)
    x = BatchNormalization(name='bn16_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(256, (3, 3), strides=(2, 2), padding='same', name='conv16_2')(x)
    x = BatchNormalization(name='bn16_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(64, (1, 1), padding='same', name='conv17_1')(x)
    x = BatchNormalization(name='bn17_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(128, (3, 3), strides=(2, 2), padding='same', name='conv17_2')(x)
    x = BatchNormalization(name='bn17_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    return source_layers


def SSD300_mobile(input_shape=(300, 300, 3), num_classes=21, softmax=True):
    """SSD300 with MobileNet backbone.

    Based on the Keras implementation of MobileNet.

    # References
        https://arxiv.org/abs/1704.04861
    """
    x = input_tensor = Input(shape=input_shape)
    source_layers = ssd300_mobilenet_body(x)

    num_priors = [4, 6, 6, 6, 4, 4]
    normalizations = [20, 20, 20, 20, 20, 20]
    output_tensor = multibox_head(source_layers, num_priors, num_classes, normalizations, softmax)
    model = Model(input_tensor, output_tensor)
    model.num_classes = num_classes

    # parameters for prior boxes
    model.image_size = input_shape[:2]
    model.source_layers = source_layers
    model.aspect_ratios = [[1,2,1/2], [1,2,1/2,3,1/3], [1,2,1/2,3,1/3], [1,2,1/2,3,1/3], [1,2,1/2], [1,2,1/2]]
    model.minmax_sizes = [(30, 60), (60, 111), (111, 162), (162, 213), (213, 264), (264, 315)]
    model.steps = [8, 16, 32, 64, 100, 300]
    model.special_ssd_boxes = True

    return model
```
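If it works, it should plug into the rest of the pipeline like the other models (an untested sketch, mirroring the SSD512 example above):

```python
# hypothetical usage, assuming gt_util and PriorUtil as in SSD_predict.ipynb
model = SSD300_mobile(num_classes=gt_util.num_classes)
prior_util = PriorUtil(model)
```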
If you get SSD running with MobileNet V2, I would appreciate it if you could share your findings.
Markus,
May I bother you again?
I am trying to understand your code and reading your thesis.
I examined your ssd512_coco_weights_fixed.hdf5 using HDFView 3.5 (for Win10) and compared it with what is in `ssd512_body(x)` in ssd_model.py, as well as with Fig. 3.5 of your thesis.
What I am trying to understand is this part of ssd_model.py:

```python
x = Conv2D( 64, 3, strides=1, padding='same', name='conv1_1', activation='relu')(x)
x = Conv2D( 64, 3, strides=1, padding='same', name='conv1_2', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same', name='pool1')(x)

x = Conv2D(128, 3, strides=1, padding='same', name='conv2_1', activation='relu')(x)
x = Conv2D(128, 3, strides=1, padding='same', name='conv2_2', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same', name='pool2')(x)
```

In HDFView I see:

```
conv1_1  weights 3x3x3x64    relu  64
conv1_2  weights 3x3x64x64   relu  64
(max pool 2D)
conv2_1  weights 3x3x64x128  relu  128
conv2_2  weights 3x3x128x128 relu  128
```

Question 1: the depth changes from 3 (RGB?) to 64, but this is not explicitly written in ssd_model.py (# Block 1).
Question 2: again, where in ssd_model.py is this change from 64 to 128 specified?
In Fig. 3.5 these dimensions do not match. Why?
An even bigger challenge for me is to find out where the 38x38x512 branch (multi-box?) and the other branches that follow are specified in your code.
Apologies for asking so many questions.
Thanking you in advance,
Tune Kamae
> conv1_1 weights 3x3x3x64 relu 64; conv1_2 weights 3x3x64x64 relu 64. Question 1: the depth changed from 3 (RGB?) to 64 but is not explicitly written in ssd_model.py.

Yes, the weights always have shape (kernel_size, kernel_size, input_channels, output_channels). 3 is the number of input channels (BGR) defined in SSD512.

The missing conv2_1 and conv2_2 layers in Fig. 3.5 are my mistake...

The tensors at the branching points are collected in `source_layers`. `multibox_head` adds the prediction paths.
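To make the channel bookkeeping explicit: in Keras, the first argument of `Conv2D` is the number of output channels, and the input channels are inferred from the previous layer, so the 3→64 and 64→128 transitions are never written out. A small illustration (not the repo's code):

```python
from keras.layers import Conv2D, Input, MaxPool2D

x = Input(shape=(512, 512, 3))         # 3 input channels (BGR)
x = Conv2D(64, 3, padding='same')(x)   # weights 3x3x3x64   -> 64 output channels
x = Conv2D(64, 3, padding='same')(x)   # weights 3x3x64x64  -> 64
x = MaxPool2D(2)(x)                    # halves spatial size, keeps channels
x = Conv2D(128, 3, padding='same')(x)  # weights 3x3x64x128 -> 128
```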
Markus
I am now beginning to reproduce your results. SL_predict.ipynb is set to use the SynthText dataset by default. The dataset is too big for me (41 GB); I could not download it.
I would guess I don't need the dataset if I use your pre-trained /201809231008_sl512_synthtext/weights.002.h5.
Am I right?
Another question is how to modify the code so that I can use Total-Text (Ch'ng and Chan 2017). That dataset seems to be closer to what I need for blind people.
Thank you for your assistance.
Tune Kamae
> Am I right?

Yes.
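For prediction only, building the model and loading the checkpoint should be enough, roughly like this (a sketch; the module and class names follow this repo, but it is untested):

```python
from sl_model import SL512
from sl_utils import PriorUtil

model = SL512()
# adjust the prefix to wherever you unpacked the pre-trained weights
model.load_weights('./checkpoints/201809231008_sl512_synthtext/weights.002.h5')
prior_util = PriorUtil(model)
# then preprocess images and decode the predictions as in SL_predict.ipynb
```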
SegLink is actually not intended for the detection of curved text instances. Curved text would require custom encoding and decoding procedures, as well as another representation in the `GTUtility` and rectification before the recognition stage. It should also work to just write a new decoder and use it with the SynthText models, but I do not have the time to implement this. arXiv:1807.01544 is probably the approach that comes closest to this idea.

If you just need a custom parser for a dataset with oriented bounding boxes, #12...
Markus
I am trying to run SL_predict.ipynb. I ran into an error early on.
Below are two snapshots of my screen. Do you see any problem with the way I am running it?
It seems that 'data/SynthText/gt.mat' is needed.
Thank you again for your kind help.
Tune Kamae
I have no idea how to access the email attachments from the GitHub issues, but I hope you find the answer to your question in #1 or #8.
Markus
Thank you for your assistance.
I managed to run SL_end2end_predict.ipynb (SL512) and find text in photo images.
One error I got is:

```
words = crop_words(img, np.clip(boxes/512,0,1), input_height, width=input_width, grayscale=True)
NameError: name 'input_height' is not defined
```

What value do you recommend for 'input_height'?
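For now I am guessing the values from the CRNN input shape in crnn_model.py (width 256, height 32); please correct me if that is wrong:

```python
# guessed from the CRNN input shape (256, 32, 1) in crnn_model.py
input_width, input_height = 256, 32
words = crop_words(img, np.clip(boxes/512, 0, 1), input_height,
                   width=input_width, grayscale=True)
```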
Eventually I would like to use video or phone-camera inputs.
Tune Kamae
Markus
Thank you. Now I can detect and recognize Roman letters using SL_end2end_predict.ipynb. I am trying to figure out a way to save the tiny image boxes containing characters and send them to a Google cloud service to recognize non-Roman letters.
Do you have any suggestion?
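What I have in mind is something like this (a sketch; it assumes the crops returned by crop_words are floats in [0, 1]):

```python
import os
import cv2
import numpy as np

out_dir = 'word_crops'
os.makedirs(out_dir, exist_ok=True)

# save each cropped word image as a PNG that can be sent to an external OCR service
for k, word in enumerate(words):
    crop = (np.clip(word, 0, 1) * 255).astype('uint8')
    cv2.imwrite(os.path.join(out_dir, 'word_%03d.png' % k), crop)
```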
Tune Kamae
```
ssd_detectors-master\ssd_data.py in preprocess(img, size)
    628     img = img.astype(np.float32)
    629     mean = np.array([104,117,123])
--> 630     img -= mean[np.newaxis, np.newaxis, :]
    631     return img
    632
ValueError: operands could not be broadcast together with shapes (512,512) (1,1,3) (512,512)
```