mvoelk / ssd_detectors

SSD-based object and text detection with Keras, SSD, DSOD, TextBoxes, SegLink, TextBoxes++, CRNN
MIT License

Error in gt_util.sample_random_batch(batch_size=32, input_size=model.image_size) #21

Open kamae opened 5 years ago

kamae commented 5 years ago

ssd_detectors-master\ssd_data.py in preprocess(img, size)
    628     img = img.astype(np.float32)
    629     mean = np.array([104,117,123])
--> 630     img -= mean[np.newaxis, np.newaxis, :]
    631     return img
    632
ValueError: operands could not be broadcast together with shapes (512,512) (1,1,3) (512,512)

mvoelk commented 5 years ago

I guess your image data is grayscale, with shape (512,512) or (512,512,1). I always used RGB images (e.g. shape (512,512,3)) and hard-coded the channel means for compatibility with the Caffe models.
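If you want to keep using such images anyway, a minimal workaround sketch (the file path is just a placeholder) is to replicate the single channel before calling preprocess:

import cv2
from ssd_data import preprocess

img = cv2.imread('my_gray_image.jpg', cv2.IMREAD_GRAYSCALE)  # placeholder path
if img.ndim == 2 or img.shape[-1] == 1:
    # replicate the single channel so preprocess() can subtract the per-channel means
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
x = preprocess(img, (512, 512))  # now a (512, 512, 3) input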

kamae commented 5 years ago

Markus

Thank you again for your kind assistance.

I was not loading my own images to predict on. I want to use the ones in ssd_detectors-master\data\images, namely boys.jpg, cafr_cat.jpg, and fish-bike.jpg.

Or those in ssd_detectors-master\images

I didn't know how to load them in the “Predict” section before the following code:

_, inputs, images, data = gt_util.sample_random_batch(batch_size=32, input_size=model.image_size)

How can I point the notebook to that images directory?

Tune Kamae

mvoelk commented 5 years ago

Okay, what you are looking for is probably in SL_predict.ipynb under 'Real world images', but with the SSD model and PriorUtil.

For training with your own dataset, you should write a custom parser (a GTUtility), as is done in data_voc.py.
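A rough sketch of such a parser (the attribute names, the BaseGTUtility base class, and the data layout are assumptions based on the existing parsers in this repo; the annotation format here is made up):

import os
import numpy as np
from ssd_data import BaseGTUtility

class GTUtility(BaseGTUtility):
    """Rough sketch of a custom ground truth parser.

    Assumes one annotation line per box:
        image_name xmin ymin xmax ymax class_idx
    with coordinates already normalized to [0, 1].
    """
    def __init__(self, data_path):
        self.data_path = data_path
        self.image_path = os.path.join(data_path, 'images')
        self.classes = ['Background', 'MyClass']  # placeholder class list
        self.num_classes = len(self.classes)

        data = {}
        with open(os.path.join(data_path, 'annotations.txt')) as f:
            for line in f:
                name, xmin, ymin, xmax, ymax, label = line.split()
                box = [float(xmin), float(ymin), float(xmax), float(ymax)]
                one_hot = np.eye(self.num_classes)[int(label)]
                data.setdefault(name, []).append(box + list(one_hot))

        self.image_names = list(data.keys())
        self.data = [np.array(v) for v in data.values()]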

kamae commented 5 years ago

Markus

Thank you for the email.

I am running SSD_predict.ipynb to test its prediction power.

I don't know yet how to load images in the “Predict” section of the notebook.

Tune Kamae

mvoelk commented 5 years ago
import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import cv2

from ssd_model import SSD300, SSD512
from ssd_utils import PriorUtil
from ssd_data import preprocess
from utils.model import load_weights

%matplotlib inline

# MS COCO
from data_coco import GTUtility
gt_util = GTUtility('./data/COCO/', validation=True)

# SSD512
model = SSD512(num_classes=gt_util.num_classes)
weights_path = './models/ssd512_coco_weights_fixed.hdf5'; confidence_threshold = 0.7

load_weights(model, weights_path)
prior_util = PriorUtil(model)

# predict 
inputs = []
images = []

img_paths = glob.glob('./data/images/*.jpg')

for img_path in img_paths:
    img = cv2.imread(img_path)
    inputs.append(preprocess(img, model.image_size))
    h, w = model.image_size
    img = cv2.resize(img, (w, h), interpolation=cv2.INTER_LINEAR).astype('float32')
    img = img[:, :, (2,1,0)] # BGR to RGB
    img /= 255
    images.append(img)

inputs = np.asarray(inputs)

preds = model.predict(inputs, batch_size=1, verbose=1)

for i in range(len(images)):
    print(img_paths[i])
    plt.figure(figsize=[8]*2, frameon=True)
    plt.imshow(images[i])
    res = prior_util.decode(preds[i], confidence_threshold=confidence_threshold)
    prior_util.plot_results(res, classes=gt_util.classes)
    plt.axis('off')
    plt.show()

The converted Caffe models may require fine-tuning, and the threshold was chosen more or less ad hoc.

kamae commented 5 years ago

Markus

Thank you for the advice.

I copied “Real World Images” and managed to run it almost to the end. What I could not figure out is where data[i] comes from in prior_util.plot_results(res, classes=gt_util.classes, show_labels=True, gt_data=data[i])

data[i] probably has the bounding boxes etc.

Tune Kamae

mvoelk commented 5 years ago

Just drop the gt_data argument, since there is no ground truth for your own images:

prior_util.plot_results(res, classes=gt_util.classes, show_labels=True)

kamae commented 5 years ago

Markus

Millions of thanks! Yes, the program ran through and recognized objects!

I will try to follow other notebooks too.

Your End-to-End samples are so valuable!

Regards,

Tune Kamae

kamae commented 5 years ago

Markus and collaborators

Has someone written an SSD-MobileNetV2 architecture in Keras?

I am trying to train SSD-MobileNetV2 on the COCO 2017 dataset.

I would appreciate any advice or instructions.

Tune Kamae

mvoelk commented 5 years ago

I tried MobileNet V1, but I'm not sure if it is working...

from keras.models import Model
from keras.applications import MobileNet
from keras.layers import Input
from keras.layers import Activation
from keras.layers import Conv2D
from keras.layers import SeparableConv2D
from keras.layers import BatchNormalization

from ssd_model import multibox_head  # prediction head used below; assumed to live in this repo's ssd_model.py

def ssd300_mobilenet_body(x):

    source_layers = []

    mobilenet = MobileNet(input_shape=(224,224,3), include_top=False, weights='imagenet')
    x = Model(inputs=mobilenet.input, outputs=mobilenet.get_layer('conv_dw_11_relu').output)(x)

    x = Conv2D(512, (1, 1), padding='same', name='conv11')(x)
    x = BatchNormalization(name='bn11')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = SeparableConv2D(512, (3, 3), strides=(2, 2), padding='same', name='conv12dw')(x)
    x = BatchNormalization(name='bn12dw')(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (1, 1), padding='same', name='conv12')(x)
    x = BatchNormalization(name='bn12')(x)
    x = Activation('relu')(x)
    x = SeparableConv2D(1024, (3, 3), padding='same', name='conv13dw')(x)
    x = BatchNormalization(name='bn13dw')(x)
    x = Activation('relu')(x)
    x = Conv2D(1024, (1, 1), padding='same', name='conv13')(x)
    x = BatchNormalization(name='bn13')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(256, (1, 1), padding='same', name='conv14_1')(x)
    x = BatchNormalization(name='bn14_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(512, (3, 3), strides=(2, 2), padding='same', name='conv14_2')(x)
    x = BatchNormalization(name='bn14_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(128, (1, 1), padding='same', name='conv15_1')(x)
    x = BatchNormalization(name='bn15_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(256, (3, 3), strides=(2, 2), padding='same', name='conv15_2')(x)
    x = BatchNormalization(name='bn15_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(128, (1, 1), padding='same', name='conv16_1')(x)
    x = BatchNormalization(name='bn16_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(256, (3, 3), strides=(2, 2), padding='same', name='conv16_2')(x)
    x = BatchNormalization(name='bn16_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    x = Conv2D(64, (1, 1), padding='same', name='conv17_1')(x)
    x = BatchNormalization(name='bn17_1')(x)
    x = Activation('relu')(x)
    x = Conv2D(128, (3, 3), strides=(2, 2), padding='same', name='conv17_2')(x)
    x = BatchNormalization(name='bn17_2')(x)
    x = Activation('relu')(x)
    source_layers.append(x)

    return source_layers

def SSD300_mobile(input_shape=(300, 300, 3), num_classes=21, softmax=True):
    """SSD300 with MobileNet architecture.

    Based on the Keras implementation of MobileNet.

    # References
        https://arxiv.org/abs/1704.04861
    """

    x = input_tensor = Input(shape=input_shape)
    source_layers = ssd300_mobilenet_body(x)

    num_priors = [4, 6, 6, 6, 4, 4]
    normalizations = [20, 20, 20, 20, 20, 20]

    output_tensor = multibox_head(source_layers, num_priors, num_classes, normalizations, softmax)
    model = Model(input_tensor, output_tensor)
    model.num_classes = num_classes

    # parameters for prior boxes
    model.image_size = input_shape[:2]
    model.source_layers = source_layers
    model.aspect_ratios = [[1,2,1/2], [1,2,1/2,3,1/3], [1,2,1/2,3,1/3], [1,2,1/2,3,1/3], [1,2,1/2], [1,2,1/2]]
    model.minmax_sizes = [(30, 60), (60, 111), (111, 162), (162, 213), (213, 264), (264, 315)]
    model.steps = [8, 16, 32, 64, 100, 300]
    model.special_ssd_boxes = True

    return model

If you get SSD running with MobileNet V2, I would appreciate it if you could share your findings.
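A minimal usage sketch, assuming the code above sits next to ssd_model.py and that a GTUtility instance gt_util is available as in the COCO example earlier in this thread:

from ssd_utils import PriorUtil

# build the MobileNet-based SSD and the matching prior boxes,
# then train or predict just like with SSD300/SSD512
model = SSD300_mobile(input_shape=(300, 300, 3), num_classes=gt_util.num_classes)
prior_util = PriorUtil(model)
model.summary()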

kamae commented 5 years ago

Markus,

May I bother you again?

I am trying to understand your code and reading your Thesis.

I examined your ssd512_coco_weights_fixed.hdf5 using HDFView 3.5 (for Win10) and compared it with what is in ssd512_body(x) in ssd_model.py, as well as with Fig. 3.5 of your thesis.

What I am trying to understand is this part of ssd_model.py:

Block 1

x = Conv2D(64, 3, strides=1, padding='same', name='conv1_1', activation='relu')(x)
x = Conv2D(64, 3, strides=1, padding='same', name='conv1_2', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same', name='pool1')(x)

Block 2

x = Conv2D(128, 3, strides=1, padding='same', name='conv2_1', activation='relu')(x)
x = Conv2D(128, 3, strides=1, padding='same', name='conv2_2', activation='relu')(x)
x = MaxPool2D(pool_size=2, strides=2, padding='same', name='pool2')(x)

From HDFView:

conv1_1: weights 3x3x3x64, relu, 64
conv1_2: weights 3x3x64x64, relu, 64

Question 1: The depth changes from 3 (RGB?) to 64, but this is not written explicitly in ssd_model.py (# Block 1).

max pool 2D
conv2_1: weights 3x3x64x128, relu, 128
conv2_2: weights 3x3x128x128, relu, 128

Question 2: Again, where in ssd_model.py is this change from 64 to 128 specified?

In Fig. 3.5 these dimensions do not match. Why?

Another, bigger challenge for me is to find out where the 38x38x512 branch (multibox?) and the branches that follow are specified in your code.

Apologies for asking so many questions.

Thanking you in advance,

Tune Kamae

mvoelk commented 5 years ago

conv1_1: weights 3x3x3x64, relu, 64; conv1_2: weights 3x3x64x64, relu, 64. Question 1: The depth changes from 3 (RGB?) to 64, but this is not written explicitly in ssd_model.py.

Yes, the weights always have the shape (kernel_size, kernel_size, input_channels, output_channels). The 3 is the number of input channels (BGR) as defined in SSD512.
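A minimal sketch of why the channel change is not spelled out in ssd_model.py: Conv2D only specifies the number of output filters, and the input channel dimension of the kernel is inferred from the preceding tensor.

from keras.layers import Input, Conv2D
from keras.models import Model

x = inp = Input(shape=(512, 512, 3))
x = Conv2D(64, 3, padding='same', name='conv1_1', activation='relu')(x)   # kernel (3, 3, 3, 64)
x = Conv2D(64, 3, padding='same', name='conv1_2', activation='relu')(x)   # kernel (3, 3, 64, 64)
x = Conv2D(128, 3, padding='same', name='conv2_1', activation='relu')(x)  # kernel (3, 3, 64, 128)

model = Model(inp, x)
for w in model.get_weights()[::2]:  # kernels only, biases skipped
    print(w.shape)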

The missing Conv2_1 and Conv2_2 layers in Fig. 3.5 are my mistake...

mvoelk commented 5 years ago

The tensors at the branching point are collected in source_layers. multibox_head adds the prediction paths.

kamae commented 5 years ago

Markus

I am now beginning to reproduce your SL_predict.ipynb, which is set to use the SynthText dataset by default. The dataset is too big for me (41 GB), and I could not download it.

I would guess I don't need the dataset if I use your pre-trained /201809231008_sl512_synthtext/weights.002.h5.

Am I right?

Another question is how to modify the code so that I can use Total-Text (Ch'ng and Chan 2017). That dataset seems to be closer to what I need for blind people.

Thank you for assistance.

Tune Kamae

mvoelk commented 5 years ago

Am I right?

Yes

SegLink is actually not intended for the detection of curved text instances. Curved text would require a custom encoding and decoding procedure, as well as another representation in the GTUtility and rectification before the recognition stage. It should also work to just write a new decoder and use it with the SynthText models, but I do not have the time to implement this. arXiv:1807.01544 is probably the approach that comes closest to this idea.

If you just need a custom parser for a dataset with oriented bounding boxes, see #12...

kamae commented 5 years ago

Markus

I am trying to run SL_predict.ipynb. I ran into an error early on.

Below are two snapshots of my screen. Do you see any problem with the way I am running it?

It seems that 'data/SynthText/gt.mat' is needed.

Thank you again for your kind help.

Tune Kamae

mvoelk commented 5 years ago

I have no idea how to access the email attachments from the GitHub issues, but I hope you find the answer to your question in #1 or #8.

kamae commented 5 years ago

Markus

Thank you for your assistance.

I managed to run SL_end2end_predict.ipynb (SL512) and find text in photo images.

One error I got is:

words = crop_words(img, np.clip(boxes/512,0,1), input_height, width=input_width, grayscale=True)
NameError: name 'input_height' is not defined

What value do you recommend for 'input_height'?

Eventually I would like to use video or phone-camera inputs.

Tune Kamae

mvoelk commented 5 years ago

https://github.com/mvoelk/ssd_detectors/blob/df709800eca5a0f56986dbaa5c0a0e451943bd54/SL_end2end_predict.ipynb#L109
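The linked cell defines the crop size for the recognition stage. A sketch of the missing definitions (the values below are an assumption based on the word-crop size typically used with the CRNN models; check the linked cell for the exact numbers):

# assumed values, see the linked notebook cell for the exact definition
input_width = 256
input_height = 32

words = crop_words(img, np.clip(boxes/512, 0, 1), input_height, width=input_width, grayscale=True)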

kamae commented 5 years ago

Markus

Thank you. Now I can detect and recognize Roman letters using SL_end2end_predict.ipynb. I am trying to figure out a way to save the tiny image boxes containing characters and send them to a Google Cloud service to recognize non-Roman letters.

Do you have any suggestion?
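For the saving part, something like the following rough sketch is what I have in mind (assuming crop_words returns float images scaled to [0, 1]; the output directory is just a placeholder):

import os
import cv2
import numpy as np

out_dir = 'word_crops'  # placeholder output directory
os.makedirs(out_dir, exist_ok=True)

# write each word crop to disk so it can be uploaded to an external OCR service
for k, word in enumerate(words):
    crop = np.clip(word * 255, 0, 255).astype('uint8')
    cv2.imwrite(os.path.join(out_dir, 'word_%03d.png' % k), crop)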

Tune Kamae