praisethemoon commented 5 years ago

Hello, I am glad to see that a fork of this project is still alive! Cheers for that.

I have a question about the structure of the network: do you think it is possible to train only the RPN to predict regions? I do not need classification, since I am going to perform it with an SVM, but for each proposed region I would also need its feature map.

PS. If you have any idea how to get the feature map of a particular region in the input image, I would be grateful.

Thanks!

kentaroy47 commented 5 years ago

@praisethemoon I think that if you stop the training at step 1, it is possible to train the RPN only. Once you have the weights for the RPN, you can connect your SVM and train the classifier.

python train_rpn.py --network mobilenetv1 -o simple -p /path/to/your/dataset/

Note that the following lines extract the particular regions of the image.

# in https://github.com/kentaroy47/frcnn-from-scratch-with-keras/blob/master/test_frcnn.py
# lines 178-192
    for jk in range(R.shape[0]//C.num_rois + 1):
        ROIs = np.expand_dims(R[C.num_rois*jk:C.num_rois*(jk+1), :], axis=0)
        if ROIs.shape[1] == 0:
            break

        if jk == R.shape[0]//C.num_rois:
            #pad R
            curr_shape = ROIs.shape
            target_shape = (curr_shape[0],C.num_rois,curr_shape[2])
            ROIs_padded = np.zeros(target_shape).astype(ROIs.dtype)
            ROIs_padded[:, :curr_shape[1], :] = ROIs
            ROIs_padded[0, curr_shape[1]:, :] = ROIs[0, 0, :]
            ROIs = ROIs_padded

        [P_cls, P_regr] = model_classifier_only.predict([F, ROIs])

You can use [F, ROIs] for your SVM.
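As a rough illustration of how [F, ROIs] could feed an SVM, the sketch below crops each ROI out of the feature map and max-pools it into a fixed-length vector. It assumes F has shape (1, rows, cols, channels) in 'tf' ordering and that each ROI is (x, y, w, h) in feature-map cells, as in the snippet above. `roi_feature_vectors` is a hypothetical helper, not part of the repository, and the global max pooling is only a stand-in for the repository's RoI pooling layer.

```python
import numpy as np


def roi_feature_vectors(F, ROIs):
    """Return one max-pooled feature vector per ROI (hypothetical helper)."""
    fmap = F[0]                      # (rows, cols, channels)
    rows, cols, _ = fmap.shape
    vectors = []
    for x, y, w, h in ROIs[0].astype(int):
        # clip the box to the feature map and keep at least one cell
        x1 = min(max(x, 0), cols - 1)
        y1 = min(max(y, 0), rows - 1)
        x2 = min(max(x1 + w, x1 + 1), cols)
        y2 = min(max(y1 + h, y1 + 1), rows)
        patch = fmap[y1:y2, x1:x2, :]
        # collapse the spatial dimensions so every ROI gives the same length
        vectors.append(patch.max(axis=(0, 1)))
    return np.stack(vectors)


# dummy data with the shapes reported in this thread (assumed layout)
F = np.random.rand(1, 32, 40, 1024).astype("float32")
ROIs = np.array([[[0, 0, 7, 7], [10, 5, 4, 6]]], dtype="float32")
X = roi_feature_vectors(F, ROIs)
print(X.shape)  # (2, 1024)
```

The rows of X could then be passed to an SVM implementation such as scikit-learn's `sklearn.svm.SVC`, one row per proposed region.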

ayushmungad commented 5 years ago

Hello @kentaroy47, I really appreciate your efforts to help us understand exactly how the RPN works. However, to visualize the results of the region proposals, we need to modify "test_frcnn.py" as well. I made a few changes to the code for this purpose but did not end up with the results I was looking for. Here is the code snippet with the changes I have made:

R = roi_helpers.rpn_to_roi(Y1, Y2, C, K.image_dim_ordering(), overlap_thresh=0.7)

# convert from (x1,y1,x2,y2) to (x,y,w,h)

R[:, 2] -= R[:, 0]
R[:, 3] -= R[:, 1]

# apply the spatial pyramid pooling to the proposed regions
bboxes = []

for jk in range(R.shape[0]//C.num_rois + 1):
    ROIs = np.expand_dims(R[C.num_rois*jk:C.num_rois*(jk+1), :], axis=0)
    if ROIs.shape[1] == 0:
        break

    if jk == R.shape[0]//C.num_rois:
        curr_shape = ROIs.shape
        target_shape = (curr_shape[0],C.num_rois,curr_shape[2])
        ROIs_padded = np.zeros(target_shape).astype(ROIs.dtype)
        ROIs_padded[:, :curr_shape[1], :] = ROIs
        ROIs_padded[0, curr_shape[1]:, :] = ROIs[0, 0, :]
        ROIs = ROIs_padded

    #[P_cls, P_regr] = model_classifier_only.predict([F, ROIs]) 

    for ii in range(ROIs.shape[1]):
        (x, y, w, h) = ROIs[0, ii, :]
        # making a list instead of dict as there are no classes
        bboxes.append([C.rpn_stride*x, C.rpn_stride*y, C.rpn_stride*(x+w), C.rpn_stride*(y+h)])

# build the array once, after all ROI batches have been collected
bbox = np.array(bboxes)
new_boxes = roi_helpers.non_max_suppression_fast1(bbox, overlapThresh=0.5)
for jk in range(new_boxes.shape[0]):
    (x1, y1, x2, y2) = new_boxes[jk,:]

    (real_x1, real_y1, real_x2, real_y2) = get_real_coordinates(ratio, x1, y1, x2, y2)
    cv2.rectangle(img,(real_x1, real_y1), (real_x2, real_y2), (0,255,0),2)

    ############# same code as test_frcnn.py after this

Could you please guide me on how to visualize the ROIs, as you pointed out, before giving them to the SVM classifier?

These are the region proposals I have obtained on a dataset of Blood Cell detection using your "train_rpn.py" code:

kentaroy47 commented 5 years ago

Hi @ayushmungad

Thanks for trying out this script and fixing bugs.

You might want to try non-maximum suppression (NMS), so that overlapping bounding boxes are suppressed. Then you can feed the outputs of NMS to your classifier.

It is used like this in train_frcnn.py:

R = roi_helpers.rpn_to_roi(P_rpn[0], P_rpn[1], C, K.image_dim_ordering(), use_regr=True, overlap_thresh=0.7, max_boxes=300)

What NMS does is explained here: https://www.quora.com/How-does-non-maximum-suppression-work-in-object-detection

You should read it to get an idea of what this part does. Here is the code from the utility script:

Please let me know if you have other problems!

Ken

# 
def rpn_to_roi(rpn_layer, regr_layer, C, dim_ordering, use_regr=True, max_boxes=300,overlap_thresh=0.9):

    regr_layer = regr_layer / C.std_scaling

    anchor_sizes = C.anchor_box_scales
    anchor_ratios = C.anchor_box_ratios

    assert rpn_layer.shape[0] == 1

    if dim_ordering == 'th':
        (rows,cols) = rpn_layer.shape[2:]

    elif dim_ordering == 'tf':
        (rows, cols) = rpn_layer.shape[1:3]

    curr_layer = 0
    if dim_ordering == 'tf':
        A = np.zeros((4, rpn_layer.shape[1], rpn_layer.shape[2], rpn_layer.shape[3]))
    elif dim_ordering == 'th':
        A = np.zeros((4, rpn_layer.shape[2], rpn_layer.shape[3], rpn_layer.shape[1]))

    for anchor_size in anchor_sizes:
        for anchor_ratio in anchor_ratios:

            anchor_x = (anchor_size * anchor_ratio[0])/C.rpn_stride
            anchor_y = (anchor_size * anchor_ratio[1])/C.rpn_stride
            if dim_ordering == 'th':
                regr = regr_layer[0, 4 * curr_layer:4 * curr_layer + 4, :, :]
            else:
                regr = regr_layer[0, :, :, 4 * curr_layer:4 * curr_layer + 4]
                regr = np.transpose(regr, (2, 0, 1))

            X, Y = np.meshgrid(np.arange(cols), np.arange(rows))

            A[0, :, :, curr_layer] = X - anchor_x/2
            A[1, :, :, curr_layer] = Y - anchor_y/2
            A[2, :, :, curr_layer] = anchor_x
            A[3, :, :, curr_layer] = anchor_y

            if use_regr:
                A[:, :, :, curr_layer] = apply_regr_np(A[:, :, :, curr_layer], regr)

            A[2, :, :, curr_layer] = np.maximum(1, A[2, :, :, curr_layer])
            A[3, :, :, curr_layer] = np.maximum(1, A[3, :, :, curr_layer])
            A[2, :, :, curr_layer] += A[0, :, :, curr_layer]
            A[3, :, :, curr_layer] += A[1, :, :, curr_layer]

            A[0, :, :, curr_layer] = np.maximum(0, A[0, :, :, curr_layer])
            A[1, :, :, curr_layer] = np.maximum(0, A[1, :, :, curr_layer])
            A[2, :, :, curr_layer] = np.minimum(cols-1, A[2, :, :, curr_layer])
            A[3, :, :, curr_layer] = np.minimum(rows-1, A[3, :, :, curr_layer])

            curr_layer += 1

    all_boxes = np.reshape(A.transpose((0, 3, 1,2)), (4, -1)).transpose((1, 0))
    all_probs = rpn_layer.transpose((0, 3, 1, 2)).reshape((-1))

    x1 = all_boxes[:, 0]
    y1 = all_boxes[:, 1]
    x2 = all_boxes[:, 2]
    y2 = all_boxes[:, 3]

    idxs = np.where((x1 - x2 >= 0) | (y1 - y2 >= 0))

    all_boxes = np.delete(all_boxes, idxs, 0)
    all_probs = np.delete(all_probs, idxs, 0)

    result = non_max_suppression_fast(all_boxes, all_probs, overlap_thresh=overlap_thresh, max_boxes=max_boxes)[0]

    return result

ayushmungad commented 5 years ago

Hello @kentaroy47, Thank you for your timely response.

Actually, I have applied non-max suppression after obtaining the region proposals as in this line in my modified code:

new_boxes = roi_helpers.non_max_suppression_fast1(bbox, overlapThresh=0.5)

Here, I have modified the NMS function because, after training only the RPN part, we do not have the probabilities, i.e. P_cls and P_regr, that we obtain from the prediction output of the classifier layer. Hence, I have changed the NMS code to sort the bounding boxes by the right-most co-ordinate instead of sorting by probabilities that is done in the original NMS code in the utils.

But there is a drastic difference between the proposals obtained from the RPN layer alone and the results obtained using both layers.

So, is this approach of modifying NMS without taking into consideration the probabilities of bounding box classes as in P_cls and P_regr correct?

Here is the modified NMS code:

def non_max_suppression_fast1(boxes, overlapThresh):
    # if there are no boxes, return an empty list
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats --
    # this is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:,0]
    y1 = boxes[:,1]
    x2 = boxes[:,2]
    y2 = boxes[:,3]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)

    # keep looping while some indexes still remain in the indexes
    # list
    while len(idxs) > 0:
        # grab the last index in the indexes list and add the
        # index value to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of
        # the bounding box and the smallest (x, y) coordinates
        # for the end of the bounding box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap
        overlap = (w * h) / area[idxs[:last]]

        # delete the picked index and all indexes from the index list
        # that have an overlap greater than the threshold
        idxs = np.delete(idxs, np.concatenate(([last],
            np.where(overlap > overlapThresh)[0])))

    # return only the bounding boxes that were picked using the
    # integer data type
    return boxes[pick].astype("int")

Here are the results of the region proposals obtained from only the RPN part (with max_boxes=20) and from the RPN+classifier layers.

[Image] Left side: RPN+classifier output. Right side: RPN-only proposals.

kentaroy47 commented 5 years ago

Hi @ayushmungad. Great work, and good luck with your projects :) I hope I can be of help.

> So, is this approach of modifying NMS without taking into consideration the probabilities of bounding box classes as in P_cls and P_regr correct?

In RPN, both P_cls and P_regr can be obtained by running RPN prediction.

P_rpn = model_rpn.predict_on_batch(image)

P_cls is the probability that the box contains an object rather than background. The higher the P_cls, the more likely the bounding box is to contain an object.
P_regr is the bounding-box regression, which makes fine adjustments to the bounding box in the x and y directions.
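To make the role of P_regr concrete, here is a minimal sketch of the commonly used Faster R-CNN box parameterisation, in which four regression values (tx, ty, tw, th) shift the box centre and rescale its size. `apply_box_regression` is a hypothetical name used for illustration; the repository's own `apply_regr_np` (called from `rpn_to_roi`, after the regression output is divided by `C.std_scaling`) may differ in detail.

```python
import math


def apply_box_regression(box, deltas):
    """Shift and rescale an (x, y, w, h) box by (tx, ty, tw, th)."""
    x, y, w, h = box
    tx, ty, tw, th = deltas
    # move the box centre by a fraction of the box size
    cx = x + w / 2.0 + tx * w
    cy = y + h / 2.0 + ty * h
    # rescale width and height multiplicatively
    new_w = w * math.exp(tw)
    new_h = h * math.exp(th)
    # return the refined box, again as top-left corner plus size
    return (cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h)


# a small positive tx nudges the box to the right without resizing it
refined = apply_box_regression((10.0, 10.0, 20.0, 20.0), (0.1, 0.0, 0.0, 0.0))
print(refined)
```

With all four deltas set to zero the box is returned unchanged, which is why running without the regression still gives usable, if coarser, boxes.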

P_cls should be considered in the NMS process, since NMS picks the most confident bbox prediction (highest P_cls) and removes the overlapping bounding boxes near it by calculating the IoU. You should include P_regr if you want precise bounding-box placement, but you can run without it (this depends on your image resolution).
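The score-driven procedure described here can be sketched as follows. This is a minimal greedy NMS written for illustration, not the repository's `non_max_suppression_fast`; the name `nms_by_score` and its default thresholds are assumptions. Boxes are (x1, y1, x2, y2) and `scores` plays the role of the objectness probability P_cls.

```python
import numpy as np


def nms_by_score(boxes, scores, iou_thresh=0.5, max_boxes=300):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it."""
    boxes = np.asarray(boxes, dtype="float64")
    scores = np.asarray(scores, dtype="float64")
    if len(boxes) == 0:
        return boxes, scores
    x1, y1, x2, y2 = boxes.T
    area = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]      # most confident first
    keep = []
    while len(order) > 0 and len(keep) < max_boxes:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the picked box with every remaining box
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        iou = inter / (area[i] + area[rest] - inter)
        # only boxes that do not overlap the picked one too much survive
        order = rest[iou <= iou_thresh]
    return boxes[keep], scores[keep]


boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]])
scores = np.array([0.6, 0.9, 0.8])
kept_boxes, kept_scores = nms_by_score(boxes, scores)
print(kept_scores)  # [0.9 0.8]
```

Sorting by score rather than by a coordinate means the surviving box in each overlapping cluster is the one the RPN is most confident about, rather than whichever box happens to lie lowest or furthest right in the image.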

> Hence, I have changed the NMS code to sort the bounding boxes by the right-most co-ordinate instead of sorting by probabilities that is done in the original NMS code in the utils.

Sorry, I am not sure what you mean by this. Can you explain further?

praisethemoon commented 5 years ago

@ayushmungad To be honest, if I were you I would consider Mask R-CNN. It looks like your objects are very similar yet cannot easily be detected as bounding boxes, so maybe masks would help? It's just an opinion :) Have a look at https://github.com/matterport/Mask_RCNN and good luck!

@kentaroy47 Thank you for your reply. I have just checked it; would you mind explaining the structure of F and ROIs? I ran a simple test to check their shapes, which are as follows, along with my assumptions:

F: (1, 32, 40, 1024). I assume that 1024 is the number of features, 32 is the number of ROIs, and 1 is the batch size. But what is 40?
ROIs: (1, 32, 4). 1 is the batch size, 32 is the number of ROIs, and 4 is the (x, y, w, h) coords, which should be transformed with get_real_coordinates.
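The coordinate transformation mentioned for the ROIs can be sketched as below, following the snippets earlier in this thread, where ROI values are multiplied by `C.rpn_stride` and then passed to `get_real_coordinates` with a resize `ratio`. `roi_to_image_box` is a hypothetical helper; the stride of 16 and the meaning of `ratio` (resized size divided by original size) are assumptions made for the example.

```python
def roi_to_image_box(roi, rpn_stride, ratio):
    """Map an (x, y, w, h) ROI in feature-map cells to (x1, y1, x2, y2) pixels."""
    x, y, w, h = roi
    # feature-map cells -> pixels of the resized network input
    x1, y1 = x * rpn_stride, y * rpn_stride
    x2, y2 = (x + w) * rpn_stride, (y + h) * rpn_stride
    # resized input -> original image (assumes ratio = resized / original)
    return tuple(int(round(v / ratio)) for v in (x1, y1, x2, y2))


box = roi_to_image_box((2, 3, 4, 5), rpn_stride=16, ratio=2.0)
print(box)  # (16, 24, 48, 64)
```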

Thanks!

praisethemoon commented 5 years ago

Oh, and to give you an idea of what I am trying to do: I have trained two Faster R-CNN networks on multispectral images, that is, visible and thermal images (one network for each modality), and I need to extract features of the same detected objects (IoU > 0.5) to classify them with an SVM.
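Pairing detections of the same object across the two modalities by IoU > 0.5 could look roughly like this. It is a sketch that assumes the visible and thermal images are spatially aligned and that boxes are (x1, y1, x2, y2); `iou` and `match_boxes` are hypothetical helpers, and the greedy one-to-one matching is just one simple choice.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(visible, thermal, thresh=0.5):
    """Greedily pair each visible box with its best unused thermal box."""
    pairs, used = [], set()
    for i, a in enumerate(visible):
        best_j, best_iou = -1, thresh
        for j, b in enumerate(thermal):
            if j in used:
                continue
            value = iou(a, b)
            if value > best_iou:      # strictly above the threshold
                best_j, best_iou = j, value
        if best_j >= 0:
            used.add(best_j)
            pairs.append((i, best_j))
    return pairs


visible = [(0, 0, 10, 10), (50, 50, 60, 60)]
thermal = [(52, 51, 61, 60), (1, 0, 10, 10)]
print(match_boxes(visible, thermal))  # [(0, 1), (1, 0)]
```

Each returned pair gives the indices of one object in both detectors, so the feature vectors of the two modalities can be concatenated before they are handed to the SVM.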

kentaroy47 / frcnn-from-scratch-with-keras

Use RPN only for proposals and classify features with SVM #1
