Open praisethemoon opened 5 years ago
@praisethemoon I think that if you stop the training at step 1, it is possible to train the RPN only. Once you have the weights for the RPN, you can connect your SVM and train the classifier.
python train_rpn.py --network mobilenetv1 -o simple -p /path/to/your/dataset/
Note that these extract the particular region of the image.
# in https://github.com/kentaroy47/frcnn-from-scratch-with-keras/blob/master/test_frcnn.py
# lines 178-192
for jk in range(R.shape[0]//C.num_rois + 1):
    ROIs = np.expand_dims(R[C.num_rois*jk:C.num_rois*(jk+1), :], axis=0)
    if ROIs.shape[1] == 0:
        break

    if jk == R.shape[0]//C.num_rois:
        # pad R
        curr_shape = ROIs.shape
        target_shape = (curr_shape[0], C.num_rois, curr_shape[2])
        ROIs_padded = np.zeros(target_shape).astype(ROIs.dtype)
        ROIs_padded[:, :curr_shape[1], :] = ROIs
        ROIs_padded[0, curr_shape[1]:, :] = ROIs[0, 0, :]
        ROIs = ROIs_padded

    [P_cls, P_regr] = model_classifier_only.predict([F, ROIs])
You can use [F, ROIs] for your SVM.
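A minimal sketch of how `[F, ROIs]` could be turned into fixed-length vectors for an SVM. It assumes `F` is a channels-last feature map and that each ROI is `(x, y, w, h)` in feature-map units, as in the snippet above. The helper `roi_feature_vectors` and its `pool_size` parameter are hypothetical, and the max-pooling here is a plain NumPy stand-in, not the pooling layer the classifier in the repository uses.

```python
import numpy as np

def roi_feature_vectors(F, ROIs, pool_size=2):
    """Crop each ROI from the feature map and max-pool it to a fixed grid.

    F:    (1, rows, cols, channels) feature map (channels-last is an assumption).
    ROIs: (1, num_rois, 4) boxes as (x, y, w, h) in feature-map units.
    Returns an array of shape (num_rois, pool_size * pool_size * channels).
    """
    fmap = F[0]
    rows, cols, channels = fmap.shape
    vectors = []
    for x, y, w, h in ROIs[0].astype(int):
        # Clamp the box to the feature map so the crop is never empty.
        x = min(max(x, 0), cols - 1)
        y = min(max(y, 0), rows - 1)
        w = max(1, min(w, cols - x))
        h = max(1, min(h, rows - y))
        crop = fmap[y:y + h, x:x + w, :]
        # Split the crop into a pool_size x pool_size grid of cells.
        ys = np.linspace(0, h, pool_size + 1).astype(int)
        xs = np.linspace(0, w, pool_size + 1).astype(int)
        pooled = np.zeros((pool_size, pool_size, channels), dtype=fmap.dtype)
        for i in range(pool_size):
            for j in range(pool_size):
                # Every cell covers at least one feature-map position.
                y0, y1 = ys[i], max(ys[i + 1], ys[i] + 1)
                x0, x1 = xs[j], max(xs[j + 1], xs[j] + 1)
                pooled[i, j] = crop[y0:y1, x0:x1].max(axis=(0, 1))
        vectors.append(pooled.reshape(-1))
    return np.stack(vectors)

# Toy usage with made-up shapes: 2 ROIs on a 32 x 40 map with 8 channels.
F_demo = np.random.rand(1, 32, 40, 8).astype("float32")
ROIs_demo = np.array([[[0, 0, 4, 4], [10, 5, 3, 6]]])
features = roi_feature_vectors(F_demo, ROIs_demo)
print(features.shape)
```

Each row of the result is one ROI, so the array can be passed as the sample matrix to an SVM implementation of your choice.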
Hello @kentaroy47, I really appreciate your efforts to help us understand how exactly the RPN works. However, to visualize the region proposals, we also need to modify "test_frcnn.py". I made a few changes to the code for this, but could not get the results I was looking for. Here is the snippet with my changes:
R = roi_helpers.rpn_to_roi(Y1, Y2, C, K.image_dim_ordering(), overlap_thresh=0.7)
R[:, 2] -= R[:, 0]
R[:, 3] -= R[:, 1]

# apply the spatial pyramid pooling to the proposed regions
bboxes = []
for jk in range(R.shape[0]//C.num_rois + 1):
    ROIs = np.expand_dims(R[C.num_rois*jk:C.num_rois*(jk+1), :], axis=0)
    if ROIs.shape[1] == 0:
        break

    if jk == R.shape[0]//C.num_rois:
        curr_shape = ROIs.shape
        target_shape = (curr_shape[0], C.num_rois, curr_shape[2])
        ROIs_padded = np.zeros(target_shape).astype(ROIs.dtype)
        ROIs_padded[:, :curr_shape[1], :] = ROIs
        ROIs_padded[0, curr_shape[1]:, :] = ROIs[0, 0, :]
        ROIs = ROIs_padded

    #[P_cls, P_regr] = model_classifier_only.predict([F, ROIs])
    for ii in range(ROIs.shape[1]):
        (x, y, w, h) = ROIs[0, ii, :]
        # making a list instead of dict as there are no classes
        bboxes.append([C.rpn_stride*x, C.rpn_stride*y, C.rpn_stride*(x+w), C.rpn_stride*(y+h)])

bbox = np.array(bboxes)
new_boxes = roi_helpers.non_max_suppression_fast1(bbox, overlapThresh=0.5)
for jk in range(new_boxes.shape[0]):
    (x1, y1, x2, y2) = new_boxes[jk, :]
    (real_x1, real_y1, real_x2, real_y2) = get_real_coordinates(ratio, x1, y1, x2, y2)
    cv2.rectangle(img, (real_x1, real_y1), (real_x2, real_y2), (0, 255, 0), 2)
############# same code as test_frcnn.py after this
Could you please guide me on how to visualize the ROIs as you pointed out, before giving them to the SVM classifier?
These are the region proposals I have obtained on a dataset of Blood Cell detection using your "train_rpn.py" code:
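For reference, the coordinate conversion in the snippet above can be sketched as one vectorized step. This assumes the ROIs are `(x, y, w, h)` in feature-map units and that `ratio` is the resize factor applied to the image before prediction (resized size divided by original size), so dividing by it maps back to the original image. The helper `rois_to_image_boxes` and its `drop_duplicates` flag are hypothetical names; the flag removes the repeated rows that come from padding the last batch with copies of its first ROI.

```python
import numpy as np

def rois_to_image_boxes(rois, rpn_stride, ratio, drop_duplicates=True):
    """Convert (x, y, w, h) ROIs in feature-map units to integer
    (x1, y1, x2, y2) pixel boxes in the original image."""
    rois = np.asarray(rois, dtype=float).reshape(-1, 4)
    x1 = rois[:, 0] * rpn_stride
    y1 = rois[:, 1] * rpn_stride
    x2 = (rois[:, 0] + rois[:, 2]) * rpn_stride
    y2 = (rois[:, 1] + rois[:, 3]) * rpn_stride
    # Undo the resize that was applied before the image entered the network.
    boxes = np.round(np.stack([x1, y1, x2, y2], axis=1) / ratio).astype(int)
    if drop_duplicates:
        # Padded batches repeat the first ROI; keep each box once.
        boxes = np.unique(boxes, axis=0)
    return boxes

# Toy usage: the second row is a padding duplicate of the first.
demo = rois_to_image_boxes([[1, 2, 3, 4], [1, 2, 3, 4]], rpn_stride=16, ratio=2.0)
print(demo)
```

Each returned row can then be drawn with `cv2.rectangle` in the same way as in the loop above.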
Hi @ayushmungad
Thanks for trying out this script and fixing bugs.
You might want to try non-maximum suppression (NMS), so that overlapping bounding boxes are suppressed. Then you can feed the NMS outputs to your classifier.
It is used like this in train_frcnn.py:
R = roi_helpers.rpn_to_roi(P_rpn[0], P_rpn[1], C, K.image_dim_ordering(), use_regr=True, overlap_thresh=0.7, max_boxes=300)
What NMS does is explained here: https://www.quora.com/How-does-non-maximum-suppression-work-in-object-detection
You should read it to get an idea of what this part does. Here is the code from the util script:
Please let me know if you have other problems!
Ken
def rpn_to_roi(rpn_layer, regr_layer, C, dim_ordering, use_regr=True, max_boxes=300, overlap_thresh=0.9):

    regr_layer = regr_layer / C.std_scaling

    anchor_sizes = C.anchor_box_scales
    anchor_ratios = C.anchor_box_ratios

    assert rpn_layer.shape[0] == 1

    if dim_ordering == 'th':
        (rows, cols) = rpn_layer.shape[2:]
    elif dim_ordering == 'tf':
        (rows, cols) = rpn_layer.shape[1:3]

    curr_layer = 0
    if dim_ordering == 'tf':
        A = np.zeros((4, rpn_layer.shape[1], rpn_layer.shape[2], rpn_layer.shape[3]))
    elif dim_ordering == 'th':
        A = np.zeros((4, rpn_layer.shape[2], rpn_layer.shape[3], rpn_layer.shape[1]))

    for anchor_size in anchor_sizes:
        for anchor_ratio in anchor_ratios:

            anchor_x = (anchor_size * anchor_ratio[0])/C.rpn_stride
            anchor_y = (anchor_size * anchor_ratio[1])/C.rpn_stride
            if dim_ordering == 'th':
                regr = regr_layer[0, 4 * curr_layer:4 * curr_layer + 4, :, :]
            else:
                regr = regr_layer[0, :, :, 4 * curr_layer:4 * curr_layer + 4]
                regr = np.transpose(regr, (2, 0, 1))

            X, Y = np.meshgrid(np.arange(cols), np.arange(rows))

            A[0, :, :, curr_layer] = X - anchor_x/2
            A[1, :, :, curr_layer] = Y - anchor_y/2
            A[2, :, :, curr_layer] = anchor_x
            A[3, :, :, curr_layer] = anchor_y

            if use_regr:
                A[:, :, :, curr_layer] = apply_regr_np(A[:, :, :, curr_layer], regr)

            A[2, :, :, curr_layer] = np.maximum(1, A[2, :, :, curr_layer])
            A[3, :, :, curr_layer] = np.maximum(1, A[3, :, :, curr_layer])
            A[2, :, :, curr_layer] += A[0, :, :, curr_layer]
            A[3, :, :, curr_layer] += A[1, :, :, curr_layer]

            A[0, :, :, curr_layer] = np.maximum(0, A[0, :, :, curr_layer])
            A[1, :, :, curr_layer] = np.maximum(0, A[1, :, :, curr_layer])
            A[2, :, :, curr_layer] = np.minimum(cols-1, A[2, :, :, curr_layer])
            A[3, :, :, curr_layer] = np.minimum(rows-1, A[3, :, :, curr_layer])

            curr_layer += 1

    all_boxes = np.reshape(A.transpose((0, 3, 1, 2)), (4, -1)).transpose((1, 0))
    all_probs = rpn_layer.transpose((0, 3, 1, 2)).reshape((-1))

    x1 = all_boxes[:, 0]
    y1 = all_boxes[:, 1]
    x2 = all_boxes[:, 2]
    y2 = all_boxes[:, 3]

    idxs = np.where((x1 - x2 >= 0) | (y1 - y2 >= 0))

    all_boxes = np.delete(all_boxes, idxs, 0)
    all_probs = np.delete(all_probs, idxs, 0)

    result = non_max_suppression_fast(all_boxes, all_probs, overlap_thresh=overlap_thresh, max_boxes=max_boxes)[0]

    return result
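The last step of `rpn_to_roi` hands the boxes and their scores to `non_max_suppression_fast`. As a rough illustration of what score-ordered NMS does, here is a minimal sketch. It is not the repository's implementation: the function name `nms_by_score`, the IoU formula without the `+ 1` pixel convention, and the default thresholds are assumptions made for this example.

```python
import numpy as np

def nms_by_score(boxes, scores, iou_thresh=0.7, max_boxes=300):
    """Greedy NMS: keep the highest-scoring box, drop every remaining box
    whose IoU with it exceeds iou_thresh, and repeat.

    boxes are (x1, y1, x2, y2); returns the kept boxes and their scores.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float).reshape(-1)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    while order.size > 0 and len(keep) < max_boxes:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # Intersection of the kept box with every remaining box.
        iw = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]))
        ih = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]))
        inter = iw * ih
        union = areas[i] + areas[rest] - inter
        iou = inter / np.maximum(union, 1e-9)
        # Only boxes that do not overlap too much survive to the next round.
        order = rest[iou <= iou_thresh]
    return boxes[keep], scores[keep]

# Toy usage: the first two boxes overlap heavily, the third is far away.
demo_boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
demo_scores = [0.6, 0.9, 0.8]
kept_boxes, kept_scores = nms_by_score(demo_boxes, demo_scores, iou_thresh=0.5)
print(kept_boxes, kept_scores)
```

In the toy input the lower-scoring of the two overlapping boxes is suppressed, which is the behaviour that depends on having a score for every box.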
Hello @kentaroy47, Thank you for your timely response.
Actually, I have applied non-max suppression after obtaining the region proposals as in this line in my modified code:
new_boxes = roi_helpers.non_max_suppression_fast1(bbox, overlapThresh=0.5)
Here, I have modified the NMS function, as now after training only on the RPN part, we do not have the probabilities i.e P_cls and P_regr that we obtain from the classifier layer prediction output. Hence, I have changed the NMS code to sort the bounding boxes by the right-most co-ordinate instead of sorting by probabilities that is done in the original NMS code in the utils.
However, there is a drastic difference between the proposals obtained from the RPN layer alone and the results obtained using both layers.
So, is this approach of modifying NMS without taking into consideration the probabilities of bounding box classes as in P_cls and P_regr correct?
Here is the modified NMS code:
def non_max_suppression_fast1(boxes, overlapThresh):
    if len(boxes) == 0:
        return []

    # if the bounding boxes are integers, convert them to floats --
    # this is important since we'll be doing a bunch of divisions
    if boxes.dtype.kind == "i":
        boxes = boxes.astype("float")

    # initialize the list of picked indexes
    pick = []

    # grab the coordinates of the bounding boxes
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    idxs = np.argsort(y2)

    # keep looping while some indexes still remain in the indexes
    # list
    while len(idxs) > 0:
        # grab the last index in the indexes list and add the
        # index value to the list of picked indexes
        last = len(idxs) - 1
        i = idxs[last]
        pick.append(i)

        # find the largest (x, y) coordinates for the start of
        # the bounding box and the smallest (x, y) coordinates
        # for the end of the bounding box
        xx1 = np.maximum(x1[i], x1[idxs[:last]])
        yy1 = np.maximum(y1[i], y1[idxs[:last]])
        xx2 = np.minimum(x2[i], x2[idxs[:last]])
        yy2 = np.minimum(y2[i], y2[idxs[:last]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1 + 1)
        h = np.maximum(0, yy2 - yy1 + 1)

        # compute the ratio of overlap
        overlap = (w * h) / area[idxs[:last]]

        # delete all indexes from the index list that have
        # an overlap greater than the threshold
        idxs = np.delete(idxs, np.concatenate(([last],
            np.where(overlap > overlapThresh)[0])))

    # return only the bounding boxes that were picked using the
    # integer data type
    return boxes[pick].astype("int")

Here are the region proposals obtained from only the RPN part (with max_boxes=20) and from the RPN + classifier layers.
Left side: RPN + classifier output. Right side: RPN proposals only.
Hi @ayushmungad. Great work, and good luck with your projects :) I hope I can be of help.
> So, is this approach of modifying NMS without taking into consideration the probabilities of bounding box classes as in P_cls and P_regr correct?
In RPN, both P_cls and P_regr can be obtained by running RPN prediction.
P_rpn = model_rpn.predict_on_batch(image)
P_cls should be taken into account in the NMS process, since NMS picks the most confident bounding box prediction (highest P_cls) and removes the overlapping bounding boxes near it based on their IoU. You should include P_regr if you want precise bounding box placement, but you can run without it (this depends on your image resolution).
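As a small illustration of how each RPN score lines up with its box, the sketch below reuses the two flattening expressions from `rpn_to_roi` on stand-in arrays. The array contents, the shapes, and `top_k` are made up for the example; only the transpose and reshape expressions come from the code in this thread.

```python
import numpy as np

rows, cols, num_anchors = 2, 3, 4

# Stand-in for the objectness output of the RPN: (1, rows, cols, num_anchors).
rng = np.random.default_rng(0)
p_cls = rng.random((1, rows, cols, num_anchors))

# Stand-in for the box array A built in rpn_to_roi: (4, rows, cols, num_anchors).
# Each fake "box" stores (anchor, row, col, 0) so its origin stays visible.
A = np.zeros((4, rows, cols, num_anchors))
for a in range(num_anchors):
    for r in range(rows):
        for c in range(cols):
            A[:3, r, c, a] = (a, r, c)

# Same flattening as in rpn_to_roi: boxes and scores end up in the same order.
all_boxes = np.reshape(A.transpose((0, 3, 1, 2)), (4, -1)).transpose((1, 0))
all_probs = p_cls.transpose((0, 3, 1, 2)).reshape((-1))

# With scores available, proposals can be ranked by confidence before NMS.
top_k = 5
order = np.argsort(all_probs)[::-1][:top_k]
top_boxes, top_scores = all_boxes[order], all_probs[order]
print(top_boxes[:, :3].astype(int), top_scores)
```

Because row `k` of `all_boxes` and entry `k` of `all_probs` refer to the same anchor and position, a score-sorted NMS can be applied to the RPN output without running the classifier stage.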
> Hence, I have changed the NMS code to sort the bounding boxes by the right-most co-ordinate instead of sorting by probabilities that is done in the original NMS code in the utils.
Sorry, I am not sure what you mean by this. Can you explain further?
@ayushmungad To be honest, if I were you I would consider Mask R-CNN. It looks like your objects are very similar to each other yet not easily detected as bounding boxes, so maybe masks would help? It's just an opinion :) Have a look at https://github.com/matterport/Mask_RCNN and good luck!
@kentaroy47 Thank you for your reply. I have just checked it. Would you mind explaining the structure of `F` and `ROIs`? I ran a simple test to debug their shapes; they are as follows, along with my assumptions:
`F`: (1, 32, 40, 1024). I assume that 1024 is the number of features, 32 is the number of ROIs and 1 is the batch size. But what is 40?
`ROIs`: (1, 32, 4). 1: batch size, 32: ROIs, 4: (x, y, w, h) coords, which should be transformed with `get_real_coordinates`.
Thanks!
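One reading that is consistent with the 'tf' branch of `rpn_to_roi`, which takes `(rows, cols)` from `shape[1:3]`, is that `F` is laid out as (batch, rows, cols, channels). Under that assumption 32 and 40 would be the height and width of the feature map, and the 32 in `ROIs` would come separately from `C.num_rois`, since the last batch is padded to that size. The sketch below only unpacks the shape under this assumption; the helper name `describe_feature_map` is hypothetical and the stride of 16 is an assumed value for `C.rpn_stride`.

```python
def describe_feature_map(shape, rpn_stride=16):
    """Name the axes of a channels-last feature map and estimate the size of
    the resized input image that would produce it (rpn_stride is assumed)."""
    batch, rows, cols, channels = shape
    return {
        "batch": batch,
        "rows": rows,
        "cols": cols,
        "channels": channels,
        "approx_input_height": rows * rpn_stride,
        "approx_input_width": cols * rpn_stride,
    }

info = describe_feature_map((1, 32, 40, 1024))
print(info)
```

If the estimated height and width roughly match the resized image that was fed to the network, that would support this reading of the shape.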
Oh, and to give you an idea of what I am trying to do: I have trained two Faster R-CNN networks on multispectral images, that is, visible and thermal images (one network per modality), and I need to extract features of the same detected objects (IoU > 0.5) to classify them with an SVM.
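A minimal sketch of the pairing step described above, assuming the visible and thermal images are registered so that boxes from both networks share one coordinate frame. The function names `iou_matrix` and `match_detections` are hypothetical, and the greedy one-to-one matching is just one simple choice among several possible strategies.

```python
import numpy as np

def iou_matrix(a, b):
    """Pairwise IoU between two sets of (x1, y1, x2, y2) boxes."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    ix1 = np.maximum(a[:, None, 0], b[None, :, 0])
    iy1 = np.maximum(a[:, None, 1], b[None, :, 1])
    ix2 = np.minimum(a[:, None, 2], b[None, :, 2])
    iy2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-9)

def match_detections(visible, thermal, iou_thresh=0.5):
    """Greedily pair boxes across modalities, best IoU first, one-to-one.
    Returns a list of (visible_index, thermal_index, iou)."""
    iou = iou_matrix(visible, thermal)
    pairs = []
    while iou.size and iou.max() > iou_thresh:
        i, j = np.unravel_index(np.argmax(iou), iou.shape)
        pairs.append((int(i), int(j), float(iou[i, j])))
        # Remove both boxes from further matching.
        iou[i, :] = -1.0
        iou[:, j] = -1.0
    return pairs

# Toy usage with made-up boxes.
visible_boxes = [[0, 0, 10, 10], [50, 50, 60, 60]]
thermal_boxes = [[51, 51, 61, 61], [1, 1, 11, 11], [100, 100, 110, 110]]
pairs = match_detections(visible_boxes, thermal_boxes)
print(pairs)
```

Each matched pair gives the indices of the same object in both modalities, so the features extracted for the two boxes can be combined before they go to the SVM.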
Hello, I am glad to see that a fork of this project is still alive! Cheers for that.
I have a question regarding the structure of the network. Do you think it is possible to train an RPN only to predict regions (I do not need classification, as I am going to perform it with an SVM), where for each proposed region I would also get its feature map?
PS. If you have any idea how to get the feature map of a particular region in the input image, I would be grateful.
Thanks!