tnikolla / robot-grasp-detection

Detecting robot grasping positions with deep neural networks. The model is trained on the Cornell Grasping Dataset. This is an implementation mainly based on the paper 'Real-Time Grasp Detection Using Convolutional Neural Networks' by Redmon and Angelova.
Apache License 2.0

Rectangular four vertex law #11

Open HEUzhouhanwen opened 6 years ago

HEUzhouhanwen commented 6 years ago

Hi! I have thought about this issue for a long time and cannot work it out. Is there a fixed convention for the four vertices of a rectangle, such as clockwise or counter-clockwise order, or which vertex comes first? Won't these affect the results of this function?

```python
def bboxes_to_grasps(bboxes):
    # Convert 4-vertex rectangles to (x, y, tan, h, w) grasps.
    box = tf.unstack(bboxes, axis=1)
    x = (box[0] + (box[4] - box[0])/2) * 0.35
    y = (box[1] + (box[5] - box[1])/2) * 0.47
    tan = (box[3] - box[1]) / (box[2] - box[0]) * 0.47/0.35
    h = tf.sqrt(tf.pow((box[2] - box[0])*0.35, 2) + tf.pow((box[3] - box[1])*0.47, 2))
    w = tf.sqrt(tf.pow((box[6] - box[0])*0.35, 2) + tf.pow((box[7] - box[1])*0.47, 2))
    return x, y, tan, h, w
```

Thank you!
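To make the concern concrete, here is a minimal sketch (plain Python with NumPy, not repository code) showing that the same rectangle, listed starting from a different vertex, gives different values:

```python
import numpy as np

# Same rectangle; b starts one vertex later than a.
a = np.array([10, 10, 50, 10, 50, 30, 10, 30], dtype=float)
b = np.array([50, 10, 50, 30, 10, 30, 10, 10], dtype=float)

def to_grasp(box):
    # Mirrors bboxes_to_grasps for a single rectangle.
    x = (box[0] + (box[4] - box[0]) / 2) * 0.35
    y = (box[1] + (box[5] - box[1]) / 2) * 0.47
    tan = (box[3] - box[1]) / (box[2] - box[0]) * 0.47 / 0.35
    h = np.hypot((box[2] - box[0]) * 0.35, (box[3] - box[1]) * 0.47)
    w = np.hypot((box[6] - box[0]) * 0.35, (box[7] - box[1]) * 0.47)
    return x, y, tan, h, w

print(to_grasp(a))  # tan = 0, h = 14.0, w = 9.4
print(to_grasp(b))  # tan is inf (divide by zero); h and w are swapped
```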

tnikolla commented 6 years ago

Hi! Sorry, I did not understand the question.

xiaoshuguo750 commented 6 years ago

Do the four vertices of a positive rectangle follow a fixed convention, such as clockwise or counter-clockwise order? I think this will affect the results of the same `bboxes_to_grasps` function quoted above.

xiaoshuguo750 commented 6 years ago

[image: two sets of grasp-conversion equations, posted for comparison]

ahundt commented 6 years ago

I believe the answer to your question is yes: your choice of grasp encoding will affect the results. Object detection papers are a good source for this kind of information, and different object detection algorithms use different box encodings.

[image: figure from the object detection paper linked below]

Here is object detection code with different bounding boxes: https://github.com/tensorflow/models/tree/master/research/object_detection

Here is the paper associated with the above link and image, with details: https://arxiv.org/abs/1611.10012

One difference for grasp encodings is that they have an extra rotation parameter.
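To illustrate that extra parameter, here is a minimal sketch (NumPy; the function name and the choice of which dimension is the gripper opening are my assumptions, not taken from any of the linked repositories) that decodes a 5-parameter grasp (x, y, theta, h, w) back into four rectangle vertices:

```python
import numpy as np

def grasp_to_vertices(x, y, theta, h, w):
    """Decode an (x, y, theta, h, w) grasp into 4 rectangle vertices.

    theta rotates the rectangle in the image plane; here w is taken as
    the opening width (gripper axis) and h as the plate length.
    """
    dx, dy = np.cos(theta), np.sin(theta)
    ax, ay = (w / 2) * dx, (w / 2) * dy      # half-extent along the opening
    bx, by = (h / 2) * -dy, (h / 2) * dx     # half-extent along the plates
    return np.array([
        [x - ax - bx, y - ay - by],
        [x + ax - bx, y + ay - by],
        [x + ax + bx, y + ay + by],
        [x - ax + bx, y - ay + by],
    ])

print(grasp_to_vertices(112, 112, np.pi / 6, 20, 40))
```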

ahundt commented 6 years ago

I read their question again, and I think they're asking if theta is clockwise or counter-clockwise. As per the actual dataset readme:

> 3. Grasping rectangle files contain 4 lines for each rectangle. Each line contains the x and y coordinate of a vertex of that rectangle separated by a space. The first two coordinates of a rectangle define the line representing the orientation of the gripper plate. Vertices are listed in counter-clockwise order.
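Given that format, reading every rectangle (not just the first) is straightforward. A minimal parser sketch (the helper name is mine; the `pcd0100cpos.txt` naming follows the dataset's convention for positive rectangles):

```python
import numpy as np

def load_grasp_rects(path):
    """Read a Cornell *cpos.txt file into an (N, 4, 2) array.

    Each rectangle is 4 consecutive "x y" lines, vertices in
    counter-clockwise order, the first two defining the gripper plate.
    """
    pts = np.loadtxt(path)            # shape (4*N, 2)
    rects = pts.reshape(-1, 4, 2)
    # Some files contain NaN vertices; drop those rectangles.
    return rects[~np.isnan(rects).any(axis=(1, 2))]

rects = load_grasp_rects('pcd0100cpos.txt')
print(rects.shape)
```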

@tnikolla I'm fairly certain there are a couple of problems in the code leading to worse performance than expected, because it only reads the first positive bounding box and ignores all the others.

ahundt commented 6 years ago

@tnikolla can you explain the constants 0.35 and 0.47?

They appear all over the place, such as in `bboxes_to_grasps`, `grasp_to_bbox`, and the IoU calculation.

jmichaux commented 6 years ago

@xiaoshuguo750 @ahundt Have either of you determined the proper encoding of the grasps? Also, there are no differences between the two sets of equations in @xiaoshuguo750's picture.

Juzhan commented 5 years ago

I guess 0.35 and 0.47 are the scale factors for box width and box height. The images in the Cornell dataset are 640×480, but the network's input size is 224×224, so after resizing the image the bboxes also need to be rescaled. The scale factors are 224/640 = 0.35 and 224/480 ≈ 0.47.
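A quick check of that arithmetic, and how the factors would be applied to a vertex (a sketch, not repository code):

```python
W_IN, H_IN = 640, 480     # Cornell image size
W_NET, H_NET = 224, 224   # network input size

sx, sy = W_NET / W_IN, H_NET / H_IN
print(sx, sy)             # 0.35 0.4666... ~= 0.47

# Scaling a vertex (x, y) from dataset coordinates to network coordinates:
x, y = 320, 240
print(x * sx, y * sy)     # (112.0, 112.0) -- the image centre maps to the centre
```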

ahundt commented 5 years ago

This repository unfortunately suffers from averaging over all of the ground-truth grasps. For example, if there is a frisbee it will try to grab the center rather than the lid edge.

I've got improved code at https://github.com/jhu-lcsr/costar_plan which is good for classification on the Cornell dataset, but for regression to work well there, a new Cornell training loop that gives credit to the smallest-error grasp would be needed.
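A minimal TensorFlow sketch of that smallest-error idea (my formulation, not code from costar_plan; `min_grasp_loss` is a hypothetical helper):

```python
import tensorflow as tf

def min_grasp_loss(pred, gt_grasps):
    """Loss against the closest ground-truth grasp instead of an average.

    pred:       (5,) predicted grasp  [x, y, tan, h, w]
    gt_grasps:  (N, 5) all ground-truth grasps for the image

    Returns the squared error to the single nearest ground truth, so the
    network is not pulled toward the mean of incompatible grasps.
    """
    errs = tf.reduce_sum(tf.square(gt_grasps - pred), axis=1)  # (N,)
    return tf.reduce_min(errs)
```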

Links to other recent papers are at https://github.com/ahundt/awesome-robotics/blob/master/papers.md.