Open HEUzhouhanwen opened 6 years ago
Hi! Sorry, I did not understand the question.
Do the four vertices of the `pos` rectangle follow a convention, such as clockwise or counter-clockwise ordering? I think this will affect the results of the function:

```python
def bboxes_to_grasps(bboxes):
    # convert and scale a bounding box into a grasp g = {x, y, tan, h, w}
    box = tf.unstack(bboxes, axis=1)
    x = (box[0] + (box[4] - box[0])/2) * 0.35
    y = (box[1] + (box[5] - box[1])/2) * 0.47
    tan = (box[3] - box[1]) / (box[2] - box[0]) * 0.47/0.35
    h = tf.sqrt(tf.pow((box[2] - box[0])*0.35, 2) + tf.pow((box[3] - box[1])*0.47, 2))
    w = tf.sqrt(tf.pow((box[6] - box[0])*0.35, 2) + tf.pow((box[7] - box[1])*0.47, 2))
    return x, y, tan, h, w
```
I believe the answer to your question is yes: your choice of grasp encoding will affect the results. Object detection papers are a good source for this kind of information, and different object detection algorithms use different box encodings.
Here is object detection code with different bounding boxes: https://github.com/tensorflow/models/tree/master/research/object_detection
Here is the paper associated with the above link + image with details: https://arxiv.org/abs/1611.10012
One difference for grasp encodings is that they have an extra rotation parameter.
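To make the encoding concrete, here is a minimal sketch (the helper name `vertices_to_grasp` is hypothetical, not from the repo) of converting four rectangle vertices into a grasp `(x, y, theta, h, w)`, assuming the Cornell convention that the first edge lies along the gripper plate:

```python
import math

def vertices_to_grasp(pts):
    """Convert 4 rectangle vertices [(x, y), ...] to a grasp (x, y, theta, h, w).

    Assumes the edge pts[0] -> pts[1] lies along the gripper plate,
    matching the Cornell dataset's counter-clockwise vertex convention.
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = pts
    cx = (x0 + x2) / 2.0                  # center = midpoint of a diagonal
    cy = (y0 + y2) / 2.0
    theta = math.atan2(y1 - y0, x1 - x0)  # rotation of the gripper plate
    h = math.hypot(x1 - x0, y1 - y0)      # length along the plate edge
    w = math.hypot(x3 - x0, y3 - y0)      # gripper opening width
    return cx, cy, theta, h, w
```

Note this is the same geometry as `bboxes_to_grasps` above (center from a diagonal midpoint, `h` from the first edge, `w` from the edge to the fourth vertex), just with an explicit angle instead of a tangent and without the image-resize scale factors.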
I read their question again, and I think they're asking if theta is clockwise or counter-clockwise. As per the actual dataset readme:
> 3. Grasping rectangle files contain 4 lines for each rectangle. Each line
> contains the x and y coordinate of a vertex of that rectangle separated
> by a space. The first two coordinates of a rectangle define the line
> representing the orientation of the gripper plate. Vertices are listed in
> counter-clockwise order.
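That file format is easy to parse; here is a short sketch (the function name `load_rectangles` is mine, not from the repo or dataset tools):

```python
def load_rectangles(path):
    """Parse a Cornell grasping rectangle file.

    Format per the dataset readme: 4 lines per rectangle, each line
    holding 'x y' for one vertex, vertices listed counter-clockwise.
    Returns a list of rectangles, each a list of 4 (x, y) tuples.
    """
    with open(path) as f:
        coords = [tuple(float(v) for v in line.split())
                  for line in f if line.strip()]
    return [coords[i:i + 4] for i in range(0, len(coords) - 3, 4)]
```

(Be aware some files in the dataset contain `NaN` coordinates; `float()` will parse them, so you may want to filter those rectangles out afterward.)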
@tnikolla I'm fairly certain there are a couple of problems in the code leading to worse performance than expected, because it only reads the first positive bounding box and ignores all the others.
@tnikolla can you explain the constants 0.35 and 0.47?
They appear all over the place, such as in `bboxes_to_grasps`, `grasp_to_bbox`, and in the IoU calculation.
@xiaoshuguo750 @ahundt Have either of you determined the proper encoding of the grasps? Also, there are no differences between the two sets of equations in @xiaoshuguo750's picture.
I guess 0.35 and 0.47 are the scale factors for box width and box height. The images in the Cornell dataset are 640×480, but the network's input size is 224×224, so after the images are resized, the bounding boxes also need to be rescaled. The scale factors are 224/640 = 0.35 and 224/480 ≈ 0.467 (rounded to 0.47 in the code).
This repository suffers from averaging over all the ground-truth grasps for an object, unfortunately. For example, given a frisbee it will try to grab the center rather than the lid edge.
I've got improved code at https://github.com/jhu-lcsr/costar_plan, which works well for classification on the Cornell dataset. For regression to work well there, a new Cornell training loop that gives credit to the smallest-error grasp would be needed.
Links to other recent papers are at https://github.com/ahundt/awesome-robotics/blob/master/papers.md.
Hi! I have thought about this issue for a long time, but I am still not clear: do the four vertices of the rectangle follow a convention, such as clockwise or counter-clockwise order, or which vertex is listed first? Will these affect the results of this function?

```python
def bboxes_to_grasps(bboxes):
    # convert and scale a bounding box into a grasp g = {x, y, tan, h, w}
    box = tf.unstack(bboxes, axis=1)
    x = (box[0] + (box[4] - box[0])/2) * 0.35
    y = (box[1] + (box[5] - box[1])/2) * 0.47
    tan = (box[3] - box[1]) / (box[2] - box[0]) * 0.47/0.35
    h = tf.sqrt(tf.pow((box[2] - box[0])*0.35, 2) + tf.pow((box[3] - box[1])*0.47, 2))
    w = tf.sqrt(tf.pow((box[6] - box[0])*0.35, 2) + tf.pow((box[7] - box[1])*0.47, 2))
    return x, y, tan, h, w
```

Thank you!