yrcong / RelTR

RelTR: Relation Transformer for Scene Graph Generation: https://arxiv.org/abs/2201.11460v2

dataset bbox #12

Closed jfliang3324 closed 1 year ago

jfliang3324 commented 1 year ago

Hi Yuren, what is the format of annotation['bbox'] in the train/val/test.json files for Visual Genome (in COCO format): xyxy or xywh? If it is xyxy, which two corners of the bounding box do the two xy pairs represent? If it is xywh, which point does xy represent, and where is the (0,0) origin of the image? Thanks!

yrcong commented 1 year ago

Hi, it has been a while and I have forgotten the details. But you can have a look: if there are box samples [a,b,c,d] with c<a, it is of course xywh. For xyxy, the points should be top left and bottom right. For xywh, xy should be the center. I am busy with some deadlines, so sorry that I cannot check it for you now.
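The check suggested above can be sketched in a few lines of Python. This is only an illustration: the helper name `looks_like_xywh` and the sample boxes are made up and are not taken from the repository or the annotation files. Note that the test only rules out xyxy; it cannot tell whether the first two values are the top-left corner or the center, and finding no such box does not prove the format is xyxy.

```python
def looks_like_xywh(boxes):
    """Return True if any box [a, b, c, d] has c < a or d < b.

    In xyxy (top left, bottom right) the third and fourth values can never
    be smaller than the first and second, so one such box rules out xyxy.
    """
    return any(c < a or d < b for a, b, c, d in boxes)


# Made-up sample boxes, NOT real Visual Genome annotations.
samples = [[10, 20, 50, 80], [300, 40, 25, 60]]
print(looks_like_xywh(samples))  # True, because 25 < 300 in the second box
```

In practice one would load the `bbox` entries from the json file and pass them to this helper instead of the hard-coded samples.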

PJLallen commented 1 year ago

Hi Yuren, I have checked the processed json from OpenImage. Going through /data/process/py, the bbox may have a problem, as follows: bbox = [j[0], j[1], j[2]-j[0], j[3]-j[1]] #cxcywh. To obtain cx, cy, w, h, the bbox should instead be computed as: j[0]+(j[2]-j[0])/2, j[1]+(j[3]-j[1])/2, j[2]-j[0], j[3]-j[1], because j[0] and j[1] are Xmin and Ymin respectively, not CenterX and CenterY. I don't know whether such a GT json affects the training of the model, but it looks like you have trained and tested it successfully, so I'm wondering whether I need to change this box format.
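To make the difference concrete, here is a minimal sketch of the two conversions being compared, assuming `j` holds [Xmin, Ymin, Xmax, Ymax] as described above. The function names and the sample box are hypothetical and are not taken from the repository's processing script.

```python
def xyxy_to_xywh(box):
    """[x_min, y_min, x_max, y_max] -> [x_min, y_min, w, h] (top-left based)."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def xyxy_to_cxcywh(box):
    """[x_min, y_min, x_max, y_max] -> [cx, cy, w, h] (center based)."""
    x_min, y_min, x_max, y_max = box
    w, h = x_max - x_min, y_max - y_min
    return [x_min + w / 2, y_min + h / 2, w, h]


# Made-up box for illustration only.
j = [10, 20, 50, 80]
print(xyxy_to_xywh(j))    # [10, 20, 40, 60]
print(xyxy_to_cxcywh(j))  # [30.0, 50.0, 40, 60]
```

The first function matches the expression quoted from the script, so the stored values would be top-left xywh rather than cxcywh, whatever the comment says.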

yrcong commented 1 year ago

I guess the comment might be wrong but the code should be correct. The format seems to be xyxy. During the evaluation, the function box_cxcywh_to_xyxy is used to convert cxcywh to xyxy.
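For readers unsure what such a conversion does, the following is a plain-Python sketch of a cxcywh to xyxy conversion. It is an illustration written for this thread, not the repository's `box_cxcywh_to_xyxy` implementation, and the input values are made up. It also shows why the input format matters: feeding a top-left xywh box into a center-based conversion shifts the result.

```python
def cxcywh_to_xyxy(box):
    """[cx, cy, w, h] -> [x_min, y_min, x_max, y_max]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2]


# A genuine center-based box maps back to the expected corners.
print(cxcywh_to_xyxy([30.0, 50.0, 40, 60]))  # [10.0, 20.0, 50.0, 80.0]

# The same box stored as top-left xywh, but read as cxcywh, is shifted
# by half its width and height.
print(cxcywh_to_xyxy([10, 20, 40, 60]))      # [-10.0, -10.0, 30.0, 50.0]
```

Whether this mismatch actually occurs depends on how the dataset loader transforms the stored boxes before they reach the model, which this thread does not settle.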