Mismatching between bbox coordinates and the image

hassan-mahmood / TIES_DataGeneration

Dataset Generation Code for: S.R. Qasim, H. Mahmood, and F. Shafait, Rethinking Table Parsing using Graph Neural Networks (2019)

MIT License


Mismatching between bbox coordinates and the image #2

Status: Closed (closed by leonlulu 5 years ago)

leonlulu commented 5 years ago

Hi there. I am trying to generate the table annotations with your code, but I found a scale gap of roughly 1.25 between the bounding box coordinates in data_arr and the output table images. (That is, the bboxes only line up with the images when I multiply the coordinates by 1.25 before drawing them.) It's not a critical problem, but I can't stop wondering what could cause it.

hassan-mahmood commented 5 years ago

There shouldn't be a mismatch. For visualization, a margin of 3 is added on both sides of each bounding box; without this margin, the box fits tightly around each word. In either case, the coordinates completely enclose each word. Have a look at the draw_matrix function in the code: call draw_matrix inside the generate_tf_record function and pass im, arr, and any one of the matrices (cellmatrix, colmatrix, rowmatrix) to visualize the bounding boxes.
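To make the margin concrete, here is a minimal sketch of padding a tight box for display. The helper name pad_box, the (min_x, min_y, max_x, max_y) box layout, and the optional clipping to the image size are assumptions for illustration, not code from the repository.

```python
def pad_box(box, margin=3, width=None, height=None):
    """Grow a tight (min_x, min_y, max_x, max_y) box by `margin` on every side.

    Hypothetical helper: the box layout and the clipping behaviour are
    assumptions, not taken from the repository.
    """
    min_x, min_y, max_x, max_y = box
    # Expand outward, but never go below the top-left corner of the image.
    min_x = max(min_x - margin, 0)
    min_y = max(min_y - margin, 0)
    max_x = max_x + margin
    max_y = max_y + margin
    # Optionally clip to the image size when it is known.
    if width is not None:
        max_x = min(max_x, width)
    if height is not None:
        max_y = min(max_y, height)
    return (min_x, min_y, max_x, max_y)
```

With margin=0 the function returns the tight box unchanged, so the same call can serve both the padded and the tight variant.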

hassan-mahmood commented 5 years ago

I have added code for visualization. You can compare your results (without 1.25 factor) and these visualizations.

leonlulu commented 5 years ago

Thank you very much for updating the visualization code. I tried the draw_matrices function and found the same problem. The bbox coordinate mismatch can be traced back to the html_to_img function. I'm not familiar with selenium, so I can only guess that it is caused by the web driver's resolution settings differing between computers or servers.

leonlulu commented 5 years ago

Also, another question. Is there a way I can get the coordinates of each row and each col like the UNLV dataset annotations? Thanks a lot!

leonlulu commented 5 years ago

Change the monitor setting from 125% to 100%, then it's ok.
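If changing the display setting is not an option, the workaround described earlier in the thread (multiplying the coordinates by the scale factor) can be sketched as below. The function name scale_boxes and the box layout are assumptions; the factor would be 1.25 for a 125% setting, and in a browser it can typically be read from the standard window.devicePixelRatio property.

```python
def scale_boxes(boxes, factor):
    """Scale (min_x, min_y, max_x, max_y) boxes by a display scale factor.

    Hypothetical helper: assumes the boxes are in unscaled page coordinates
    and the rendered image is `factor` times larger (e.g. 1.25 at 125%).
    """
    # Round to whole pixels so the result can be drawn directly on the image.
    return [tuple(round(v * factor) for v in box) for box in boxes]


# Example: at 125% scaling, a box at (8, 16, 40, 80) lands at (10, 20, 50, 100).
scaled = scale_boxes([(8, 16, 40, 80)], 1.25)
```

A factor of 1.0 leaves the coordinates unchanged, which matches the behaviour reported at the 100% setting.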

hassan-mahmood commented 5 years ago

> Also, another question. Is there a way I can get the coordinates of each row and each col like the UNLV dataset annotations? Thanks a lot!

If you want to extract coordinates (min_x, min_y, max_x, max_y) that enclose one complete row:

1. Get all the word IDs that share a row from same_row_matrix.
2. Iterate through the bounding boxes of all those IDs and find the minimum x and y and the maximum x and y over all of them. (You can vectorize this operation using numpy.)

Those four values are the coordinates that enclose one complete row. You can follow the same steps for columns by using same_col_matrix.
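The steps above can be sketched with numpy roughly as follows. This is an illustration under stated assumptions rather than the repository's code: boxes is taken to be an (N, 4) array of [min_x, min_y, max_x, max_y] per word, same_row_matrix an (N, N) 0/1 array whose entry [i, j] is 1 when words i and j share a row, and every word is assumed to belong to exactly one row (words spanning several rows would need extra handling). The function name group_boxes is hypothetical.

```python
import numpy as np


def group_boxes(boxes, same_group_matrix):
    """Return one enclosing (min_x, min_y, max_x, max_y) box per row or column.

    Assumed inputs (not confirmed by the repository):
      boxes             -- (N, 4) array-like of word boxes
      same_group_matrix -- (N, N) 0/1 array-like, e.g. same_row_matrix
    """
    boxes = np.asarray(boxes)
    adjacency = np.asarray(same_group_matrix).astype(bool)
    seen = np.zeros(len(boxes), dtype=bool)
    result = []
    for i in range(len(boxes)):
        if seen[i]:
            continue  # word i is already covered by an earlier group
        members = adjacency[i].copy()
        members[i] = True  # a word always belongs to its own group
        seen |= members
        group = boxes[members]
        # Vectorized min over the top-left corners, max over the bottom-right.
        enclosing = np.concatenate([group[:, :2].min(axis=0),
                                    group[:, 2:].max(axis=0)])
        result.append(tuple(enclosing.tolist()))
    return result
```

Passing same_col_matrix instead of same_row_matrix gives the enclosing box of each column in the same way.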

hassan-mahmood commented 5 years ago

> Change the monitor setting from 125% to 100%, then it's ok.

Can you please elaborate on what the issue was?