Thank you for making this valuable project publicly accessible. I am trying to fine-tune Dessurt on receipt-like documents for the natural_q~ task. I would like to feed bounding boxes for each question and answer, but I could not understand the bounding box format. Judging from crop_transform.py, each bbox has 16 values. I understand that the first 8 values represent the coordinates of the 4 corners. Can you explain what the next 8 are used for? Is it one bbox with 8 values for the question and another bbox with the next 8 values for the answer? If not, can you also explain how I am supposed to feed the bboxes for the question and the answer separately?
The next 8 values are the midpoints of the four sides of the bounding box (the same bbox described by the preceding corner points). They should be derived automatically from the annotations and are only there to help with the cropping.
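To make the layout concrete, here is a minimal sketch of how 16 values could be built from 4 corners. The function name `corners_to_16`, the corner order (top-left, top-right, bottom-right, bottom-left), and the midpoint order (left, right, top, bottom) are assumptions for illustration only; check crop_transform.py and the dataset code for the order Dessurt actually uses.

```python
def midpoint(p, q):
    """Return the midpoint of the segment between points p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def corners_to_16(tl, tr, br, bl):
    """Flatten 4 corners plus the 4 side midpoints into 16 floats.

    Hypothetical helper: the ordering of both the corners and the
    midpoints is an assumption, not confirmed by the repository.
    """
    left = midpoint(tl, bl)      # middle of the left side
    right = midpoint(tr, br)     # middle of the right side
    top = midpoint(tl, tr)       # middle of the top side
    bottom = midpoint(bl, br)    # middle of the bottom side

    values = []
    for point in (tl, tr, br, bl, left, right, top, bottom):
        values.extend(point)
    return [float(v) for v in values]


# An axis-aligned box from (10, 20) to (110, 60).
box = corners_to_16((10, 20), (110, 20), (110, 60), (10, 60))
print(box[:8])   # the 4 corners
print(box[8:])   # the 4 side midpoints
```

Since the midpoints are fully determined by the corners, they carry no information about a second box.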
Hi,
Thanks for your time and effort.