The current distance function computes the area between two textboxes. As a result, it can prioritize grouping textboxes A and B even when a third textbox C lies between them. The code works around this by checking whether any textbox lies between the two to-be-grouped textboxes (the function `isany`).
I think the distance function (`dist`) can be improved so that it no longer has to check for intermediate textboxes.
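To make the problem concrete, here is a minimal sketch of how an area-based distance can prefer A and B over the box lying between them. It is not the project's actual code: `Box`, `area_dist`, and `any_between` are hypothetical names, textboxes are assumed to be plain axis-aligned rectangles, and "area between" is assumed to mean the joint bounding box area minus the two box areas.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    # Hypothetical stand-in for a textbox: an axis-aligned rectangle.
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def bounding_box(a: Box, b: Box) -> Box:
    # Smallest rectangle containing both a and b.
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def area_dist(a: Box, b: Box) -> float:
    # Assumed reading of "area between two textboxes": the part of the
    # joint bounding box not covered by either box (a and b must not overlap).
    return bounding_box(a, b).area - a.area - b.area


def overlaps(a: Box, b: Box) -> bool:
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def any_between(a: Box, b: Box, others: Sequence[Box]) -> bool:
    # Rough analogue of the intermediate-textbox check: does any other
    # box intersect the joint bounding box of a and b?
    bbox = bounding_box(a, b)
    return any(overlaps(bbox, o) for o in others)


A = Box(0, 0, 10, 10)
B = Box(0, 20, 10, 30)
C = Box(-20, 12, 30, 18)  # wide box lying vertically between A and B

print(area_dist(A, B), area_dist(A, C), area_dist(B, C))
print(any_between(A, B, [C]))
```

In this layout the pair (A, B) has the smallest area distance because the wide box C inflates every bounding box it is part of, so only the extra overlap check stops A and B from being grouped around C.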
Ideas:
Order first by distance in the vertical direction, and then by distance in the horizontal direction. This would be especially intuitive when converting to plain text, as it follows the reading direction more naturally.
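One possible reading of this idea is a distance that is a (vertical gap, horizontal gap) pair compared lexicographically. The sketch below is only an illustration under that assumption: `Box`, `gap`, and `reading_dist` are hypothetical names, and whether such a distance removes the need for the intermediate-textbox check in every layout has not been verified here.

```python
from itertools import combinations
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    # Hypothetical stand-in for a textbox: an axis-aligned rectangle.
    x0: float
    y0: float
    x1: float
    y1: float


def gap(lo1: float, hi1: float, lo2: float, hi2: float) -> float:
    # Distance between two 1-D intervals; 0 when they overlap.
    return max(0.0, max(lo1, lo2) - min(hi1, hi2))


def reading_dist(a: Box, b: Box) -> Tuple[float, float]:
    # Vertical gap first, horizontal gap second. Python compares tuples
    # lexicographically, so the horizontal gap only breaks vertical ties.
    return (gap(a.y0, a.y1, b.y0, b.y1), gap(a.x0, a.x1, b.x0, b.x1))


boxes = {
    "A": Box(0.0, 0.0, 10.0, 10.0),
    "B": Box(0.0, 20.0, 10.0, 30.0),
    "C": Box(-20.0, 11.0, 30.0, 18.0),  # wide box between A and B
}

# Pick the pair of boxes with the smallest reading-order distance.
closest = min(
    combinations(boxes, 2),
    key=lambda pair: reading_dist(boxes[pair[0]], boxes[pair[1]]),
)
print(closest)
```

With these example boxes, C is vertically closer to A than B is, so the pair (A, C) is chosen before (A, B) without consulting any other box.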