sambitdash opened this issue 6 years ago
Is there currently any way to do this?
Not really. You can manually examine the position of every text run and check whether the runs line up into a column. The PDF specification does not provide any structural hints for this.
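A minimal sketch of that manual approach follows. It assumes each text run has already been reduced to a bounding box in page coordinates. It projects every run onto the x-axis, merges overlapping extents, and treats any horizontal gap wider than a threshold as a gutter between columns. `TextRun`, `find_columns`, `group_by_column` and `min_gutter` are hypothetical names used for illustration; they are not part of PDFIO. Python is used only to show the logic.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    # Hypothetical stand-in for one text run and its bounding box
    # (PDF user-space units, origin at the bottom-left of the page).
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def find_columns(runs, min_gutter=10.0):
    """Return (left, right) x-extents, one per detected column.

    Extents that overlap, or are separated by less than min_gutter,
    are merged into the same column.
    """
    intervals = sorted((r.x0, r.x1) for r in runs)
    columns = []
    for left, right in intervals:
        if columns and left - columns[-1][1] < min_gutter:
            # Overlaps or nearly touches the current column: widen it.
            columns[-1] = (columns[-1][0], max(columns[-1][1], right))
        else:
            # Gap is wide enough to count as a gutter: start a new column.
            columns.append((left, right))
    return columns


def group_by_column(runs, min_gutter=10.0):
    """Assign each run to the column whose x-extent contains its left edge."""
    columns = find_columns(runs, min_gutter)
    groups = [[] for _ in columns]
    for r in runs:
        for i, (left, right) in enumerate(columns):
            if left <= r.x0 <= right:
                groups[i].append(r)
                break
    return columns, groups
```

This heuristic is deliberately simple. A single full-width run, such as a title spanning both columns, bridges the gutter and merges everything into one column. In practice such runs would need to be detected and set aside first.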
On a related note, since by the nature of the format the output of `pdPageExtractText` is not fully determined, it would be useful to:
@vargonis you can use `pdPageEvalContent` to get the content tree, which carries bounding-box information for each text run.
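Once the per-run bounding boxes are available, one common follow-up step is to assemble a single column's runs into lines in reading order. The sketch below does this by grouping runs whose bottom edges lie within a small tolerance, then reading top-to-bottom and left-to-right. It assumes the runs have already been copied out of the content tree into plain records; how to walk PDFIO's tree is not shown here. `TextRun`, `lines_in_reading_order` and `y_tol` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    # Hypothetical record holding one text run's text and bounding box.
    text: str
    x0: float
    y0: float  # bottom edge of the bounding box
    x1: float
    y1: float  # top edge of the bounding box


def lines_in_reading_order(runs, y_tol=2.0):
    """Group runs into lines and return the lines' text top-to-bottom.

    Runs whose bottom edges are within y_tol of the first run of the
    current line are treated as the same line; each line is then read
    left to right.
    """
    # Higher y means higher on the page in PDF coordinates.
    ordered = sorted(runs, key=lambda r: (-r.y0, r.x0))
    lines = []
    for r in ordered:
        if lines and abs(lines[-1][0].y0 - r.y0) <= y_tol:
            lines[-1].append(r)
        else:
            lines.append([r])
    return [" ".join(r.text for r in sorted(line, key=lambda r: r.x0))
            for line in lines]
```

The tolerance absorbs small baseline differences, such as mixed fonts on one line. It should stay well below the line spacing, or adjacent lines will be merged.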
This implementation may need to be reviewed along with #2. Although the two may not overlap exactly in every case, the implementation logic can be similar.