malinphy closed 1 year ago
Did you try with all the available layout options? The default should attempt to output the text in reading order. For example, this test PDF:
https://github.com/jalan/pdftotext/blob/master/tests/three_columns.pdf
results in this text output, which seems to be what you want:
column 1
one
column 2
two
column 3
three
Of course, if the software that created the PDF provides wrong information about the intended reading order, all bets are off.
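To compare the layout options on your own file, something like the following sketch may help. It assumes the Python `pdftotext` package from the repository linked above and its `raw` and `physical` keyword arguments; `extract_modes` and `MODES` are hypothetical names made up for this example, not part of the library. If I read the question correctly, the "whole horizontal line from both columns" behavior is what the physical layout mode tends to produce, so the default mode is the one to check first.

```python
# Sketch: extract the same PDF once per layout mode so the outputs can be compared.
# Assumption: the `pdftotext` package exposes PDF(file, raw=..., physical=...).
# `extract_modes` and `MODES` are illustrative names, not library API.

MODES = {
    "default": {},                    # tries to follow reading order
    "raw": {"raw": True},             # order of the PDF content stream
    "physical": {"physical": True},   # order of the text on the page
}


def extract_modes(path, pdf_cls=None):
    """Return {mode_name: full_text} for the PDF at `path`."""
    if pdf_cls is None:
        # Imported lazily so the helper can be exercised with a stand-in class.
        import pdftotext
        pdf_cls = pdftotext.PDF

    results = {}
    for name, kwargs in MODES.items():
        with open(path, "rb") as f:
            pdf = pdf_cls(f, **kwargs)
            # A PDF object iterates over its pages as strings.
            results[name] = "\n\n".join(pdf)
    return results


# Example use (the file name is a placeholder):
#   for name, text in extract_modes("paper.pdf").items():
#       print(f"--- {name} ---")
#       print(text[:500])
```

If none of the three modes gives the columns in order, the PDF itself probably carries misleading ordering information, as noted above.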
No response, no example provided.
I believe pdftotext is very useful, especially because the layout option clearly displays the actual layout. However, academic PDFs often have double columns. When scraping a double-column PDF, pdftotext does not follow the order of the columns but instead returns each whole horizontal line across both columns. Is there any way to make pdftotext read a double-column PDF in column order? Thanks in advance.
Best Regards