Open gtambi143 opened 5 years ago
I found this to be a problem for me as well. The pdftotext conversion currently hardcodes -layout as the conversion method; in some cases, I find -table to be better suited for the conversion. I think this is also related to #108.
Yes. It is always useful to have it as an option. Just keep the current setting as the default.
The problem I see is that poppler's pdftotext doesn't support -simple or -table. Support for those layouts was developed in Xpdf after it was forked by the poppler project.
Today most Linux distributions have switched to poppler for providing pdftotext and similar tools. Ideally, support for those extra layouts should be ported from Xpdf to poppler, but that seems like a huge task. I created an issue in the poppler project to see if there is any idea or interest around that: https://gitlab.freedesktop.org/poppler/poppler/-/issues/1419
If I use plain pdftotext, it has an option "-simple" that converts a PDF as if it were a single-column document. But if I use invoice2data, it converts the PDF assuming it has multiple columns. Is there any option for this? Basically, I am looking for the invoice2data equivalent of the following command: pdftotext -simple file.pdf
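Until something like this is available as an option, one possible workaround is to call pdftotext directly and pick the layout flag yourself. The following is only a rough sketch, not invoice2data's actual code: the names build_pdftotext_cmd, pdf_to_text and ALLOWED_LAYOUTS are made up for illustration, and -simple and -table are assumed to work only when the pdftotext on your PATH is the Xpdf build rather than poppler's, as discussed above. The default stays at -layout so the current behaviour is unchanged.

```python
import shutil
import subprocess

# Layout flags this sketch accepts. -simple and -table are assumed to
# require the Xpdf build of pdftotext; poppler's build rejects them.
ALLOWED_LAYOUTS = {"layout", "raw", "simple", "table"}


def build_pdftotext_cmd(pdf_path, layout="layout"):
    """Return the pdftotext argument list for the chosen layout flag.

    Hypothetical helper; "layout" is the default so that existing
    behaviour is preserved when no option is given.
    """
    if layout not in ALLOWED_LAYOUTS:
        raise ValueError(f"unsupported layout: {layout!r}")
    # A trailing "-" tells pdftotext to write the text to stdout.
    return ["pdftotext", f"-{layout}", "-enc", "UTF-8", pdf_path, "-"]


def pdf_to_text(pdf_path, layout="layout"):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise EnvironmentError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, layout),
        capture_output=True,
        check=True,  # raises CalledProcessError if the flag is rejected
    )
    return result.stdout.decode("utf-8", errors="replace")
```

With an Xpdf pdftotext installed, pdf_to_text("file.pdf", layout="simple") would then correspond to pdftotext -simple file.pdf, and the resulting text could be passed on to whatever does the template matching.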