Open kevinburke opened 2 years ago
Hi @kevinburke nice to see you here :)
This is almost certainly an issue with how PDFBox, the library Tabula uses to interact with the PDF at a low level, handles PDFs generated in unusual ways. The best fix is to re-encode the PDF with pdftk, Acrobat, or a tool of your choice. That generally fixes things.
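If you want to script the re-encode, here is a minimal sketch in Python. It assumes `pdftk` is installed and on your PATH, and it uses the plain `pdftk <input> output <output>` form, which just rewrites the file. The helper names and the output filename are made up for illustration, and there is no guarantee that a rewrite fixes this particular PDF.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftk_command(src, dst):
    """Return the argv list for a plain pdftk rewrite of src into dst."""
    return ["pdftk", str(src), "output", str(dst)]


def reencode_pdf(src, dst):
    """Rewrite src with pdftk. Raises if pdftk is missing or exits non-zero."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk not found on PATH")
    subprocess.run(build_pdftk_command(src, dst), check=True)
    return Path(dst)


# Hypothetical usage (the output name is an assumption):
# reencode_pdf("concord_housing_table.pdf", "concord_housing_table_reencoded.pdf")
```

Then load the rewritten file into Tabula and try the export again.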
It could also be a subsetted font, which in effect uses a non-standard encoding. See this StackOverflow answer.
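To illustrate why a subsetted font could produce gibberish with the same number of characters as the original text, here is a toy model in Python. It is only a sketch of the idea, not how PDFBox or any real PDF font works: the numbering scheme and all function names are assumptions made up for this example.

```python
def build_subset_encoding(text):
    """Toy subset font: number only the characters used, in order of first use."""
    codes = {}
    for ch in text:
        if ch not in codes:
            codes[ch] = len(codes) + 1
    return codes


def encode(text, codes):
    """Store the text as the font's private glyph codes."""
    return bytes(codes[ch] for ch in text)


def naive_extract(data):
    """Extractor with no code-to-character map: guesses a printable ASCII offset."""
    return "".join(chr(0x20 + b) for b in data)


def extract_with_map(data, codes):
    """Extractor that has the font's code-to-character map."""
    reverse = {code: ch for ch, code in codes.items()}
    return "".join(reverse[b] for b in data)


original = "Concord housing"
codes = build_subset_encoding(original)
stored = encode(original, codes)

garbled = naive_extract(stored)              # same length, wrong characters
recovered = extract_with_map(stored, codes)  # matches the original text
```

In this model every stored code still corresponds to exactly one character, so the character count is preserved even when the extractor guesses the wrong mapping.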
I'm using Tabula for Mac. We are trying to export the tables in the attached PDF. concord_housing_table.pdf
The initial upload generated a lot of overlapping selections. We removed all of them except for the selections that covered the entire table row.
When we go to export, the output looks like complete gibberish:
We're confused about this, because clearly it's meaningful gibberish - the number of gibberish characters corresponds to the text in the original file. Maybe we missed an encoding setting? We tried using the tools in the app but didn't see anything meaningful.