tabulapdf / tabula-java

Extract tables from PDF files
MIT License
1.77k stars 412 forks

Too low resolution leads to lines disappearing in table #546

Open parleur opened 4 weeks ago

parleur commented 4 weeks ago

Hi, I have run into a problem with PDFs whose thin lines disappear when rendered by the tabula web client. As a result, a vertical line sometimes goes missing and two columns get merged. It happens seemingly at random across several PDFs.

I wonder if it could be an aliasing problem when the PDF is rendered to PNG, in which case increasing the resolution would solve it. Right now the DPI is hard-coded, and I understand that increasing it would also increase the workload for the detection algorithm that runs right after. Maybe it should be an option.
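To illustrate the aliasing hypothesis, here is a rough back-of-the-envelope sketch. It is not tabula's rendering code: it only assumes the default PDF user space of 72 units per inch and an idealized anti-aliasing model in which a pixel's darkness equals the fraction of it covered by the line. The class name, the method names, and the 0.25 pt line width are all hypothetical, chosen for illustration.

```java
import java.util.Locale;

// Idealized model (an assumption, not tabula's renderer): a black line on a
// white page, where a pixel's darkness is the fraction of it the line covers.
final class ThinLineResolution {
    /** Default PDF user space: 72 units (points) per inch. */
    static final double POINTS_PER_INCH = 72.0;

    /** Width of a stroke in device pixels when rasterized at the given DPI. */
    public static double widthInPixels(double widthPt, double dpi) {
        return widthPt * dpi / POINTS_PER_INCH;
    }

    /**
     * Darkest gray level (0 = black, 255 = white) the line can reach in the
     * best case, i.e. when it falls entirely inside one pixel column.
     */
    public static int darkestGray(double widthPt, double dpi) {
        double coverage = Math.min(1.0, widthInPixels(widthPt, dpi));
        return (int) Math.round(255.0 * (1.0 - coverage));
    }

    /** Lowest DPI at which the line is at least one full pixel wide. */
    public static double minDpiForFullPixel(double widthPt) {
        return POINTS_PER_INCH / widthPt;
    }

    public static void main(String[] args) {
        double widthPt = 0.25; // hypothetical hairline ruling width, in points
        for (double dpi : new double[] {72, 144, 300}) {
            System.out.println(String.format(Locale.ROOT,
                    "%.0f DPI: %.2f px wide, darkest gray %d",
                    dpi, widthInPixels(widthPt, dpi), darkestGray(widthPt, dpi)));
        }
        System.out.println(String.format(Locale.ROOT,
                "full-pixel width needs at least %.0f DPI",
                minDpiForFullPixel(widthPt)));
    }
}
```

Under this model, a line narrower than one pixel can only ever render as a shade of gray, and a line that straddles two pixel columns comes out fainter still. That could explain why the same ruling survives on one page and vanishes on another, although a real renderer may treat thin strokes differently.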

Sadly, I cannot share the faulty PDF (bank account data full of private information).

Thank you for your great tool!

https://github.com/tabulapdf/tabula-java/blob/8bfa3ad23af34f757f72fe46584a34abfc022ed3/src/main/java/technology/tabula/detectors/NurminenDetectionAlgorithm.java#L101 and also line 113