melisabok / tabula-java

Extract tables from PDF files
MIT License

Guess option is guessing different areas #4

Closed melisabok closed 7 years ago

melisabok commented 7 years ago

TestCommandLineApp.testGuessOption

The collection of TextElement objects is the same in both implementations.

But the guessed rectangles have different coordinates. Is this related to Utils.pageConvertToImage?

New:

technology.tabula.Rectangle[x=68.0,y=287.0,w=458.0,h=463.0,bottom=750.000000,right=526.000000]

Old:

technology.tabula.Rectangle[x=95.0,y=297.0,w=404.0,h=95.0,bottom=392.000000,right=499.000000]
technology.tabula.Rectangle[x=95.0,y=425.0,w=404.0,h=175.0,bottom=600.000000,right=499.000000]
technology.tabula.Rectangle[x=70.0,y=632.0,w=429.0,h=116.0,bottom=748.000000,right=499.000000]

Without the call to removeText:

technology.tabula.Rectangle[x=95.0,y=297.0,w=404.0,h=95.0,bottom=392.000000,right=499.000000]
technology.tabula.Rectangle[x=95.0,y=425.0,w=404.0,h=175.0,bottom=600.000000,right=499.000000]
technology.tabula.Rectangle[x=95.0,y=633.0,w=404.0,h=115.0,bottom=748.000000,right=499.000000]
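One way to compare the outputs above is to check whether the single "New" rectangle simply encloses the three "Old" ones, which would suggest the new code is merging separate table areas rather than finding unrelated ones. The sketch below is only an illustration: `GuessedAreaCheck` and `boundingBox` are hypothetical names that are not part of tabula-java, and `java.awt.geom.Rectangle2D.Float` stands in for `technology.tabula.Rectangle` so the snippet runs without the library on the classpath.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for this issue, not part of tabula-java.
class GuessedAreaCheck {

    // Smallest rectangle that encloses every rectangle in a non-empty list.
    static Rectangle2D boundingBox(List<Rectangle2D> areas) {
        Rectangle2D box = (Rectangle2D) areas.get(0).clone();
        for (Rectangle2D r : areas) {
            box.add(r); // grows box to the union of box and r
        }
        return box;
    }

    public static void main(String[] args) {
        // "New" output from this issue (x, y, w, h).
        Rectangle2D newArea = new Rectangle2D.Float(68f, 287f, 458f, 463f);

        // "Old" output from this issue.
        List<Rectangle2D> oldAreas = Arrays.<Rectangle2D>asList(
            new Rectangle2D.Float(95f, 297f, 404f, 95f),
            new Rectangle2D.Float(95f, 425f, 404f, 175f),
            new Rectangle2D.Float(70f, 632f, 429f, 116f));

        Rectangle2D oldBox = boundingBox(oldAreas);
        System.out.println("bounding box of old areas: " + oldBox);
        System.out.println("new area contains it: " + newArea.contains(oldBox));
    }
}
```

With the coordinates reported here, the old areas span x=70..499 and y=297..748, which lies inside the new area (x=68..526, y=287..750). So the new implementation appears to return one region covering all three previously detected tables.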