tabulapdf / tabula-java

Extract tables from PDF files
MIT License
1.82k stars 425 forks

fixed removing non identical ruling from the tree #488

Open zanninso opened 2 years ago

zanninso commented 2 years ago

I found that the ruling find-intersections method doesn't give me all the intersections. After investigating the problem, I found that the cause is the comparator used by the tree of rulings: it compares only the top/y value, which leads to dropping any ruling that has the same top/y as another one.

So it's better to use the hashcode of the object as the key and the object itself as the value. This fixed the problem for me.
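
The behaviour described above can be reproduced with a plain `java.util.TreeMap`, independent of tabula-java. The sketch below is only an illustration: `Line`, `BY_TOP_ONLY`, `BY_TOP_THEN_LEFT` and `keptAfterInsert` are hypothetical names, not part of the library, and the tie-breaking comparator is one possible way to keep distinct entries apart, not the change made in this PR. A `TreeMap` treats two keys as identical whenever its comparator returns 0, so a second line with the same top never becomes a separate entry.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapTopCollision {
    // Hypothetical stand-in for a vertical ruling; NOT tabula-java's Ruling class.
    static final class Line {
        final float top;   // y coordinate of the upper end
        final float left;  // x position of the line
        Line(float top, float left) { this.top = top; this.left = left; }
    }

    // Looks only at top: lines with equal top count as the same key.
    static final Comparator<Line> BY_TOP_ONLY =
            (a, b) -> Float.compare(a.top, b.top);

    // Illustrative alternative (an assumption, not the PR's fix):
    // break ties on left so that distinct lines remain distinct keys.
    static final Comparator<Line> BY_TOP_THEN_LEFT = (a, b) -> {
        int c = Float.compare(a.top, b.top);
        return c != 0 ? c : Float.compare(a.left, b.left);
    };

    // Inserts every line as a key and reports how many keys the tree kept.
    static int keptAfterInsert(Comparator<Line> order, Line... lines) {
        TreeMap<Line, Boolean> tree = new TreeMap<>(order);
        for (Line l : lines) {
            tree.put(l, Boolean.TRUE);
        }
        return tree.size();
    }

    public static void main(String[] args) {
        Line a = new Line(10f, 50f);
        Line b = new Line(10f, 120f); // same top as a, different x position

        System.out.println(keptAfterInsert(BY_TOP_ONLY, a, b));      // prints 1
        System.out.println(keptAfterInsert(BY_TOP_THEN_LEFT, a, b)); // prints 2
    }
}
```

Note that the two lines in this example do not overlap; they only share the same top value, which is enough for a top-only comparator to treat them as one key.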

jeremybmerrill commented 2 years ago

I spoke with @zanninso via email. We discussed that we didn't originally intend for findIntersections() to be called without first calling collapseOrientedRulings, which should fix this problem (by guaranteeing that there are no overlapping lines with the same orientation).

I don't, however, recall why the TreeMap uses Ruling.top() as the comparator, rather than the hashcode (as proposed in this PR). It seems either might work and the suggestion might be simpler. @jazzido do you recall?

azeddine-leet commented 1 year ago

bump

zanninso commented 1 year ago

Hello. The problem that we discussed before has just happened again in an unexpected way, even when following the process and calling "collapseOrientedRulings()" before calling "findIntersections()". It's harder to reproduce the case where it happens, but I will try to investigate when I get some free time and I will send it to you.