DrPlanecraft closed this issue 10 months ago
Found the bug.
In SimpleLineOfTextExtraction (line 78):
sorted(chunks_of_text, key=cmp_to_key(LeftToRightComparator.cmp))
should be
chunks_of_text = sorted(chunks_of_text, key=cmp_to_key(LeftToRightComparator.cmp))
Otherwise the sorted result is thrown away and the rendering instructions are never actually reordered, so the x-coordinates can end up out of order. That in turn makes it generate a Rectangle with negative width.
I'm fixing this (and a similar occurrence of sorted) in the next release.
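For anyone hitting the same symptom, here is a minimal sketch of why the missing assignment matters. The names Chunk, left_to_right_cmp and line_width are hypothetical stand-ins, not the library's actual classes or the real LeftToRightComparator; the only point is that sorted() returns a new list and leaves its argument untouched.

```python
from functools import cmp_to_key
from typing import List, NamedTuple


class Chunk(NamedTuple):
    """Hypothetical stand-in for a text-rendering instruction."""
    text: str
    x: float      # left edge of the chunk
    width: float  # horizontal extent of the chunk


def left_to_right_cmp(a: Chunk, b: Chunk) -> int:
    """Hypothetical comparator: order chunks by their left edge."""
    return (a.x > b.x) - (a.x < b.x)


def line_width(chunks: List[Chunk]) -> float:
    """Width from the first chunk's left edge to the last chunk's right edge."""
    first, last = chunks[0], chunks[-1]
    return (last.x + last.width) - first.x


# Chunks arrive in rendering order, which is not necessarily left to right.
chunks = [Chunk("world", 50.0, 30.0), Chunk("Hello", 10.0, 30.0)]

# Bug: sorted() builds a new list; without an assignment the result is discarded.
sorted(chunks, key=cmp_to_key(left_to_right_cmp))
buggy_width = line_width(chunks)          # still uses the unsorted order

# Fix: keep the list that sorted() returns.
chunks_sorted = sorted(chunks, key=cmp_to_key(left_to_right_cmp))
fixed_width = line_width(chunks_sorted)

print(buggy_width, fixed_width)
```

With the unsorted list the computed width comes out negative, while the sorted list gives a positive width. Calling chunks.sort(key=cmp_to_key(left_to_right_cmp)) would also work, since list.sort() sorts in place.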
Kind regards, Joris Schellekens
Thank you for the reply! I originally closed the issue because I found bugs in the code I had provided.
I am trying to load a PDF downloaded from arXiv.
Expected behaviour: I expect it to exit with no issues after printing out the differences between the LineOfTextExtraction and the ParagraphExtraction.
Desktop (please complete the following information):