tanchangsheng closed this issue 2 months ago.
Not a bug. Nothing in the layout indicates a paragraph break, so the logic correctly treats the text as continuous.
Also, that font size obviously fell within the range of header font sizes.
If you do not like that logic, either switch it off altogether (`hdr_info=False`) or supply your own.
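A minimal sketch of the second option. It assumes the `hdr_info` interface described in the pymupdf4llm documentation: an object whose `get_header_id(span, page=None)` method receives a text span dictionary (with a `"size"` key) and returns a Markdown prefix such as `"## "`, or `""` for body text. The class name `SizeThresholdHeaders` and its size thresholds are hypothetical; choosing thresholds above the font size used in `example.pdf` would keep those lines out of the header range.

```python
# Hypothetical custom header detector to pass as pymupdf4llm's hdr_info.
# Assumption: pymupdf4llm calls get_header_id(span, page=None) per text span
# and expects a Markdown header prefix ("# ", "## ", ...) or "" for body text.


class SizeThresholdHeaders:
    """Map font sizes to Markdown header prefixes using fixed thresholds."""

    def __init__(self, levels=None):
        # (minimum font size, header level) pairs; the defaults are illustrative.
        # Sorted largest size first so the biggest matching threshold wins.
        self.levels = sorted(levels or [(20.0, 1), (16.0, 2), (14.0, 3)], reverse=True)

    def get_header_id(self, span, page=None):
        size = span.get("size", 0.0)
        for min_size, level in self.levels:
            if size >= min_size:
                return "#" * level + " "
        return ""  # below every threshold: treat as body text


# Usage (requires pymupdf4llm and the PDF from the issue):
# import pymupdf4llm
# md = pymupdf4llm.to_markdown("example.pdf", hdr_info=SizeThresholdHeaders())
# md = pymupdf4llm.to_markdown("example.pdf", hdr_info=False)  # no header detection
```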
In any case, the two lines follow each other without any extra spacing, so no additional line break is generated.
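The spacing argument can be illustrated with a simplified, hypothetical sketch (not pymupdf4llm's actual implementation): consecutive lines are joined into one paragraph unless the vertical gap between them exceeds a fraction of the line height. The `Line` type, the `join_lines` function, and the `gap_factor` value of 0.5 are assumptions chosen for illustration.

```python
# Hypothetical sketch, not pymupdf4llm's code: a paragraph break is emitted
# only when the vertical gap between two lines exceeds gap_factor * line height.
from dataclasses import dataclass


@dataclass
class Line:
    y0: float  # top of the line's bounding box
    y1: float  # bottom of the line's bounding box
    text: str


def join_lines(lines, gap_factor=0.5):
    """Group lines (in reading order) into paragraphs based on vertical spacing."""
    paragraphs = []
    current = []
    prev = None
    for line in lines:
        if prev is not None:
            gap = line.y0 - prev.y1  # extra space between the two lines
            height = prev.y1 - prev.y0
            if gap > gap_factor * height:
                paragraphs.append(" ".join(current))
                current = []
        current.append(line.text)
        prev = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Two lines that follow each other with no extra spacing come back as a single paragraph; only a noticeably larger gap starts a new one.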
Thanks for the clarification!
Words spanning two lines are parsed as a single line.
File: example.pdf
Expected output:
Actual output: