jorisschellekens / borb

borb is a library for reading, creating and manipulating PDF files in python.
https://borbpdf.com/

BUG Paragraph hyphenation: splitting the first word causes an error #141

Closed ImAReplicant closed 2 years ago

ImAReplicant commented 2 years ago

Describe the bug A paragraph with hyphenation raises an error if the first word of the paragraph needs to be split.

To Reproduce Create a script that paints a hyphenated paragraph into a small rectangle.

from borb.pdf import Document
from borb.pdf import Page
from borb.pdf.canvas.layout.hyphenation.hyphenation import Hyphenation
from borb.pdf import Alignment
from borb.pdf import Paragraph
from borb.pdf import PDF
from borb.pdf.canvas.geometry.rectangle import Rectangle
from borb.pdf import HexColor
from borb.pdf.canvas.layout.annotation.square_annotation import SquareAnnotation

from decimal import Decimal

# create Document
doc: Document = Document()

# create Page
page: Page = Page()
# add Page to Document
doc.add_page(page)

# deliberately narrow rectangle, so the first word has to be split
r: Rectangle = Rectangle(
    Decimal(30),   # x
    Decimal(742),  # y
    Decimal(50),   # width
    Decimal(70),   # height
)

# outline the rectangle in red to make it visible
page.add_annotation(
    SquareAnnotation(
        r,
        stroke_color=HexColor("#ff0000"),
    )
)

Paragraph("Alignement",
          horizontal_alignment=Alignment.CENTERED,
          vertical_alignment=Alignment.MIDDLE,
          text_alignment=Alignment.CENTERED,
          hyphenation=Hyphenation("en-us")
          ).paint(page, r)

# store
with open("output.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, doc)

This causes the following error:

if len(lines_of_text[-1]) > 0 and not self._respect_spaces_in_text:
IndexError: list index out of range

Expected behaviour The first word of the paragraph should be split without an error.

Changing the initialization at https://github.com/jorisschellekens/borb/blob/9ac59b6f8bae6c8e3ba296e2aa91122d4792bcfd/borb/pdf/canvas/layout/text/paragraph.py#L144 to lines_of_text = [""] seems to fix the problem.
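
To illustrate why the initial value matters, here is a minimal sketch of a greedy line wrapper. It is not borb's actual code: the function `wrap`, its parameters, and the naive "cut and add a dash" splitting are assumptions made for illustration. It only reproduces the same pattern, reading `lines_of_text[-1]` before any line exists.

```python
from typing import List


def wrap(words: List[str], width: int, lines_of_text: List[str]) -> List[str]:
    """Greedy wrapper with naive hyphenation (hypothetical, not borb's code)."""
    if width < 2:
        raise ValueError("width must leave room for one character and a dash")
    lines_of_text = list(lines_of_text)  # do not mutate the caller's list
    for word in words:
        while word:
            # Same kind of lookup as in the traceback: fails on an empty list.
            current = lines_of_text[-1]
            sep = " " if current else ""
            room = width - len(current) - len(sep)
            if len(word) <= room:
                # The (rest of the) word fits on the current line.
                lines_of_text[-1] = current + sep + word
                word = ""
            elif not current:
                # Empty line and the word still does not fit: split it.
                lines_of_text[-1] = word[: width - 1] + "-"
                word = word[width - 1:]
                lines_of_text.append("")
            else:
                # Start a new line and retry the same word.
                lines_of_text.append("")
    return lines_of_text


# Starting with one empty line, the first word can be split.
print(wrap(["Alignement"], 6, [""]))  # ['Align-', 'ement']

# Starting with no lines, the very first lookup raises IndexError.
try:
    wrap(["Alignement"], 6, [])
except IndexError as e:
    print("IndexError:", e)
```

With an empty starting list, the error appears as soon as the first word is processed, which would match the traceback above if the library follows a similar pattern.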


jorisschellekens commented 2 years ago

Fixed. You may expect this test (and of course the fix for this bug) in the next release.