jorisschellekens / borb

borb is a library for reading, creating and manipulating PDF files in python.
https://borbpdf.com/

Long table - how to write on multiple pages #7

Closed julien-fr closed 2 years ago

julien-fr commented 3 years ago

Hi,

I'm having difficulty creating a PDF with a long table. I get an assertion error: assert height >= 0

I couldn't find any information on this. What is the best way to handle content that spans multiple pages?

Thanks for the help.

jorisschellekens commented 3 years ago

Hi Julien,

Without seeing a stack trace or your code, it is rather difficult to help. But this is my best guess as to what is happening in your code:

The workaround would be:

Automatically continuing content on the next page is not trivial:

I will (at some point) get around to implementing this. But the focus for the next release is better support for fonts.

Kind regards, Joris Schellekens

eitchtee commented 3 years ago

@jorisschellekens How would one go about testing if a Table fits into a page?

I couldn't find an easy way of calculating and comparing such sizes

jorisschellekens commented 3 years ago

Hi,

There is already some code that does this. And even a test that checks this behavior.

Whenever you add a LayoutElement to the Page that is too wide or tall, it should trigger an assert (with precisely that error message).

Kind regards, Joris

oldmanofthemountain commented 2 years ago

Hi,

I got curious and looked into this. Apparently, the code always fails with a negative rectangle height whether the table is too big for the page, or it just doesn't fit in the available space.

I tried a patch that detects when h goes negative, and returns a 'failure' rectangle. It's not elegant, but it seems to work in the tests that I have done for both cases. The code is:

fixed_column_width_table.py, line 122:

if h < 0:
    return Rectangle(
        bounding_box.x,
        bounding_box.y + h,
        bounding_box.width,
        bounding_box.height - h,
    )
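
To see why that return value works as a failure signal, here is a minimal stand-alone sketch. It does not use borb: Box, overflow_box and fits are hypothetical stand-ins for Rectangle and for the layout's height check. When h is negative, the returned box is pushed down by |h| and grows by |h|, so it ends up taller than the available space and the caller can tell that the table did not fit.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a rectangle (lower-left corner + size)."""

    x: Decimal
    y: Decimal
    width: Decimal
    height: Decimal


def overflow_box(available: Box, h: Decimal) -> Box:
    """Mirror the patch: h is the (negative) leftover height.

    The box is moved down by |h| and made |h| taller, so its height
    exceeds the height of the available space.
    """
    return Box(available.x, available.y + h, available.width, available.height - h)


def fits(content: Box, available: Box) -> bool:
    """A caller-side check: content fits if it is no taller than the space."""
    return content.height <= available.height
```

With an available height of 100 and h = -30, the returned box has a height of 130, so a check like fits() reports that the content overflows.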

I haven't looked at the flexible_column_width_table, so I don't know if the same thing works.

Regards,

jorisschellekens commented 2 years ago

Ideally, I want elements that can be split to inherit from the same (interface) SplittableLayoutElement.

As soon as the PageLayout receives an invalid Rectangle (triggering an assert) it should check whether the LayoutElement is SplittableLayoutElement and then it can try again.

Perhaps depending on a setting (maybe the user would prefer the Table not to be split, but take up its own full Page)
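
A minimal sketch of that idea, under stated assumptions: none of this is borb code. Element, SplittableElement, RowStack and lay_out are hypothetical names, and the allow_split flag stands in for the user setting mentioned above.

```python
from abc import ABC, abstractmethod
from decimal import Decimal
from typing import List, Optional, Tuple


class Element:
    """Hypothetical stand-in for a layout element with a known height."""

    def __init__(self, height: Decimal) -> None:
        self.height = height


class SplittableElement(Element, ABC):
    """Hypothetical interface for elements that can be cut to fit a height."""

    @abstractmethod
    def split(
        self, available: Decimal
    ) -> Tuple[Optional["SplittableElement"], Optional["SplittableElement"]]:
        """Return (part that fits, remainder); either may be None."""


class RowStack(SplittableElement):
    """Toy 'table': a stack of rows that can only be split between rows."""

    def __init__(self, row_heights: List[Decimal]) -> None:
        super().__init__(sum(row_heights, Decimal(0)))
        self.row_heights = row_heights

    def split(self, available):
        used, n = Decimal(0), 0
        for h in self.row_heights:
            if used + h > available:
                break
            used += h
            n += 1
        head = RowStack(self.row_heights[:n]) if n else None
        tail = RowStack(self.row_heights[n:]) if n < len(self.row_heights) else None
        return head, tail


def lay_out(element: Element, page_height: Decimal, allow_split: bool = True) -> List[Element]:
    """Return one piece per page, or raise if the element cannot be placed."""
    if element.height <= page_height:
        return [element]
    if not (allow_split and isinstance(element, SplittableElement)):
        raise ValueError("element is too tall and cannot be split")
    pieces: List[Element] = []
    rest: Optional[SplittableElement] = element
    while rest is not None and rest.height > page_height:
        head, rest = rest.split(page_height)
        if head is None:
            raise ValueError("no part of the element fits on an empty page")
        pieces.append(head)
    if rest is not None:
        pieces.append(rest)
    return pieces
```

The layout only needs to know whether an element implements the interface; how the split is done stays inside the element.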

Kind regards, Joris Schellekens

oldmanofthemountain commented 2 years ago

That's interesting because that (SplittableElement) is exactly what I have done with paragraphs, except that I also have modified versions of paragraph and layout that cooperate as well as doing some other things for my particular intent. In my case, SplittableElement simply provides for an 'is_more' call that allows the paragraph to be split across as many columns/pages as necessary.

Since the table is aware of the available space, is it necessary, or better, to use an exception to trigger the split?

The above suggestion was simply to make the table act as expected, i.e. trigger the assertion if it is too big for the page, or switch to a new column if it doesn't fit the available space.

Working with borb has been an interesting experience.

Regards,

jorisschellekens commented 2 years ago

It's something I'm thinking about. Once I have a great solution in my mind, I'll work it out. That's why this ticket has been waiting around for a while. I want to find something that'll work in the most general of cases.

In the meantime, I want to play around with borb a bit. Fall in love with the fun stuff again. The new release is going to feature some more line art generators, and a way of adding gradients to line art. That should be fun :-)

Kind regards, Joris Schellekens

jorisschellekens commented 2 years ago

I've finally gotten around to implementing this. I need to make sure the code fits the standards of the library, but other than some renames and adding some typing, it should work.

I've also added a test to check this behaviour.

You should find a new PageLayout implementation in the next release, called SingleColumnLayoutWithSplitter (or similar) which breaks up Table elements when needed (and possible). I might add more logic later on to do something similar for Paragraph.

Kind regards, Joris Schellekens

Welgum commented 2 years ago

Hi @jorisschellekens ,

I wasn't able to find SingleColumnLayoutWithSplitter in the latest release. Have you added this feature?

Kind regards,

jorisschellekens commented 2 years ago

borb/pdf/canvas/layout/page_layout/single_column_layout_with_overflow.py

Welgum commented 2 years ago

Thank you for your quick response. I've tried the new layout, but I am still getting the same error when I try to add a long table:

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/page_layout/single_column_layout_with_overflow.py:74, in SingleColumnLayoutWithOverflow.add(self, layout_element)
     71     return self.add(layout_element)
     73 # ask LayoutElement to fit
---> 74 lbox: Rectangle = layout_element.get_layout_box(
     75     Rectangle(
     76         self._horizontal_margin + layout_element.get_margin_left(),
     77         Decimal(0),
     78         self._column_width
     79         - layout_element.get_margin_right()
     80         - layout_element.get_margin_left(),
     81         available_height,
     82     )
     83 )
     84 if lbox.get_height() <= available_height:
     85     return super(SingleColumnLayout, self).add(layout_element)

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/layout_element.py:246, in LayoutElement.get_layout_box(self, available_space)
    222 cbox_available_space: Rectangle = Rectangle(
    223     available_space.get_x()
    224     + self._padding_left
   (...)
    242     ),
    243 )
    245 # determine content_box
--> 246 cbox: Rectangle = self._get_content_box(cbox_available_space)
    248 # take into account vertical_alignment
    249 delta_x: Decimal = Decimal(0)

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/table/flexible_column_width_table.py:270, in FlexibleColumnWidthTable._get_content_box(self, available_space)
    267     self.add(Paragraph(" ", respect_spaces_in_text=True))
    269 # return
--> 270 m = self._get_grid_coordinates(available_space)
    271 min_x: Decimal = m[0][0][0]
    272 max_x: Decimal = m[-1][-1][0]

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/table/flexible_column_width_table.py:241, in FlexibleColumnWidthTable._get_grid_coordinates(self, available_space)
    237     # layout
    238     h: Decimal = grid_y_to_page_y[r] - available_space.get_y()
    239     prev_row_lboxes.append(
    240         e.get_layout_box(
--> 241             Rectangle(
    242                 grid_x_to_page_x[grid_x],
    243                 available_space.get_y(),
    244                 grid_x_to_page_x[grid_x + e._col_span]
    245                 - grid_x_to_page_x[grid_x],
    246                 h,
    247             )
    248         )
    249     )
    251 # keep track of the bottom of the previous (at this point current) row
    252 # this makes it easier to lay out the next row
    253 new_y: Decimal = min([lbox.get_y() for lbox in prev_row_lboxes])

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/geometry/rectangle.py:26, in Rectangle.__init__(self, lower_left_x, lower_left_y, width, height)
     18 def __init__(
     19     self,
     20     lower_left_x: Decimal,
   (...)
     23     height: Decimal,
     24 ):
     25     assert width >= 0, "A Rectangle must have a non-negative width."
---> 26     assert height >= 0, "A Rectangle must have a non-negative height."
     27     self.x = lower_left_x
     28     self.y = lower_left_y

AssertionError: A Rectangle must have a non-negative height.

jorisschellekens commented 2 years ago

Can you give me the shortest possible example (code) of how it fails?

Welgum commented 2 years ago

Sure

borb version is 2.1.6

from borb.pdf import Document
from borb.pdf.page.page import Page
from borb.pdf import Paragraph
from borb.pdf import SingleColumnLayoutWithOverflow
from borb.pdf import FlexibleColumnWidthTable

document = Document()
page = Page()
layout = SingleColumnLayoutWithOverflow(page)

row_num = 84
col_num = 3
test_table = FlexibleColumnWidthTable(number_of_columns=col_num, number_of_rows=row_num)

for i in range(row_num):
    for j in range(col_num):
        test_table.add(Paragraph(f'{i}-{j}', font_size=10))

layout.add(test_table)

perennes commented 1 year ago

Hello, I would like to give feedback on this issue.

I have the same problem with a table that has a lot of rows and must overflow onto another page.

I used the "SingleColumnLayoutWithOverflow" class for the layout but I still get the same error: "AssertionError: A Rectangle must have a non-negative height.". I am using borb version 2.1.7.

I did a lot of testing with the following simplified code:

pdf = Document()
page = Page(PageSize.A4_LANDSCAPE.value[0], PageSize.A4_LANDSCAPE.value[1])
pdf.add_page(page)
layout = SingleColumnLayoutWithOverflow(page, horizontal_margin=Decimal(5), vertical_margin=Decimal(5))
table = FlexibleColumnWidthTable(
    number_of_columns=3,
    number_of_rows=72
)
layout.add(table)

Fun fact: with 36 rows, one row flows onto the next page, but from 37 rows onward I get the error "A Rectangle must have a non-negative height". I haven't been able to work out what triggers it.

I tried to calculate the size of my table in order to split it across several pages, but without success.

Do you have any idea where the problem might come from?

Your library is very easy to use, does the job very well and is perfectly documented. I thank you in advance.

jorisschellekens commented 1 year ago

Found it, and fixed it.

Table attempts to convert the grid coordinates to page coordinates in the method _get_grid_coordinates. This method (simplified) lays out the LayoutElement objects inside of the Table from top to bottom, left to right. It keeps adding the height of the last row to the y coordinate for the next row. And at some point, this logic starts using negative numbers for the height. And that's where it goes wrong.

I fixed it by taking the max of that height and 0. The PageLayout then concludes that the Table is too large for the Page and starts splitting it (expected behaviour).
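
The effect of that clamp can be illustrated with a small stand-alone sketch. It is not borb's real code: remaining_heights and its parameters are made up for illustration. Rows are stacked downward from top_y, and for each row the height still available above bottom_y is computed. Without the clamp, that value goes negative once the rows run past the bottom of the space.

```python
from decimal import Decimal
from typing import List


def remaining_heights(
    top_y: Decimal,
    bottom_y: Decimal,
    row_heights: List[Decimal],
    clamp: bool = True,
) -> List[Decimal]:
    """Height available to each row as rows are stacked top to bottom.

    With clamp=True the value never drops below 0, so it is always a
    valid (non-negative) rectangle height.
    """
    result: List[Decimal] = []
    y = top_y
    for row_height in row_heights:
        h = y - bottom_y
        result.append(max(h, Decimal(0)) if clamp else h)
        y -= row_height
    return result
```

For four rows of height 40 in a space of height 100, the last row gets 0 instead of -20, which keeps the rectangle constructor's assertion from firing and leaves the "does not fit" decision to the layout.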

Going to check whether this fix influences any other tests.

jorisschellekens commented 1 year ago

Turns out it does not mess with anything else. So you can expect this bugfix in the next release.

Kind regards, Joris

perennes commented 1 year ago

Thank you for the quick response. I look forward to using the next version of Borb.

Have a great holiday season!

Welgum commented 1 year ago

@jorisschellekens thank you for the latest release. I have just tested that version with the code snippet I posted earlier (repeated below). It still gives an error when the number of rows is large enough, though the traceback is a little different this time.

from borb.pdf import Document
from borb.pdf.page.page import Page
from borb.pdf import Paragraph
from borb.pdf import SingleColumnLayoutWithOverflow
from borb.pdf import FlexibleColumnWidthTable

document = Document()
page = Page()
layout = SingleColumnLayoutWithOverflow(page)

row_num = 84
col_num = 3
test_table = FlexibleColumnWidthTable(number_of_columns=col_num, number_of_rows=row_num)

for i in range(row_num):
    for j in range(col_num):
        test_table.add(Paragraph(f'{i}-{j}', font_size=10))

layout.add(test_table)

Traceback:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [7], in <cell line: 19>()
     16     for j in range(col_num):
     17         test_table.add(Paragraph(f'{i}-{j}', font_size=10))
---> 19 layout.add(test_table)

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/page_layout/single_column_layout_with_overflow.py:89, in SingleColumnLayoutWithOverflow.add(self, layout_element)
     87 # split
     88 for t in self._split_table(layout_element, available_height):
---> 89     super(SingleColumnLayoutWithOverflow, self).add(t)
     91 # return
     92 return self

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/page_layout/multi_column_layout.py:161, in MultiColumnLayout.add(self, layout_element)
    159 assert self._page_height
    160 if available_height < 0:
--> 161     self.switch_to_next_column()
    162     return self.add(layout_element)
    164 # fmt: off

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/page_layout/multi_column_layout.py:104, in MultiColumnLayout.switch_to_next_column(self)
    102 self._current_column_index += Decimal(1)
    103 if self._current_column_index == self._number_of_columns:
--> 104     return self.switch_to_next_page()
    105 assert self._page_height
    106 self._previous_element = None

File ~/opt/anaconda3/lib/python3.9/site-packages/borb/pdf/canvas/layout/page_layout/multi_column_layout.py:119, in MultiColumnLayout.switch_to_next_page(self)
    117 # find Document
    118 doc = self.get_page().get_root()  # type: ignore[attr-defined]
--> 119 assert isinstance(doc, Document)
    121 # create new Page
    122 assert self._page_width

AssertionError:

kw-nxvc commented 1 year ago

I have also been working with page overflow for tables and paragraphs, and I am running into an assertion error.

I have created a page and layout:

page: Page = Page() 
layout: PageLayout = SingleColumnLayoutWithOverflow(page)

Then I added paragraphs using the layout.add() method. Is there a better way to add paragraphs? I also tried adding the text to borderless table cells so that the table extends past the end of the page, and got the same error.

Pear-sudo commented 1 year ago

Hi, I also need to create a long table and have come across similar issues. (borb version: v2.1.16)

The code I tried to run:

doc: Document = Document()
page: Page = Page()
doc.add_page(page)
layout: PageLayout = SingleColumnLayoutWithOverflow(page)
rows = 33
columns = 3
table = FixedColumnWidthTable(
    number_of_columns=columns,
    number_of_rows=rows,
    column_widths=[Decimal(1), Decimal(1), Decimal(1)]
)
for i in range(rows * columns):
    table.add(Paragraph(" *" * 33, font="times-roman"))
layout.add(table)

The error message:

File "~/main.py", line 461, in generate_pdf
    layout.add(table)
  File "~/venv/lib/python3.11/site-packages/borb/pdf/canvas/layout/page_layout/single_column_layout_with_overflow.py", line 202, in add
    super(SingleColumnLayoutWithOverflow, self).add(t)
  File "~/venv/lib/python3.11/site-packages/borb/pdf/canvas/layout/page_layout/multi_column_layout.py", line 205, in add
    assert False, f"{layout_element.__class__.__name__} is too tall to fit inside column / page. Needed {round(layout_box.get_height())} pts, only {round(available_box.get_height())} pts available."
AssertionError: FixedColumnWidthTable is too tall to fit inside column / page. Needed 702 pts, only 674 pts available.

Is there a workaround?
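
One possible workaround, sketched under assumptions: if you can pick a row count that you know fits on a single page, you can split the rows yourself and build one table per chunk. The helper below is plain Python; chunk_rows and rows_per_page are hypothetical names, and the borb calls appear only as comments, using the constructors already shown in this thread.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunk_rows(rows: Sequence[T], rows_per_page: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most rows_per_page rows."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    for start in range(0, len(rows), rows_per_page):
        yield list(rows[start:start + rows_per_page])


# Intended use (not executed here; rows_per_page must be tuned by hand):
#
# for chunk in chunk_rows(all_rows, rows_per_page=25):
#     table = FixedColumnWidthTable(number_of_columns=3, number_of_rows=len(chunk))
#     for row in chunk:
#         for cell_text in row:
#             table.add(Paragraph(cell_text))
#     layout.add(table)
```

This avoids relying on automatic splitting, at the cost of choosing rows_per_page manually for your font size and page size.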

hemidark commented 1 year ago

I'm also experiencing precisely this issue with SingleColumnLayoutWithOverflow in 2.1.16, trying to create a long table. Layout takes a very long time and then fails when this assertion trips.

jorisschellekensbc commented 1 year ago

This is a duplicate of another (solved) bug. The fix for this bug will be in the next release.

https://github.com/jorisschellekens/borb/issues/171

And here is the (temporary) solution (new code for SingleColumnLayoutWithOverflow):

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
This implementation of PageLayout adds left/right/top/bottom margins to a Page
and lays out the content on the Page as if there were a single column to flow text, images, etc. into.
Once this column is full, the next page is automatically created.
"""

import copy
import typing
from decimal import Decimal

from borb.pdf.canvas.geometry.rectangle import Rectangle
from borb.pdf.canvas.layout.layout_element import LayoutElement
from borb.pdf.canvas.layout.page_layout.multi_column_layout import SingleColumnLayout

class SingleColumnLayoutWithOverflow(SingleColumnLayout):
    """
    This implementation of PageLayout adds left/right/top/bottom margins to a Page
    and lays out the content on the Page as if there were a single column to flow text, images, etc. into.
    Once this column is full, the next page is automatically created.
    """

    #
    # CONSTRUCTOR
    #

    #
    # PRIVATE
    #

    @staticmethod
    def _prepare_table_for_relayout(layout_element: LayoutElement):
        from borb.pdf.canvas.layout.table.table import Table

        assert isinstance(layout_element, Table)
        layout_element._previous_layout_box = None
        layout_element._previous_paint_box = None

        # noinspection PyProtectedMember
        for tc in layout_element._content:
            tc._previous_layout_box = None
            tc._previous_paint_box = None
            tc._forced_layout_box = None
            tc.get_layout_element()._previous_layout_box = None
            tc.get_layout_element()._previous_paint_box = None

    def _split_blockflow(
        self, layout_element: LayoutElement, available_height: Decimal
    ) -> typing.List[LayoutElement]:
        from borb.pdf.canvas.layout.page_layout.block_flow import BlockFlow

        assert isinstance(layout_element, BlockFlow)
        return layout_element._content

    def _split_table(
        self, layout_element: LayoutElement, available_height: Decimal
    ) -> typing.List[LayoutElement]:
        from borb.pdf.canvas.layout.table.table import Table

        assert isinstance(layout_element, Table)

        # find out at which row we ought to split the Table
        top_y: typing.Optional[Decimal] = None
        best_row_for_split: typing.Optional[int] = None
        for i in range(0, layout_element.get_number_of_rows()):
            prev_layout_box: typing.Optional[Rectangle] = layout_element.get_cells_at_row(i)[0].get_previous_layout_box()
            # the Table must already have been laid out once (add() does this before splitting)
            assert prev_layout_box is not None
            if top_y is None or top_y < (prev_layout_box.get_y() + prev_layout_box.get_height()):
                top_y = prev_layout_box.get_y() + prev_layout_box.get_height()
            assert top_y is not None
            # never split at a row that contains a cell spanning multiple rows
            if any([x.get_row_span() != 1 for x in layout_element.get_cells_at_row(i)]):
                continue
            y: Decimal = prev_layout_box.get_y()
            h: Decimal = round(top_y - y, 2)
            if h < available_height:
                best_row_for_split = i

        # unable to split
        if best_row_for_split is None:
            assert False, (
                "%s is too tall to fit inside column / page."
                % layout_element.__class__.__name__
            )

        # first half of split
        t0 = copy.deepcopy(layout_element)
        t0._number_of_rows = best_row_for_split + 1
        t0._content = [
            x
            for x in t0._content
            if all([y[0] <= best_row_for_split for y in x.get_table_coordinates()])
        ]
        SingleColumnLayoutWithOverflow._prepare_table_for_relayout(t0)

        # second half of split
        t1 = copy.deepcopy(layout_element)
        t1._number_of_rows = (
            layout_element.get_number_of_rows() - best_row_for_split - 1
        )
        t1._content = [
            x
            for x in t1._content
            if all([y[0] > best_row_for_split for y in x.get_table_coordinates()])
        ]
        for tc in t1._content:
            tc._table_coordinates = [
                (y - best_row_for_split - 1, x) for y, x in tc.get_table_coordinates()
            ]
        SingleColumnLayoutWithOverflow._prepare_table_for_relayout(t1)

        # return
        return [t0, t1]

    #
    # PUBLIC
    #

    def add(self, layout_element: LayoutElement) -> "PageLayout":  # type: ignore [name-defined]
        """
        This method adds a `LayoutElement` to the current `Page`.
        """

        # anything that isn't a Table or a BlockFlow gets added as usual
        if layout_element.__class__.__name__ not in [
            "BlockFlow",
            "FlexibleColumnWidthTable",
            "FixedColumnWidthTable",
        ]:
            return super(SingleColumnLayout, self).add(layout_element)

        # get the dimensions of the Page
        page_width: typing.Optional[Decimal] = self._page.get_page_info().get_width()
        page_height: typing.Optional[Decimal] = self._page.get_page_info().get_height()
        assert page_width is not None
        assert page_height is not None

        # IF there is a previous LayoutElement,
        # THEN we use that to determine max y-coordinate
        max_y: Decimal = page_height - self._margin_top
        min_y: Decimal = self._margin_bottom
        if self._previous_layout_element is not None:
            max_y = self._previous_layout_element.get_previous_layout_box().get_y()
            max_y -= super()._calculate_leading_between(
                self._previous_layout_element, layout_element
            )
            max_y -= max(
                self._previous_layout_element.get_margin_bottom(),
                layout_element.get_margin_top(),
            )

        # calculate the height available for the LayoutElement
        available_height: Decimal = max_y - min_y

        # IF the available height is insufficient
        # THEN switch to a new column and try again
        if available_height < 0:
            self.switch_to_next_column()
            return self.add(layout_element)

        # calculate the available space (as a Rectangle) for the LayoutElement
        # fmt: off
        available_box: Rectangle = Rectangle(
            self._margin_left + sum(self._column_widths[0:self._active_column]) + sum(self._inter_column_margins[0:self._active_column]) + layout_element.get_margin_left(),
            min_y,
            self._column_widths[self._active_column] - layout_element.get_margin_right() - layout_element.get_margin_left(),
            available_height
        )
        # fmt: on

        # calculate the layout_box of the LayoutElement
        layout_box = layout_element.get_layout_box(available_box)

        # IF the layout_box is wider than the column
        # THEN raise an assert
        if round(layout_box.get_width(), 2) > round(
            self._column_widths[self._active_column], 2
        ):
            # fmt: off
            assert False, f"{layout_element.__class__.__name__} is too wide to fit inside column / page. Needed {round(layout_box.get_width())} pts, only {round(available_box.get_width())} pts available."
            # fmt: on

        # IF the layout_box fits inside the column
        # THEN delegate to super
        if round(layout_box.get_height(), 2) <= round(available_box.get_height(), 2):
            return super(SingleColumnLayout, self).add(layout_element)

        # IF the layout_box is taller than the column
        # THEN try the next column, split the element, or (failing both) raise an assert
        else:
            if self._previous_layout_element is not None:
                self.switch_to_next_column()
                return self.add(layout_element)

            # IF the LayoutElement is a Table
            # THEN split the Table
            if layout_element.__class__.__name__ in [
                "FlexibleColumnWidthTable",
                "FixedColumnWidthTable",
            ]:
                for t in self._split_table(layout_element, available_height):
                    super(SingleColumnLayoutWithOverflow, self).add(t)
                return self

            # IF the LayoutElement is a BlockFlow
            # THEN split the BlockFlow
            if layout_element.__class__.__name__ in ["BlockFlow"]:
                for t in self._split_blockflow(layout_element, available_height):
                    super(SingleColumnLayoutWithOverflow, self).add(t)
                return self

            # the LayoutElement can not fit
            # fmt: off
            assert False, f"{layout_element.__class__.__name__} is too tall to fit inside column / page. Needed {round(layout_box.get_height())} pts, only {round(available_box.get_height())} pts available."
            # fmt: on

Kind regards, Joris Schellekens
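
The row-selection step of _split_table above can be tried in isolation with the following sketch. It is a simplification, not the library's code: best_row_for_split here is a hypothetical free function that takes plain row heights and a per-row flag saying whether the row contains a cell spanning several rows, instead of reading layout boxes from a Table.

```python
from decimal import Decimal
from typing import List, Optional


def best_row_for_split(
    row_heights: List[Decimal],
    has_row_span: List[bool],
    available_height: Decimal,
) -> Optional[int]:
    """Return the index of the last row after which the table may be cut.

    A row qualifies when the rows up to and including it are strictly
    shorter than available_height and the row has no spanning cells.
    Returns None when no row qualifies (the table cannot be split).
    """
    best: Optional[int] = None
    used = Decimal(0)
    for i, (height, spans) in enumerate(zip(row_heights, has_row_span)):
        used += height
        if spans:
            # mirrors the `continue` for rows with row_span != 1
            continue
        if used < available_height:
            best = i
    return best
```

A result of None corresponds to the "too tall to fit inside column / page" assertion in the code above.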