scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

Calculating table row height dynamically #480

Open lokesh1729 opened 5 years ago

lokesh1729 commented 5 years ago

Hi @scanny

We need to generate a report from a template. Some slides contain tables whose cells are filled with data pulled from data sources, and the report is finally saved under a new name. The table cells need to be written dynamically: if a table overflows its slide, we duplicate the slide and continue adding data on the next one.

Consider the following scenario.

  1. Read a PPT template.
  2. Read a table in a slide; say the height of the row we are on is x pt.
  3. I replace some text in a cell; the rendered height of the row is now x+y pt.
  4. When I read that row's height again, I still get x pt.

Is there any way to get the row height dynamically as I insert data into table cells, so that I can calculate the table height and render the rest of the table on the next slide?
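Once per-row heights are known (or estimated), splitting the table across slides is a simple grouping problem. The following is a minimal sketch of that step only; `paginate_rows` and its parameters are hypothetical names, not part of python-pptx, and it assumes all heights are given in the same unit (for example EMU).

```python
from typing import List, Sequence


def paginate_rows(row_heights: Sequence[int], available_height: int) -> List[List[int]]:
    """Group row indices into pages whose total height fits available_height.

    A single row taller than the page still gets a page of its own.
    """
    pages: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, height in enumerate(row_heights):
        # Start a new page when adding this row would overflow the current one.
        if current and used + height > available_height:
            pages.append(current)
            current, used = [], 0
        current.append(idx)
        used += height
    if current:
        pages.append(current)
    return pages
```

Each inner list holds the indices of the rows that would go on one slide; the hard part, as the rest of this thread shows, is getting trustworthy heights to feed into it.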

Note: I can already duplicate the slide and write to the same table on the next slide, using a code hack from this library.

UPDATE: I tried the following approach myself.

  1. Whenever I add some text to a cell, I call this function with the presentation object.
  2. It saves the presentation to a temp file.
  3. It reads the slide and the table with the given ID.
  4. It reads the row height and returns it.

import os

# Note: AbstractPresentation is the poster's own wrapper class, not part of python-pptx.
def save_file_get_cell_height(presentation_obj, slide_id, table_shape_id, row_idx):
    """
    This function saves given presentation object and re-reads it, returns given table row height
    """
    new_ppt_name = "temp.pptx"
    presentation_obj.save(new_ppt_name)
    new_pr_obj = AbstractPresentation(new_ppt_name)
    slide_obj = new_pr_obj.presentation.slides[slide_id]
    table_obj = None
    for shape in slide_obj.shapes:
        if shape.shape_id == table_shape_id:
            table_obj = shape.table
            break
    assert table_obj, "Invalid table shape id passed"
    new_cell_height = AbstractPresentation.get_cell_height(table_obj, row_idx)
    os.remove(new_ppt_name)
    return new_cell_height

To my surprise, the height was still the same 🤔. I then opened the PPT manually, resized the row by hand with the touchpad, and executed the code above again; this time it returned the new height. 🧐

I dug deeper:

  1. Created a blank PPT in PowerPoint.
  2. Added a simple table with a few rows and entered some long text.
  3. Ran unzip <ppt_file>, opened the slide.xml file and looked at the heights; all rows had the same height.
  4. Manually resized a row.
  5. Unzipped the file again and found that the height value had changed. 🧐
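The unzip-and-inspect steps above can also be done from Python with only the standard library. This is a minimal sketch, assuming the table lives in the part `ppt/slides/slide1.xml` (the usual name for the first slide, but an assumption here); `stored_row_heights` is a hypothetical helper name. It reads the `h` attribute of each `<a:tr>` element, which is the row height stored in the file, in EMU.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML main namespace, used by table elements such as <a:tr>.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def stored_row_heights(pptx_file, slide_part="ppt/slides/slide1.xml"):
    """Return the stored height (EMU) of every table row in one slide part.

    pptx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(pptx_file) as zf:
        root = ET.fromstring(zf.read(slide_part))
    return [int(tr.get("h")) for tr in root.iter(f"{{{A_NS}}}tr")]
```

Running it before and after resizing a row in PowerPoint should show the same change that was observed in the raw XML.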

This is an issue with neither python-pptx nor Microsoft, but I still want to get the cell height dynamically. How could I do that?

Currently, I am thinking of approximating the height from parameters such as font size, cell width and height, and the number of characters in the text. I feel that is not very accurate, so I am looking for a better solution.
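For reference, the character-count approximation described above might look like the sketch below. Every constant is an assumption: the average glyph width of 0.5 x font size and the line height of 1.2 x font size are rough rules of thumb, and the margins are meant to mirror 0.1 in left/right and 0.05 in top/bottom cell margins. `approx_row_height_pt` is a hypothetical name, not a python-pptx function.

```python
import math


def approx_row_height_pt(text, font_size_pt, cell_width_pt,
                         avg_char_width_ratio=0.5,  # assumed average glyph width / font size
                         line_height_ratio=1.2,     # assumed line height / font size
                         h_margin_pt=14.4,          # assumed left + right margins (0.2 in)
                         v_margin_pt=7.2):          # assumed top + bottom margins (0.1 in)
    """Rough row height in points, based only on character counts."""
    usable_width = cell_width_pt - h_margin_pt
    chars_per_line = max(1, int(usable_width / (font_size_pt * avg_char_width_ratio)))
    # Each paragraph takes at least one line, even when it is empty.
    lines = sum(max(1, math.ceil(len(par) / chars_per_line))
                for par in text.split("\n"))
    return lines * font_size_pt * line_height_ratio + v_margin_pt
```

Because it ignores real glyph widths and word boundaries, this kind of estimate tends to drift for proportional fonts, which is the inaccuracy the question is worried about.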

Thanks, Lokesh.

lokesh1729 commented 5 years ago

@scanny ????

scanny commented 5 years ago

There's no great solution to this problem. PowerPoint doesn't automatically resize table cells to fit content when the presentation loads (although LibreOffice may, so that might be worth trying).

The best you can do is estimate the text size and go from there (adding margins etc.). There weren't great options for that last time I looked, but that was a while back; maybe the situation has improved and you can find a library that makes good estimates based on font metrics etc. That will be platform-dependent to some degree (OS X vs. Linux vs. Windows).

Lime91 commented 5 years ago

Hi @scanny

thank you for your very useful python-pptx package ;-)

I have the same problem as @lokesh1729 and I'm wondering whether I just discovered something?

"PowerPoint doesn't automatically resize table cells to fit content when the presentation loads"

Well, it doesn't resize the cells, but maybe it does resize the shape itself. Let's look at the following example: We create a simple table and fill it with text that is too long/large to fit in the generated cells:

from pptx import Presentation
from pptx.util import Cm

prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[6])   # add a blank slide
shape = slide.shapes.add_table(1, 2, left= Cm(5), top= Cm(5), width= Cm(6), height= Cm(1))   # add a simple table with 1 row and 2 columns
tf = shape.table.cell(0,0).text_frame
tf.text = 'I LOVE CHICKPEAS'
prs.save('test.pptx')

After saving the result, we open it with PowerPoint and inspect the table:

[Screenshot: table]

[Screenshot: format_shape]

We notice that PowerPoint somehow managed to calculate the correct shape height. If we now reload the presentation in Python and print the shape's dimensions with

prs = Presentation('test.pptx')
for shape in prs.slides[0].shapes:
    if shape.has_table:
        print('row height:', shape.table.rows[0].height/360000)   # convert to Cm before printing
        print('shape height:', shape.height/360000)   # convert to Cm before printing

we obtain

row height: 1.0
shape height: 1.0

which is still the original height and not what PowerPoint shows us. However, if we manually save the open presentation without any changes (by clicking the "Save" button in PowerPoint) and then execute the above code chunk again, we suddenly obtain

row height: 1.0
shape height: 2.54

So at least the shape height now shows the actual value we're interested in.
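A small helper can make this mismatch explicit. The sketch below is an illustration only; `unaccounted_height` is a hypothetical name. It relies on nothing but the `height`, `table.rows` and row `height` attributes already used in the snippet above, and it is only meaningful after PowerPoint has re-saved the file.

```python
EMU_PER_CM = 360000  # same conversion factor as in the snippet above


def unaccounted_height(graphic_frame):
    """Return shape height minus the sum of stored row heights, in EMU.

    A positive value means the shape is taller than its rows claim to be,
    i.e. the stored row heights are stale.
    """
    stored = sum(row.height for row in graphic_frame.table.rows)
    return graphic_frame.height - stored
```

For the single-row example above this would give 2.54 cm - 1.0 cm = 1.54 cm, so the real row height equals the shape height. For a table with several rows, the difference alone does not say which rows grew.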

If Powerpoint is capable of recalculating the shape size, do you think there is a way we can do so as well?

scanny commented 5 years ago

No. This behavior is as described above, the key phrase being "when it loads". PowerPoint definitely has the ability to "fit" text to the shape and vice-versa, it just doesn't "reshape" those on a presentation when you load it. It will any time you enter edit mode if I remember correctly, and I suppose clicking Format Shape is about the same.

As far as calculating is concerned PowerPoint has access to the font-metrics and other layout data it uses and we don't. So that's kind of a short story.

Lime91 commented 5 years ago

Thanks for your prompt answer! Ok, then the only workaround is to mimic the fitting procedure by approximating some of the unknown font-metrics...

However, I just noticed that it isn't necessary to open the "Format Shape" menu in order to reproduce the result. Opening and saving the presentation in PowerPoint without clicking anything else suffices.

scanny commented 5 years ago

Hmm, interesting. So does opening the file with PowerPoint do the trick or do you need to open, save, and then open again?

Btw, there was some rudimentary text-fitting work done a bit earlier for regular shapes (not tables) that you might want to try out (https://github.com/scanny/python-pptx/blob/master/pptx/text/layout.py). The results aren't terrifically consistent, but better than nothing for the folks that sponsored that work. The main limitation, as we mentioned above, was getting reliable font metrics. The ones we got from Pillow don't seem to match those from PowerPoint very well.

Lime91 commented 5 years ago

The minimum I need to do to obtain the result is to open and save the file with PowerPoint. Apparently, opening alone does not modify the file, even though PowerPoint already displays the table in its adapted form.

Thanks for the link! I'll have a closer look at it.

zhouxkgb commented 5 years ago

When reading a PPT template and copying a slide, the copy is always added as the last page. How can this be solved?

stevo8 commented 3 years ago

I know this is an old item, but I have found a solution that works very well. At my company we needed a table to match the formatting of a reference document but with different text in each cell. I therefore read the original document and calculate the table row heights, because the value in the XML is not reliable. It is not reliable because PowerPoint automatically increases the row height to render the text with the specified formatting but doesn't store that value in the XML. Usually the row height in the XML is just a default value, and the real height is larger once the text is rendered. Once I have the calculated row heights, I use them in my new table to resize the text until the row height matches the original.

To calculate the row height you need to render the text, as @scanny said. I found the most reliable way to do this is using Qt with either PyQt or PySide2. It can also be done with tkinter, but that is not as easy and is quite a bit slower. With PyQt/PySide2 you can create a QFontMetrics and get the bounding box for a given text. The key, though, is that only the width should be used. For PowerPoint the height is a constant: 1.2x the font size. PowerPoint calls it the bound height. So all you need is the number of lines of text, obtained by wrapping the text to fit the width, and from that you calculate the total height.

Obviously there is more to it, as you have to account for the formatting of runs, line spacing and merged cells. The code I have written works very well, so it is possible. If there is interest I can share some of what I have with @scanny and try to incorporate it into the API. I'm not a programmer, but the code does function.
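The approach described above (wrap by measured width, then multiply the line count by 1.2 x font size) can be sketched as follows. This is not stevo8's code, which was not posted; the function names, the greedy word-wrap and the default margins (0.1 in left/right, 0.05 in top/bottom) are all assumptions. The `measure` callable is a placeholder for any function that returns the width of a string in points, for example one built on Qt's `QFontMetrics` (which reports pixels, so a conversion would be needed). Run formatting, paragraph line spacing and merged cells are ignored.

```python
from typing import Callable, List


def wrap_by_width(text: str, max_width: float,
                  measure: Callable[[str], float]) -> List[str]:
    """Greedy word wrap driven by measured width instead of character count.

    A single word wider than max_width is kept on a line of its own.
    """
    lines: List[str] = []
    for paragraph in text.split("\n"):
        current = ""
        for word in paragraph.split():
            candidate = word if not current else current + " " + word
            if current and measure(candidate) > max_width:
                lines.append(current)
                current = word
            else:
                current = candidate
        lines.append(current)  # empty paragraphs still count as one line
    return lines


def estimate_row_height_pt(text, font_size_pt, cell_width_pt, measure,
                           margin_lr_pt=7.2,   # assumed left/right margin, each side
                           margin_tb_pt=3.6,   # assumed top/bottom margin, each side
                           line_factor=1.2):   # line height / font size, per the comment above
    """Estimated row height in points for one cell with uniform formatting."""
    usable_width = cell_width_pt - 2 * margin_lr_pt
    n_lines = len(wrap_by_width(text, usable_width, measure))
    return n_lines * font_size_pt * line_factor + 2 * margin_tb_pt
```

For a row with several cells, the row height would be the maximum of the per-cell estimates.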

Dasc3er commented 1 year ago

Hi all, I was experimenting on this some time ago and set up a draft for a function doing this operation using PIL. This is mostly untested, so it will probably need fixes to work correctly.

from typing import Union
import math
import textwrap

from PIL import Image, ImageDraw


def estimate_text_box_size(
        txt,
        font,  # PIL ImageFont instance
        max_width: Union[int, None] = None,
        line_spacing: int = 4
):
    """
    Estimate the rendered size of txt in pixels, wrapping at max_width if given.

    Example of use (assumes 96 DPI, i.e. 1 px = 0.75 pt):
    pt_per_pixel = 0.75
    font = ImageFont.truetype("arial.ttf", int(fontsize_pt / pt_per_pixel))
    right_margin = left_margin = Length(Cm(0.25)).pt / pt_per_pixel
    top_margin = bottom_margin = Length(Cm(0.13)).pt / pt_per_pixel
    width, height = estimate_text_box_size(
        txt,
        font,
        max_width=shape_width - (right_margin + left_margin),
    )

    print("Computed in pixels (w, h)")
    print((width + right_margin + left_margin, height + top_margin + bottom_margin))

    :param txt: text to measure; may contain newlines
    :param font: PIL font used for measuring
    :param max_width: maximum line width in pixels, or None for no wrapping
    :param line_spacing: extra spacing between lines, in pixels
    :return: (width, height) in pixels
    """
    # A scratch image is only needed to obtain an ImageDraw object for measuring.
    image = Image.new(size=(400, 300), mode='RGB')
    draw = ImageDraw.Draw(image)

    if max_width is not None:
        wrapped_lines = []
        for line in txt.split("\n"):
            _, _, width, _ = font.getbbox(line)
            # Number of visual lines needed; at least 1 so empty lines don't divide by zero.
            n_chunks = max(1, math.ceil(width / max_width))
            # Approximate the wrap point as an even split of the character count.
            split_at = max(1, len(line) // n_chunks)
            # textwrap.wrap returns [] for an empty line; keep it as a blank line.
            wrapped_lines.extend(textwrap.wrap(line, width=split_at) or [""])
        actual_txt = "\n".join(wrapped_lines)
    else:
        actual_txt = txt

    left, top, right, bottom = draw.multiline_textbbox(
        (0, 0), actual_txt, font=font, spacing=line_spacing
    )

    # bottom is measured from the drawing origin, so it includes the offset above the first line.
    return right - left, bottom

Reference GIST: https://gist.github.com/Dasc3er/2af5069afb728c39d54434cb28a1dbb8
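The draft above reports sizes in pixels, while python-pptx stores lengths in EMU (914400 per inch). A conversion along the following lines would be needed before comparing the estimate with a row height. This is a sketch, assuming 96 pixels per inch, which is what `pt_per_pixel = 0.75` implies (72 pt per inch / 0.75); `px_to_emu` is a hypothetical helper name.

```python
EMU_PER_INCH = 914400


def px_to_emu(pixels: float, dpi: float = 96.0) -> int:
    """Convert a pixel length to EMU, assuming the given pixels-per-inch."""
    return round(pixels * EMU_PER_INCH / dpi)
```

With the default DPI, one pixel corresponds to 9525 EMU, so a 40 px estimate would be `px_to_emu(40)` EMU.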