scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

I wrote some frequently used functions; should they be added to this project? #829

Open PaleNeutron opened 2 years ago

PaleNeutron commented 2 years ago

I think replacing text or a picture in a pptx template is a common use case. I wrote the functions below, and I think this project should provide them.

Please review this code.

from io import BytesIO, IOBase
from typing import BinaryIO, Literal, Union

import matplotlib.pyplot as plt
from pptx.presentation import Presentation as PrsCls
from pptx.shapes.picture import Picture
from pptx.slide import Slide

def replace_text(ppt: Union[PrsCls, Slide], search_pattern: str, repl: str) -> None:
    """search and replace text in PowerPoint while preserving formatting

    Args:
        ppt: Presentation / slide object
        search_pattern: search pattern
        repl: string to replace with

    """
    # Useful Links ;)
    # https://stackoverflow.com/questions/37924808/python-pptx-power-point-find-and-replace-text-ctrl-h
    # https://stackoverflow.com/questions/45247042/how-to-keep-original-text-formatting-of-text-with-python-powerpoint
    if isinstance(ppt, PrsCls):
        for slide in ppt.slides:
            _replace_text_in_slide(slide, search_pattern, repl)
    elif isinstance(ppt, Slide):
        _replace_text_in_slide(ppt, search_pattern, repl)

def _replace_text_in_slide(slide: Slide, search_pattern: str, repl: str) -> Slide:
    """replace text in one slide

    Args:
        slide: slide object
        search_pattern: search pattern
        repl: replacement string

    Returns:
        slide object
    """
    # TODO support regex
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue
        # replace run by run so that each run keeps its own formatting;
        # a pattern that is split across several runs is not matched
        for paragraph in shape.text_frame.paragraphs:
            for run in paragraph.runs:
                if search_pattern in run.text:
                    run.text = run.text.replace(search_pattern, repl)
    return slide

def replace_picture_in_slide(
    slide: Slide,
    fig: Union[bytes, str, plt.Figure, BinaryIO],
    pic_number: int = 0,
    auto_reshape: bool = True,
    order: Literal["t2b", "l2r"] = "t2b",
) -> None:
    """replace picture in one slide

    Args:
        slide: slide object
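        fig: image bytes, file name, matplotlib Figure, or binary IO object
        pic_number: index of the picture to replace, counted in the given order
        auto_reshape: resize a matplotlib Figure to the size of the replaced picture
        order: t2b means top to bottom, l2r means left to right, default is t2b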
    """
    # get picture list
    _pics = [shape for shape in slide.shapes if isinstance(shape, Picture)]
    if order == "t2b":
        pictures = sorted(
            _pics,
            key=lambda x: x.top,  # type: ignore
        )
    elif order == "l2r":
        pictures = sorted(
            _pics,
            key=lambda x: x.left,  # type: ignore
        )
    else:
        raise ValueError("order must be t2b or l2r")

    shape = pictures[pic_number]

    # prepare figure
    if isinstance(fig, (str, IOBase)):
        figio = fig
    elif isinstance(fig, bytes):
        figio = BytesIO(fig)
    elif isinstance(fig, plt.Figure):
        figio = BytesIO()
        if auto_reshape:
            fig.set_size_inches(shape.width.inches, shape.height.inches)
        fig.savefig(figio, format="png", bbox_inches="tight")
    else:
        raise ValueError(f"{type(fig)} {repr(fig)} is not supported")

    # replace picture
    new_shape = slide.shapes.add_picture(
        figio,
        shape.left,
        shape.top,
        shape.width,
        shape.height,
    )
    old_pic = shape._element
    new_pic = new_shape._element
    old_pic.addnext(new_pic)
    old_pic.getparent().remove(old_pic)
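
One thing to keep in mind with run-based replacement: PowerPoint may split a placeholder such as {title} across several runs, so no single run contains the whole pattern. A minimal sketch of one possible workaround follows. The helper name `replace_in_paragraph` is hypothetical and not part of python-pptx; it only assumes the paragraph exposes `.runs` whose items have a writable `.text`.

```python
def replace_in_paragraph(paragraph, search_pattern: str, repl: str) -> bool:
    """Replace search_pattern in a paragraph even if it spans several runs.

    `paragraph` is assumed to be a python-pptx paragraph; any object with a
    `.runs` sequence of items that have a writable `.text` works.
    Returns True if a replacement was made.
    """
    runs = list(paragraph.runs)
    full_text = "".join(run.text for run in runs)
    if not runs or search_pattern not in full_text:
        return False
    # Put the whole replaced text into the first run, which keeps that
    # run's font settings, and blank out the remaining runs.
    runs[0].text = full_text.replace(search_pattern, repl)
    for run in runs[1:]:
        run.text = ""
    return True
```

The trade-off is that the whole paragraph ends up with the first run's formatting, so this is probably only worth using when the plain per-run replacement finds nothing.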

SSMK-wq commented 2 years ago

@PaleNeutron - I am new to this package and am trying to replace some keywords, and I came across your function. Could you show me how to call your replace_text function? Is there an example you can share?

I came here looking for a solution to an issue I am facing while replacing text, described here:

https://stackoverflow.com/questions/73201672/python-pptx-format-and-alignment-lost-for-specific-tables-and-their-headers

PaleNeutron commented 2 years ago

@SSMK-wq, I have written a test for my code that may help you:

from pptx import Presentation

def test_replace_picture() -> None:
    # open pptx file
    prs = Presentation("tests/pptx/test_template.pptx")
    # replace all in ppt
    replace_text(prs, "{report}", "this is my report words")
    # replace in one slide
    slide = prs.slides[1]
    replace_text(slide, "{title}", "This is a title")
    replace_text(slide, "{subtitle1}", "small title")

    # generate fig
    fig_file = BytesIO()
    plt.plot([1, 2, 3, 4])
    fig = plt.gcf()
    fig.savefig(fig_file, format="png")
    fig_file.seek(0)

    # replace picture

    # replace the first picture in slide 0
    replace_picture_in_slide(prs.slides[0], fig_file, auto_reshape=True)

    # replace the first picture in slide 1 without auto_reshape
    plt.bar(list("qwertyuioplkjhgfdaszcvbnm"[:20]), range(10, -10, -1))
    fig = plt.gcf()
    replace_picture_in_slide(prs.slides[1], fig, auto_reshape=False, order="l2r")

    # replace the second picture in slide 1 with auto_reshape
    replace_picture_in_slide(
        prs.slides[1], fig, pic_number=1, auto_reshape=True, order="l2r"
    )

SSMK-wq commented 2 years ago

@PaleNeutron - doesn't your "replace text" function replace table headers as well? I would like to replace text that is stored as column headers in a table in a ppt.

PaleNeutron commented 2 years ago

@SSMK-wq, no, it doesn't. It only loops over the top-level shapes, and table headers are nested inside tables. You could read this: https://python-pptx.readthedocs.io/en/latest/user/table.html

I think checking each shape with shape.has_table and then looping over all cells in shape.table would work.
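
A rough sketch of that idea, replacing run by run inside each cell. The helper name `replace_text_in_tables` is hypothetical (it is not part of python-pptx), and `slide` is assumed to be a python-pptx Slide; `getattr` is used because not every shape type is guaranteed to expose `has_table`.

```python
def replace_text_in_tables(slide, search_pattern: str, repl: str) -> int:
    """Replace search_pattern in every table cell on a slide, run by run.

    `slide` is assumed to be a python-pptx Slide. Returns the number of
    runs that were changed.
    """
    changed = 0
    for shape in slide.shapes:
        # Only graphic frames that hold a table report has_table as True;
        # getattr() guards against shape types without that attribute.
        if not getattr(shape, "has_table", False):
            continue
        for row in shape.table.rows:
            for cell in row.cells:
                for paragraph in cell.text_frame.paragraphs:
                    for run in paragraph.runs:
                        if search_pattern in run.text:
                            run.text = run.text.replace(search_pattern, repl)
                            changed += 1
    return changed
```

Because only run.text is assigned, the fonts and alignment already set on the header cells should be left alone, although a pattern that is split across runs would not be found this way.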

In my opinion, changing the text in a header does not need "replace text", because you know the exact position of the cell. You can do it with just:

shape.table.cell(0, 0).text = "some text"

SSMK-wq commented 2 years ago

@PaleNeutron - I already use has_table and the cell option. Could you please have a look at this SO post1 and post2?

I guess this will give you a better idea of the issue that I am facing. Any suggestions from you would be really helpful for solving this problem.

This issue happens only for certain tables (unfortunately, they are the majority), not for all tables.