scanny / python-pptx

Create Open XML PowerPoint documents in Python
MIT License

Replace keywords across multiple runs #836

Open SSMK-wq opened 1 year ago

SSMK-wq commented 1 year ago

I have two PPTs (File1.pptx and File2.pptx) that both contain the two lines below. I already posted this problem elsewhere; since there was no response, I thought I would ask here.

XX NOV 2021, Time: xx:xx – xx:xx hrs (90mins)
FY21/22 / FY22/23

I want to make the following replacements:

a) NOV 2021 as NOV 2022.

b) FY21/22 / FY22/23 as FY21/22 or FY22/23.

The problem is that my replacement works in File1.pptx but not in File2.pptx.

When I printed the run text, I could see that the same line is split into runs differently in the two files.

def replace_text(replacements: dict, shapes: list):
    for shape in shapes:
        if not shape.has_text_frame:
            continue
        for match, replacement in replacements.items():
            if match not in shape.text:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    cur_text = run.text
                    print(cur_text)
                    print("---")
                    # Only a match that sits entirely inside one run is replaced.
                    run.text = cur_text.replace(str(match), str(replacement))

In File1.pptx, cur_text looks like the output below for the 1st keyword, so my replace works (the run contains the whole keyword that I am looking for).

[image: printed run text for File1.pptx]

The same issue happens for 2nd keyword as well which is FY21/22 / FY22/23.

But there is no bold or any other formatting applied to these lines in File1.pptx and File2.pptx, so I am not sure why they have multiple runs.

Why is the same line (identical to human eyes) returned differently by run.text? What is the right level at which to look for text?

How can we combine the previous N runs with the current run and then do the replacement, where N can range from 1 to 3?

This issue happens for only about 10% of my search terms, not all of them, but it is scary to live with: if that percentage increases, we may have to do a lot of manual work. How do we avoid this and code it correctly?

How do we find the word we are looking for when it is spread across multiple runs, the way CTRL+F does, and replace it with the desired keyword?

Basically, I want my code to work like CTRL+H (Find and Replace), which can take any word and replace it with our keyword.

Any help, please? I am open to paid support to solve this issue as well.

fschaeck commented 1 year ago

The answer to this question is - hopefully - my repository at https://github.com/fschaeck/python-pptx-text-replacer

The way PowerPoint distributes text across the runs in a paragraph is nearly random and cannot be anticipated. That has nothing to do with python-pptx, which only shows the crap PowerPoint is producing.

The script in my repository takes care of changing text across runs without affecting the character formats at all. And if it doesn’t for anybody, please open an issue there!
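
For readers who want to see the general idea, here is a minimal sketch of a replacement that works across run boundaries. It is not the code from the repository above, just one possible approach: join the run texts of a paragraph, search the joined string, and write each replacement back into the run that held the first matched character. The function name replace_across_runs and the FakeRun stand-in class are made up for this illustration; the function only assumes that each run object has a readable and writable .text attribute, which python-pptx runs do.

```python
class FakeRun:
    """Hypothetical stand-in for a python-pptx run; only .text is needed."""

    def __init__(self, text):
        self.text = text


def replace_across_runs(runs, old, new):
    """Replace `old` with `new` in the joined text of `runs`.

    The replacement text goes into the run that owned the first matched
    character, so it takes on that run's formatting. Runs that held the
    rest of the match are shortened (possibly to an empty string).
    Returns the number of replacements made.
    """
    runs = list(runs)
    if not old:
        return 0
    texts = [run.text for run in runs]
    full = "".join(texts)
    if old not in full:
        return 0

    # owner[i] is the index of the run that holds character i of `full`.
    owner = [idx for idx, text in enumerate(texts) for _ in text]
    pieces = [[] for _ in runs]

    count = 0
    i = 0
    while i < len(full):
        if full.startswith(old, i):
            pieces[owner[i]].append(new)
            i += len(old)
            count += 1
        else:
            pieces[owner[i]].append(full[i])
            i += 1

    # Write back only the runs whose text actually changed.
    for run, text, parts in zip(runs, texts, pieces):
        updated = "".join(parts)
        if updated != text:
            run.text = updated
    return count


# Demo: the keyword is split over three runs, as in the failing file.
demo_runs = [FakeRun("XX NO"), FakeRun("V 20"), FakeRun("21, Time")]
replace_across_runs(demo_runs, "NOV 2021", "NOV 2022")
print([r.text for r in demo_runs])  # ['XX NOV 2022', '', ', Time']
```

With python-pptx this would be called once per paragraph, for example replace_across_runs(paragraph.runs, "NOV 2021", "NOV 2022") inside the loop over shape.text_frame.paragraphs. The sketch does not look for matches that cross paragraph boundaries, and it puts the whole replacement in the first affected run, which is a simplification when the matched runs carry different formatting.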